2018 IEEE 59th Annual Symposium on Foundations of Computer Science

A Matrix Chernoff Bound for Strongly Rayleigh Distributions and Spectral Sparsifiers from a few Random Spanning Trees

Rasmus Kyng and Zhao Song
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
[email protected], [email protected]

Abstract—Strongly Rayleigh distributions are a class of negatively dependent distributions of binary-valued random variables [Borcea, Brändén, Liggett JAMS 09]. Recently, these distributions have played a crucial role in the analysis of algorithms for fundamental graph problems, e.g. the Traveling Salesman Problem [Gharan, Saberi, Singh FOCS 11]. We prove a new matrix Chernoff bound for Strongly Rayleigh distributions.

As an immediate application, we show that adding together the Laplacians of ε^{-2} log² n random spanning trees gives a (1 ± ε) spectral sparsifier of a graph Laplacian with high probability. Thus, we positively answer an open question posed in [Batson, Spielman, Srivastava, Teng JACM 13]. Our number of spanning trees for a spectral sparsifier matches the number of spanning trees required to obtain a cut sparsifier in [Fung, Hariharan, Harvey, Panigrahi STOC 11]. The previous best result came from naively applying a classical matrix Chernoff bound, which requires ε^{-2} n log n spanning trees. For the tree averaging procedure to agree with the original graph Laplacian in expectation, each edge of the tree should be reweighted by the inverse of the edge's leverage score in the original graph. We also show that when using this reweighting of the edges, the Laplacian of a single random tree is bounded above in the PSD order by the original graph Laplacian times a factor log n with high probability, i.e. L_T ⪯ O(log n)·L_G. We show a lower bound that almost matches our last result, namely that in some graphs, with high probability, the random spanning tree is not bounded above in the spectral order by (log n / log log n) times the original graph Laplacian. We also show a lower bound that ε^{-2} log n spanning trees are necessary to get a (1 ± ε) spectral sparsifier.

Keywords: Matrix Chernoff bounds; random spanning trees; Strongly Rayleigh; spectral sparsifiers

I. INTRODUCTION

The idea of concentration of sums of random variables dates back to Central Limit Theorems, and hence to de Moivre and Laplace [Tij], while modern concentration bounds for sums of random variables were perhaps first established by Bernstein [Ber24], and a popular variant now known as Chernoff bounds was introduced by Rubin and published by Chernoff [Che52].

Concentration of measure for matrix-valued random variables is the phenomenon that many matrix-valued distributions are close to their mean with high probability, with closeness usually measured by the spectral norm. Modern quantitative bounds of the form often used in theoretical computer science were derived, for example, by Rudelson [Rud99], while Ahlswede and Winter [AW02] established a useful matrix version of the Laplace transform, which plays a central role in scalar concentration results such as those of Bernstein. [AW02] combined this with the Golden-Thompson trace inequality to prove matrix concentration results. Tropp refined this approach, and by replacing the use of Golden-Thompson with a deep theorem on concavity of certain trace functions due to Lieb, Tropp was able to recover strong versions of a wide range of scalar concentration results, including matrix Chernoff bounds and Azuma and Freedman's inequalities for matrix martingales [Tro12].

Matrix concentration results have had an enormous range of applications in computer science, and are ubiquitous throughout spectral graph theory [ST04], [SS11], [CKP+17], sketching [Coh16], approximation algorithms [HSSS16], and deep learning [ZSJ+17], [ZSD17]. Most applications are based on results for independent random matrices, but more flexible bounds, such as Tropp's Matrix Freedman Inequality [Tro11a], have been used to greatly simplify algorithms, e.g. for solving Laplacian linear equations [KS16] and for semi-streaming graph sparsification [AG09], [KPPS17]. Matrix concentration results are also closely related to other popular sampling tools, such as Karger's techniques for generating sparse graphs that approximately preserve the cuts of denser graphs [BK96].

Negative dependence of random variables is an appealing property that intuition suggests should help with concentration of measure. Notions of negative dependence can be formalized in many ways. Roughly speaking, these notions characterize distributions where some event occurring ensures that other events of interest become less likely. A simple example is the distribution of a sequence of coin flips, conditioned on the total number of heads in the outcome. In this distribution, conditioning on some coin coming out heads makes all other coins less likely to come out heads. Unfortunately, negative dependence phenomena are not as robust as positive association, which can be established from local conditions using the powerful FKG theorem [FKG71].

Strongly Rayleigh distributions were introduced recently by Borcea, Brändén, and Liggett [BBL09] as a class of negatively dependent distributions of binary-valued random variables with many useful properties. Strongly Rayleigh distributions satisfy useful negative dependence properties, and retain these properties under natural conditioning operations. Strongly Rayleigh distributions also satisfy a powerful stability property under conditioning known as Stochastic Covering [PP14], which is useful for analyzing them through martingale techniques. A measure on {0,1}^n is said to be Strongly Rayleigh if its generating polynomial is real stable [BBL09]. There are many interesting examples of Strongly Rayleigh distributions [PP14]: the example mentioned earlier of heads of independent coin flips conditioned on the total number of heads in the outcome; symmetric exclusion processes; and determinantal point processes and determinantal measures on a boolean lattice. An example of particular interest to us is the set of edges of a uniform or weighted random spanning tree, which forms a Strongly Rayleigh distribution.

We prove a matrix Chernoff bound for the case of k-homogeneous Strongly Rayleigh distributions. Our bound is slightly weaker than the bound for independent variables, but we can show that it is tight in some regimes. We use our bound to show new concentration results related to random spanning trees of graphs. An open question is to find other interesting applications of our concentration result, e.g. by analyzing concentration for matrices generated by exclusion processes.

Random spanning trees are among the most well-studied probabilistic objects in graph theory, going back to the work of Kirchhoff [Kir47] in 1847, who gave a formula relating the number of spanning trees in a graph to the determinant of the Laplacian of the graph.

Algorithms for sampling random spanning trees have been studied extensively [Gue83], [Bro89], [Ald90], [Kul90], [Wil96], [CMN96], [KM09], [MST15], [HX16], [DKP+17], [DPPR17], [Sch18], and a random spanning tree can now be sampled in almost linear time [Sch18].

In theoretical computer science, random spanning trees have found a number of applications, most notably in breakthrough results on approximating the traveling salesperson problem with symmetric [GSS11] and asymmetric costs [AGM+10]. Goyal et al. [GRV09] demonstrated that adding just two random spanning trees sampled from a bounded degree graph gives an O(log n) cut sparsifier with probability 1 − o(1). Later, it was shown by Fung, Hariharan, Harvey, and Panigrahi [FHHP11] that if we sample O(ε^{-2} log² n) random spanning trees from a graph, reweight the tree edges by the inverse of their leverage scores in the original graph, and average them together, then whp. we get a graph where the weight of edges crossing every cut is approximately the same as in the original graph, up to a factor (1 ± ε). We refer to this as an ε-cut sparsifier.
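To make the averaging procedure concrete, the following is a minimal numerical sketch of it on a small unweighted graph. It is illustrative only: it samples uniform spanning trees with the Aldous-Broder random walk construction [Ald90], [Bro89] (a real implementation would use the fast samplers cited above), reweights tree edges by inverse leverage scores, averages the tree Laplacians, and reports the spectral range of the average measured against L_G. All function names are ours.

```python
import itertools
import random
import numpy as np

def laplacian(n, edges, weights=None):
    # L = sum_e w_e b_e b_e^T, where b_e has +1/-1 at the endpoints of e.
    L = np.zeros((n, n))
    for i, (u, v) in enumerate(edges):
        w = 1.0 if weights is None else weights[i]
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def leverage_scores(n, edges):
    # l_e = b_e^T L^+ b_e; for a random spanning tree this equals Pr[e in T].
    Lp = np.linalg.pinv(laplacian(n, edges))
    return {e: Lp[e[0], e[0]] - 2 * Lp[e[0], e[1]] + Lp[e[1], e[1]] for e in edges}

def random_spanning_tree(n, adj):
    # Aldous-Broder: first-entry edges of a random walk form a uniform spanning tree.
    cur, seen, tree = 0, {0}, []
    while len(seen) < n:
        nxt = random.choice(adj[cur])
        if nxt not in seen:
            seen.add(nxt)
            tree.append((min(cur, nxt), max(cur, nxt)))
        cur = nxt
    return tree

n = 30
edges = list(itertools.combinations(range(n), 2))          # K_n as a test graph
adj = {u: [v for v in range(n) if v != u] for u in range(n)}
lev = leverage_scores(n, edges)

t = 400                                                    # number of trees averaged
avg = np.zeros((n, n))
for _ in range(t):
    T = random_spanning_tree(n, adj)
    avg += laplacian(n, T, [1.0 / lev[e] for e in T]) / t  # inverse leverage reweighting

# Spectral error: eigenvalues of the average measured against L_G on im(L_G).
LG = laplacian(n, edges)
w, U = np.linalg.eigh(LG)
S = U[:, w > 1e-9] / np.sqrt(w[w > 1e-9])                  # whitening basis for im(L_G)
eigs = np.linalg.eigvalsh(S.T @ avg @ S)
print("eigenvalue range of the averaged sparsifier:", eigs.min(), eigs.max())
```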
The techniques of Fung et al. unfortunately do not extend to proving spectral sparsifiers.

Spectral graph sparsifiers were introduced by Spielman and Teng [ST04], who for any graph G showed how to construct another graph H with ε^{-2} n poly log n edges s.t. (1 − ε)·L_G ⪯ L_H ⪯ (1 + ε)·L_G, which we refer to as an ε-spectral sparsifier. The construction was refined by Spielman and Srivastava [SS11], who suggested sampling edges independently with probability proportional to their leverage scores, and brought the number of required samples down to ε^{-2} n log n. ([SS11] analyzed sampling with replacement, but based on [Tro12], a folklore result shows the same behavior can be obtained by doing independent coin flips for every edge with low leverage score, again with inclusion probabilities proportional to leverage scores.) This analysis is tight in the sense that if fewer than o(ε^{-2} n log n) samples are used, there will be at least a 1/poly(n) probability of failure. Meanwhile, ε^{-2} n (log n / log log n) independent samples in a union of cliques can be shown whp. to fail to give a cut sparsifier. This can be observed directly from the degree distribution of a single vertex in the complete graph. For a variant of [SS11] based on flipping a single coin for each edge to decide whether to keep it or not, it can also be shown that when the expected number of edges is ε^{-2} n (log n / log log n), whp. the procedure fails to give a cut sparsifier. For arbitrary sparsification schemes, bounds in [BSS12] show that Θ(ε^{-2} n) edges are necessary and sufficient to give an ε-spectral sparsifier.

The marginal probability of an edge being present in a random spanning tree is exactly the leverage score of the edge. This seems to suggest that combining ε^{-2} poly log n spanning trees might give a spectral sparsifier, but the lack of independence between the sampled edges means the process cannot be analyzed using existing techniques. Observing this, Batson, Spielman, Srivastava, and Teng [BSST13] noted in their excellent 2013 survey on sparsification that "it remains to be seen if the union of a small number of random spanning trees can produce a spectral sparsifier." We answer this question in the affirmative. In particular, we show that adding together O(ε^{-2} log² n) spanning trees with edges scaled proportional to inverse leverage scores in the original graph leads to an ε-spectral sparsifier. This matches the bound obtained for cut sparsifiers in [FHHP11]. Our result also implies their earlier bound, since a spectral sparsifier is always a cut sparsifier with the same approximation quality. Before our result, only a trivial bound on the number of spanning trees required to build a spectral sparsifier was known. In particular, standard matrix concentration arguments like those in [SS11] prove that O(ε^{-2} n log n) spanning trees suffice. Lower bounds in [FHHP11] show that whp. Ω(log n) random spanning trees are required to give a constant factor spectral sparsifier.

We show that whp. ε^{-2} (log n / log log n) random spanning trees do not give an ε-spectral sparsifier. We also show that the Laplacian of a single random tree with edges weighted as above satisfies L_T ⪯ O(log n)·L_G whp., and we give an almost matching lower bound, showing that in some graphs, whp., L_T ⪯ O(log n / log log n)·L_G fails.

Before our work, the main result known about approximating graphs using O(1) random spanning trees was due to Goyal, Rademacher, and Vempala [GRV09], who showed that, surprisingly, when the original graph has bounded degree, adding two random spanning trees gives a graph whose cuts approximate the cuts in the original graph up to a factor O(log n) with good probability. Our result for a single tree establishes only a one-sided bound; an interesting open question remains: does sampling O(1) random spanning trees give a log n-factor spectral sparsifier with, say, constant probability?

A. Previous work

Chernoff-type bounds for matrices: Chernoff-like bounds for matrices appear in Rudelson [Rud99] and Ahlswede and Winter [AW02]. The latter introduced a useful matrix variant of the Laplace transform that is central in concentration bounds for scalar-valued random variables. Their bounds are restricted to iid random matrices, an artifact of their use of the Golden-Thompson inequality for bounding traces. In contrast, Tropp obtained more flexible concentration bounds for random matrices by using a result of Lieb to bound the expected trace of various operators [Tro12], including bounds for matrix martingales [Tro11b].

In a recent work, Garg, Lee, Song, and Srivastava [GLSS18] show a Chernoff bound for sums of matrix-valued random variables sampled via a random walk on an expander graph. This work confirms a conjecture due to Wigderson and Xiao. The proof of Garg et al. is also concerned with matrices that are not fully independent; in this case the matrices are generated from random walks on an expander graph. The main idea for dealing with the dependence issue is a new multi-matrix extension of the Golden-Thompson inequality and an adaptation of Healy's proof of the expander Chernoff bound in the scalar case [Hea08] to the matrix case. Their techniques deal with fairly generic types of dependence, and cannot leverage the very strong stability properties that arise from the negative dependence and stochastic covering properties of Strongly Rayleigh distributions. Harvey and Olver [HO14] proved a matrix concentration result for randomized pipage rounding, which can be used to show concentration results for random spanning trees obtained from pipage rounding, but not for (weighted) uniformly random spanning trees. The central technical element of their proof is a new variant of a theorem of Lieb on concavity of certain matrix trace functions. Matrix martingales have played a central role in a number of algorithmic results in theoretical computer science [KS16], [CMP16], [KPPS17], but beyond a reliance on Tropp's Matrix Freedman Inequality, these works have little in common with our approach. However, our bound does share a technical similarity with [KS16], namely that a sequence of increasingly restricted random choices in a martingale process leads to a log n factor in a variance bound.

Strongly Rayleigh distributions in theoretical computer science: Perhaps the most prominent result on Strongly Rayleigh distributions in theoretical computer science is the generalization of [MSS13] to Strongly Rayleigh distributions. The central technical result of [MSS13] essentially shows that given a collection of independent random vectors v_1, ..., v_m with finite support in C^n s.t. Σ_{i=1}^m E[v_i v_i^*] = I and ||v_i||² ≤ ε for all i, then

    Pr[ ||Σ_{i=1}^m v_i v_i^*|| ≤ (1 + √ε)² ] > 0.

[AG15] establishes a related result for k-homogeneous Strongly Rayleigh distributions, though they require an additional constraint, namely that the marginal probability that any given coordinate is non-zero is bounded above by δ; they then establish

    Pr[ ||Σ_{i=1}^m v_i v_i^*|| ≤ 4(δ + ε) + 2(δ + ε)² ] > 0.

Based on this, [AG15] shows (their full statement is more general, see [AG15] Corollary 1.9) that given an unweighted k-edge connected graph G where every edge has leverage score at most ε, there exists an unweighted spanning tree s.t. L_T ⪯ O(1/k + ε)·L_G. This is referred to as a spectrally thin tree with parameter O(1/k + ε).

[AGR16] showed how to algorithmically sample from a k-homogeneous Determinantal Point Process in time poly(k)·n·log(n/ε), where n is the dimension of the matrix giving rise to the determinantal point process and ε is the allowed total variation distance. Their techniques are based on generalizing proofs of expansion in the base graph associated with a balanced matroid, a result first established by [FM92].

Random spanning trees: Algorithms for sampling random spanning trees have a long history, but only recently have they explicitly used matrix concentration [DKP+17], [DPPR17], [Sch18]. The matrix concentration arguments in these papers, however, deal mostly with how modifying a graph results in changes to the distribution of random spanning trees in the graph. We instead study how closely random spanning trees resemble the graph they were initially sampled from. Whether our result in turn has applications for improving sampling algorithms for random spanning trees is unclear.

The fact that spanning tree edges exhibit negative dependence has been used strikingly in concentration arguments by Goyal et al. [GRV09] to show that two random spanning trees give an O(log n)-factor approximate cut sparsifier in bounded degree graphs, with good probability. This is clearly false when sampling the same number of edges independently, because such a graph has a large probability of having isolated vertices. Goyal et al. improve over independent sampling by leveraging the fact that, for a fixed tree, in some sense very few cuts of a given size exist. This is a variant of Karger's famous cut-counting techniques [Kar93], [KS96] specialized to unweighted trees.

Uses of negatively dependent Chernoff bounds applied to tree edges also appeared in works on approximation algorithms for TSP problems [GSS11], [AGM+10], where additionally the connectivity properties of the tree play an important role.

In contrast, the techniques of Fung et al. [FHHP11] show that O(ε^{-2} log² n) spanning trees suffice to give a (1 ± ε)-cut sparsifier, but they do not show that tree-based sparsifiers improve over independent sampling. The focus of their paper is to establish that a wide range of different techniques for choosing sampling probabilities all give cut sparsifiers, by establishing a more flexible framework than the original cut-sparsifier results of Benczúr-Karger [BK96], using related cut-counting techniques (see [Kar93], [KS96]). To extend their results to spanning trees, they simply observe that the (scalar-valued) Chernoff bounds they use immediately apply to negatively dependent variables, and hence to edges in spanning trees.

Fung et al. [FHHP11] also establish a lower bound, showing that for any constant c, there exists a graph for which obtaining a factor-c cut sparsifier by averaging trees requires using at least Ω(log n) trees to succeed with constant probability.

B. Our results and techniques

Theorem I.1 (First main result, a matrix Chernoff bound for k-homogeneous Strongly Rayleigh distributions). Suppose (ξ_1, ..., ξ_m) ∈ {0,1}^m is a random vector of {0,1} variables whose distribution is k-homogeneous and Strongly Rayleigh. Given a collection of PSD matrices A_1, ..., A_m ∈ R^{n×n} s.t. for all e ∈ [m] we have ||A_e|| ≤ R and ||E[Σ_e ξ_e A_e]|| ≤ μ. Then for any ε > 0,

    Pr[ ||Σ_e ξ_e A_e − E[Σ_e ξ_e A_e]|| ≥ εμ ] ≤ n·exp( −ε²μ / (R(log k + Θ(1))) ).

This matrix Chernoff bound matches the bounds due to Tropp [Tro12], up to the log k factor in the exponent. We know that the bound is essentially tight in some cases, when ε ≈ log k / log log k, since if we could replace the log k factor in Theorem I.1 with a factor log log k, then applying this strengthened version to improve the concentration in Theorem I.5 would lead to a contradiction of the lower bound in Theorem I.6. We do not know if the theorem is tight in other regimes, and it seems plausible that it should be possible to improve Theorem I.3 to show that O(ε^{-2} log n) trees suffice to get an ε-spectral sparsifier. This perhaps suggests there are regimes where our Theorem I.1 is not tight.
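As a sanity check on the quantities appearing in Theorem I.1, the sketch below samples from one of the simplest k-homogeneous Strongly Rayleigh distributions, i.i.d. coins conditioned on exactly k heads (equivalently, a uniformly random k-subset of [m]), and compares the empirical deviation probability against the shape of the bound, with the Θ(1) term arbitrarily set to 1. This is an illustration of the statement, not part of the proof; all parameter choices are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. coins conditioned on exactly k heads: a k-homogeneous Strongly
# Rayleigh distribution. A_e are diagonal PSD matrices with ||A_e|| <= R = 1.
m, k, n, R = 200, 40, 5, 1.0
A = [np.diag(rng.uniform(0, 1, n)) for _ in range(m)]
mean = (k / m) * sum(A)                    # E[sum_e xi_e A_e], since E[xi_e] = k/m
mu = np.linalg.norm(mean, 2)

eps, trials, hits = 0.75, 5000, 0
for _ in range(trials):
    S = rng.choice(m, size=k, replace=False)               # one sample of xi
    dev = np.linalg.norm(sum(A[e] for e in S) - mean, 2)
    hits += dev >= eps * mu

print("empirical Pr[deviation >= eps*mu]:", hits / trials)
# Shape of the Theorem I.1 bound, with the Theta(1) term set to 1
# (an arbitrary illustrative constant):
print("Chernoff-style bound:", n * np.exp(-eps**2 * mu / (R * (np.log(k) + 1))))
```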
Remark I.2. When the Strongly Rayleigh distribution is in fact a product distribution on a collection of l Strongly Rayleigh distributions that are each t-homogeneous, the joint distribution is lt-homogeneous Strongly Rayleigh, but in Theorem I.1 the factor log(lt) can be replaced by a factor log t. This applies to the case of independent random spanning trees, but only gives a constant factor improvement in the number of trees required.

Our work is related to the work of Peres and Pemantle [PP14], who showed a concentration result for scalar-valued Lipschitz functions of Strongly Rayleigh distributions. They used Doob martingales (martingales constructed from sequences of conditional expectations) to prove their result. We use a similar approach for matrices, constructing Doob matrix martingales from our Strongly Rayleigh distributions. In addition, we use the stochastic covering property of Strongly Rayleigh distributions observed by Peres and Pemantle, and implicitly derived in [BBL09]. This property leads to bounded differences in Doob martingale sequences for scalars. As in the scalar setting, it is possible to show concentration results for matrix-valued martingales. We use the Matrix Freedman inequality of Tropp. (Note, however, that we are able to prove deterministic bounds on the predictable quadratic variation process, which means the bound we use is more analogous to a matrix version of Bernstein's inequality, adapted to martingales. We resort to the more complicated Freedman inequality only because it gives a directly applicable statement that is known in the literature.) This inequality makes it possible to establish strong concentration bounds based on control of sample norms and control of the predictable quadratic variation process of the martingale, a matrix-valued object that is used to measure variance (see [Tro11b]). We show that, as in the scalar setting, the stochastic covering property of Strongly Rayleigh distributions leads to bounded differences for Doob matrix martingales. But we also combine the stochastic covering property with deceptively simple matrix martingale properties and a negative dependence condition to derive additional bounds on the predictable quadratic variation process of the martingale.

The key negative dependence property we use is a simple observation that generalizes, to k-homogeneous Strongly Rayleigh distributions, the fact that in a random spanning tree, conditioning on the presence of a set of edges lowers the marginal probability of every other graph edge being in the tree (see Lemma I.10). While we frame it differently, it is essentially an immediate consequence of statements in [BBL09]. The surprise here is how useful this simple observation is for removing issues with characterizing conditional k-homogeneous Strongly Rayleigh distributions. As a corollary we get our second main result.

Theorem I.3 (Second main result, concentration bound for a batch of independent random spanning trees). Given as input a weighted graph G with n vertices and a parameter ε > 0, let T_1, T_2, ..., T_t denote t independent inverse leverage score weighted random spanning trees. If we choose t = O(ε^{-2} log² n), then with probability 1 − 1/poly(n),

    (1 − ε)·L_G ⪯ (1/t)·Σ_{i=1}^t L_{T_i} ⪯ (1 + ε)·L_G.

Prior to our work, only a trivial bound on the number of spanning trees required to build a spectral sparsifier was known, namely that standard matrix concentration arguments like those in [SS11] prove that O(ε^{-2} n log n) spanning trees suffice. Note that the number of spanning trees required to build a spectral sparsifier in our Theorem I.3 matches the number of spanning trees required to construct a cut sparsifier in the previous best result [FHHP11]. The total edge count we require is Θ(ε^{-2} n log² n), worse by a factor log n than the bound for independent edge sampling obtained in [SS11]. It is not clear whether this factor is necessary.

Remark I.4. Suppose we apply our Theorem I.5 to show that any single random spanning tree satisfies L_T ⪯ O(log n)·L_G whp.; this is tight up to a log log n factor. Then one can use this to derive Theorem I.3 based on a standard (and tight) matrix Chernoff bound and a (laborious) combination of Doob martingales and stopping time arguments similar to those found in [Tro12], [KS16]. This line of reasoning leads to the same bounds as our Theorem I.3. Thus, unless one proves more than just a norm bound for each individual tree, it is not possible to improve over our result, except for log log n factors.

Like the work of Fung et al., our results for spanning trees do not improve over the independent case. Fung et al. achieved their result by combining cut counting techniques with Chernoff bounds for scalar-valued negatively dependent variables. In our random matrix setting, there are no clear candidates for a Chernoff bound for negatively dependent random matrices that we can adopt, and this type of bound is exactly what we develop in the Strongly Rayleigh case.

We establish a one-sided concentration result for a single tree, namely that whp. L_T ⪯ O(log n)·L_G. Again, this is a direct application of Theorem I.1.

Theorem I.5 (Third main result, upper bound for the concentration of one random spanning tree). Given a graph G, let T be a random spanning tree. Then with probability at least 1 − 1/poly(n),

    L_T ⪯ O(log n)·L_G.

This upper bound is tight up to a factor log log n, as shown by our almost matching lower bound stated below.

Theorem I.6 (Lower bound for the concentration of one random spanning tree). There is an unweighted graph G s.t. if we sample an inverse leverage score weighted random spanning tree T, then with probability at least 1 − e^{−n^{0.99}},

    L_T ⋠ O(log n / log log n)·L_G.

Trivially, the presence of degree one nodes in T means that in a complete graph, L_G ⋠ c·L_T for any c < 1. So choosing any other scaling of the tree will make at least one of the inequalities L_T ⪯ O(log n / log log n)·L_G and L_G ⪯ 0.99·L_T fail with a larger gap. Note that in the complete unweighted graph, the trees we consider have weight Θ(n) on each edge. A random spanning tree in the complete graph has diameter about √n [RS67]. This can be shown to imply a spectral bound of the form L_G ⪯ α·L_T̃ for the corresponding unweighted random tree T̃. But once we scale up every edge of the tree by a factor Θ(n), the diameter bound no longer directly implies a spectral gap of the form L_G ⪯ α·L_T for some α. In a ring graph, we get L_G ⪯ (n − 2)·L_T and L_T ⪯ L_G.

We can also show that in general L_T ⋠ Ω(log n)·L_G can occur, but with a much smaller probability.

Theorem I.7 (Lower bound for the concentration of one random spanning tree). There is an unweighted graph G s.t. if we sample an inverse leverage score weighted random spanning tree T, then with probability at least 2^{−log n·log log n},

    L_T ⋠ Ω(log n)·L_G.

And we show a lower bound for ε-spectral sparsifiers from random spanning trees.

Theorem I.8 (Lower bound for the concentration of multiple random spanning trees). There is an unweighted graph G s.t. for any accuracy parameter ε, if we sample t = O(ε^{-2} log n) independent random spanning trees with edges weighted by inverse leverage score, then with probability at least 1 − e^{−n^{0.99}},

    (1 − ε)·L_G ⋠ (1/t)·Σ_{i=1}^t L_{T_i}   and   (1/t)·Σ_{i=1}^t L_{T_i} ⋠ (1 + ε)·L_G.

Our lower bound is incomparable with that of Fung et al. [FHHP11], who showed that for any constant c, there exists a graph for which obtaining a factor-c cut sparsifier by averaging trees requires using at least Ω(log n) trees to succeed with constant probability. Where Fung et al. [FHHP11] used triangles in their lower bound construction, our bad examples are based on collections of small cliques, which lets us ensure cut differences in even a single tree, by giving longer-tailed degree distributions. All of our lower bounds are based on simple constructions from collections of edge disjoint cliques, and use the fact that the exact distribution of degrees in a random spanning tree of the complete graph is known. Note that a lower bound for cut approximation implies a lower bound for spectral approximation, because the contrapositive statement is true: spectral approximation implies cut approximation.

Remark I.9. In fact, all our lower bounds also directly apply for cut approximation, which is a strictly stronger result. For example, there is an unweighted graph G s.t. if we sample an inverse leverage score weighted random spanning tree T, then with probability at least 1 − e^{−n^{0.99}}, T has a cut which is larger than the corresponding cut in G by a factor Ω(log n / log log n).

Connection to spectrally thin trees: Using their MSS-type existence proof for "small norm outcomes" of homogeneous Strongly Rayleigh distributions, [AG15] showed that in an unweighted k-edge connected graph G where every edge has leverage score at most ε, there exists an unweighted spanning tree T s.t. L_T ⪯ O(1/k + ε)·L_G. This is referred to as a spectrally thin tree with parameter O(1/k + ε).

In contrast, applying our k-homogeneous Strongly Rayleigh matrix Chernoff bound to an unweighted graph G where every edge has leverage score at most ε, we can show that an unweighted random spanning tree satisfies L_T̃ ⪯ O(ε log n)·L_G with high probability. This follows immediately from our Theorem I.5, because if we let T̃ denote the unweighted spanning tree and T the corresponding spanning tree with edges weighted by inverse leverage scores, then, whp.,

    L_T̃ = Σ_{e∈T} w(e)·b_e b_eᵀ ⪯ ε·Σ_{e∈T} (1/l_e)·w(e)·b_e b_eᵀ = ε·L_T ⪯ O(ε log n)·L_G.

The proof in [AG15] is based on an adaptation of the [MSS13] proof, and does not have clear parallels with our approach. Whereas the key properties of Strongly Rayleigh distributions that we use are stochastic covering (a property that limits change in a distribution under conditioning) and conditional negative dependence, the central element of their approach is a proof that certain mixed characteristic polynomials associated with k-homogeneous distributions are real stable when the original distribution is Strongly Rayleigh.

The following lemma captures a simple but crucial property of Strongly Rayleigh distributions.

Lemma I.10 (Shrinking Marginals). Suppose (ξ_1, ..., ξ_m) ∈ {0,1}^m is a random vector of {0,1} variables whose distribution is k-homogeneous and Strongly Rayleigh. Then for any set S ⊆ [m] with |S| ≤ k and for all j ∈ [m] \ S,

    Pr[ξ_j = 1 | ξ_S = 1] ≤ Pr[ξ_j = 1].

We provide the proof of this lemma in the full version. In the remaining sections, we prove Theorems I.1, I.3 and I.5. We defer the proofs of our other results to the full version.

II. NOTATION

We use [n] to denote the set {1, 2, ..., n}. Given a vector x, we use ||x||_0 to denote the number of non-zero entries in the vector.

Matrix and norms: For a matrix A, we use Aᵀ to denote the transpose of A. We say a matrix A is positive semidefinite (PSD) if A = Aᵀ and xᵀAx ≥ 0 for all x ∈ R^n. We use ⪯ to denote the semidefinite ordering, e.g. A ⪰ 0 means that A is PSD. For a matrix A ∈ R^{n×n}, we define ||A|| to be the spectral norm of A, i.e.

    ||A|| = max_{||x||_2 = 1, x ∈ R^n} ||Ax||_2.

Let tr(A) denote the trace of a square matrix A. We use λ_max(A) to denote the largest eigenvalue of the matrix A. For a symmetric PSD matrix A ∈ R^{n×n}, λ_max(A) ≤ tr(A) and tr(A) ≤ n·||A||.

Laplacian matrix related definitions: Let G = (V, E, w) be a connected weighted undirected graph with n vertices, m edges, and edge weights w_e > 0. If we orient the edges of G arbitrarily, we can write its Laplacian as L = BᵀWB, where B ∈ R^{m×n} is the signed edge-vertex incidence matrix, defined as

    B(e, v) = 1 if v is e's head; −1 if v is e's tail; 0 otherwise,

and W ∈ R^{m×m} is the diagonal matrix with W(e, e) = w_e.
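The following small sketch illustrates this notation; the example graph, orientation, and weights are ours.

```python
import numpy as np

# B is the m x n signed edge-vertex incidence matrix, W the diagonal weight
# matrix, and L = B^T W B.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
weights = [1.0, 2.0, 1.0, 3.0]
n, m = 4, len(edges)

B = np.zeros((m, n))
for e, (head, tail) in enumerate(edges):   # the orientation is arbitrary
    B[e, head], B[e, tail] = 1.0, -1.0
W = np.diag(weights)
L = B.T @ W @ B

# L agrees with the edge-by-edge sum of w_e b_e b_e^T:
assert np.allclose(L, sum(w * np.outer(B[e], B[e]) for e, w in enumerate(weights)))
print(L)
```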
III. PRELIMINARIES

A. Useful facts and tools

In this section, we provide some useful tools. For completeness, we prove the following statement in the full version.

Fact III.1. For any two symmetric matrices A and B,

    (A − B)² ⪯ 2A² + 2B².

B. Strongly Rayleigh distributions

This section provides the definition of Strongly Rayleigh distributions. For more details, we refer the reader to [BBL09], [PP14].

Let μ : 2^{[n]} → R_{≥0} denote a probability distribution over 2^{[n]}, i.e. Σ_{S⊆[n]} μ(S) = 1. Let x_1, x_2, ..., x_n denote n variables; we use x to denote (x_1, x_2, ..., x_n). For each set S ⊆ [n], we define x_S = Π_{i∈S} x_i. We define the generating polynomial of μ as

    f_μ(x) = Σ_{S⊆[n]} μ(S)·x_S.

We say the distribution μ is k-homogeneous if the polynomial f_μ is a homogeneous polynomial of degree k; in other words, for each S ∈ supp(μ), |S| = k. We say a polynomial p(x_1, x_2, ..., x_n) is stable if Im(x_i) > 0 for all i ∈ [n] implies p(x_1, ..., x_n) ≠ 0. We say a polynomial p is real stable if it is stable and all of its coefficients are real. We say μ is a Strongly Rayleigh distribution if f_μ is a real stable polynomial.

Fact III.2 (Conditioning on a subset of coordinates). Consider a random vector (ξ_1, ..., ξ_m) ∈ {0,1}^m whose distribution is k-homogeneous Strongly Rayleigh. Suppose we are given a binary vector b = (b_1, ..., b_t) ∈ {0,1}^t with ||b||_0 = l ≤ k, and a set S ⊂ [m] with |S| = t. Then, conditional on ξ_S = b, the distribution of ξ_{[m]\S} is (k − l)-homogeneous Strongly Rayleigh.

This fact tells us that if we condition on the value of some entries in the vector, the remaining coordinates still have a Strongly Rayleigh distribution.

Fact III.3 (Stochastic Covering Property). Consider a random vector (ξ_1, ..., ξ_m) ∈ {0,1}^m whose distribution is k-homogeneous Strongly Rayleigh. Suppose we are given an index i ∈ [m]. Let ξ' = ξ_{[m]\{i}} be the distribution on the entries of ξ except i. Let ξ'' be the distribution of ξ_{[m]\{i}} conditional on ξ_i = 1. Then there exists a coupling between ξ' and ξ'' (i.e. a joint distribution of the two vectors) s.t. in every outcome of the coupling, the value of ξ' can be obtained from the value of ξ'' by either changing a single entry from 0 to 1 or by leaving all entries unchanged.

This fact is known as the Stochastic Covering Property (see [PP14]). It gives us a convenient tool for relating the conditional distribution of a subset of the coordinates of the vector to the unconditional distribution. Note that by Fact III.2, the distribution of ξ'' used in Fact III.3 is (k − 1)-homogeneous. In contrast, the outcomes of ξ' may have k or k − 1 ones. Fact III.3 tells us that we can pair up all the outcomes of the conditional distribution ξ'' with outcomes of the unconditional distribution ξ' s.t. only a small change is required to make them equal. This tells us that the distribution is in some sense not changing too quickly under conditioning.

C. Random spanning trees

We provide the formal definition of random spanning trees in this section. We use the same definitions about spanning trees as [DKP+17]. Let T_G denote the set of all spanning subtrees of G. We now define a probability distribution on these trees.

Definition III.4 (w-uniform distribution on trees). Let D_G be a probability distribution on T_G such that

    Pr_{X∼D_G}[X = T] ∝ Π_{e∈T} w_e.

We refer to D_G as the w-uniform distribution on T_G. When the graph G is unweighted, this corresponds to the uniform distribution on T_G. Crucially, random spanning tree distributions are Strongly Rayleigh, as shown in [BBL09].

Fact III.5 (Spanning Trees are Strongly Rayleigh). In a connected weighted graph G, the w-uniform distribution on spanning trees is (n − 1)-homogeneous Strongly Rayleigh.

Definition III.6 (Effective Resistance). The effective resistance of a pair of vertices u, v ∈ V_G is defined as

    R_eff(u, v) = b_{u,v}ᵀ L† b_{u,v},

where b_{u,v} is an all-zero vector indexed by V_G, except for an entry of 1 at u and an entry of −1 at v.

A reference for the following standard facts about random spanning trees can be found in [DKP+17].

Definition III.7 (Leverage Score). The statistical leverage score, which we will abbreviate to leverage score, of an edge e = (u, v) ∈ E_G is defined as

    l_e = w_e·R_eff(u, v).

Fact III.8 (Spanning Tree Marginals). The probability Pr[e] that an edge e ∈ E_G appears in a tree sampled w-uniformly at random from T_G is given by

    Pr[e] = l_e,

where l_e is the leverage score of the edge e.
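The sketch below checks Fact III.8 numerically on K_6: it computes leverage scores from the pseudoinverse as in Definitions III.6 and III.7, and estimates spanning tree marginals with an Aldous-Broder sampler (an illustrative choice; any exact uniform spanning tree sampler would do).

```python
import random
import numpy as np

n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]   # K_6, unit weights
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1; L[u, v] -= 1; L[v, u] -= 1

# Effective resistances / leverage scores (Definitions III.6 and III.7).
Lp = np.linalg.pinv(L)
lev = {(u, v): Lp[u, u] - 2 * Lp[u, v] + Lp[v, v] for (u, v) in edges}

# Estimate Pr[e in T] with Aldous-Broder samples of uniform spanning trees.
adj = {u: [v for v in range(n) if v != u] for u in range(n)}
count = {e: 0 for e in edges}
trials = 20000
for _ in range(trials):
    cur, seen = 0, {0}
    while len(seen) < n:
        nxt = random.choice(adj[cur])
        if nxt not in seen:
            seen.add(nxt)
            count[(min(cur, nxt), max(cur, nxt))] += 1
        cur = nxt

# By symmetry, every edge of K_n has leverage (n-1)/binom(n,2) = 2/n.
e = edges[0]
print("leverage score:", lev[e], "empirical marginal:", count[e] / trials)
```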
IV. A MATRIX CHERNOFF BOUND FOR STRONGLY RAYLEIGH DISTRIBUTIONS

We first define a mapping which maps each element to a PSD matrix.

Definition IV.1 (Y-operator). We use Γ to denote [m]; we define a mapping Y : Γ → R^{n×n} such that each Y_e is a PSD matrix and ||Y_e|| ≤ R.

Throughout this section, we will use ξ ∈ {0,1}^m to denote a random length-m boolean vector whose distribution is k-homogeneous Strongly Rayleigh. For any set S ⊆ [m], we use ξ_S to denote the length-|S| vector that keeps only the entries with indices in S.

We will frequently need to work with a different representation of the random variable ξ. We use γ to denote this second representation. The random variable γ is composed of a sequence of k random indices γ_1, γ_2, ..., γ_k, each of which takes a value e_1, e_2, ..., e_k ∈ [m]. The indices give the locations of the ones in ξ, i.e. in an outcome of the two variables (ξ, γ), we always have ξ_{{γ_1, γ_2, ..., γ_k}} = 1 and ξ_{[m]\{γ_1, γ_2, ..., γ_k}} = 0. Additionally, we want to ensure that the distribution of γ is invariant under permutation: this can clearly be achieved by starting with any distribution for γ that satisfies the coupling with ξ and then applying a uniformly random permutation to reorder the k indices of γ (see [PP14] for a further discussion).

For convenience, for each i ∈ [k], we define γ_{≤i} and γ_{≥i} as abbreviated notation for γ_1, γ_2, ..., γ_i and γ_i, γ_{i+1}, ..., γ_k respectively. Let S = {e_1, e_2, ..., e_i} ⊂ [m] be one possible assignment for the indices of a subset of the ones in ξ (we require i ≤ k). Then the distribution of ξ_{[m]\S} conditional on ξ_S = 1 is the same as the distribution of ξ_{[m]\S} conditional on (γ_1, γ_2, ..., γ_i) = (e_1, e_2, ..., e_i). In other words, in terms of the resulting distribution of ξ_{[m]\S}, it is equivalent to condition on either γ_{≤i} or ξ_{γ_{≤i}} = 1. We define the matrix Z ∈ R^{n×n} as follows.

Definition IV.2 (Z). Let Z denote Σ_{e∈Γ} ξ_e·Y_e, where ||ξ||_0 = k. Due to the relationship between ξ and γ, we can also write Z as

    Z = Σ_{i=1}^k Y_{γ_i}.

For simplicity, for each i ∈ [k], we define Z_{≤i} and Z_{≥i} as follows:

    Z_{≤i} = Σ_{j=1}^i Y_{γ_j}   and   Z_{≥i} = Σ_{j=i}^k Y_{γ_j}.

We define a series of matrices M_i ∈ R^{n×n} as follows.

Definition IV.3 (M_i, martingale). We define M_0 = E[Z]. For each i ∈ {1, 2, ..., k − 1}, we define M_i as

    M_i = E_{γ_{≥i+1}}[Z | γ_1, ..., γ_i].

It is easy to see that E_{γ_{i+1}}[M_{i+1}] = M_i, which implies

    E_{γ_{i+1}}[M_{i+1} − M_i | γ_1, ..., γ_i] = 0.

We can rewrite the Z in M_{i+1} in the following way:

    Z = Σ_{j=i+2}^k Y_{γ_j} + Y_{γ_{i+1}} + Σ_{j=1}^i Y_{γ_j} = Z_{≥i+2} + Y_{γ_{i+1}} + Z_{≤i}.   (1)

For the Z in M_i, we can write it as

    Z = Σ_{j=i+1}^k Y_{γ_j} + Σ_{j=1}^i Y_{γ_j} = Z_{≥i+1} + Z_{≤i}.   (2)

Note that k-homogeneous Strongly Rayleigh implies the stochastic covering property. By [BBL09], [PP14], the stochastic covering property implies that a coupling exists between Z_{≥i+2} and Y_{γ_{i+1}}, thus

    Z_{≥i+2} + Y_{γ_{i+1}} = Z_{≥i+1}.   (3)

We define X_{i+1} = M_{i+1} − M_i, and then

    E_{γ_{i+1}}[X_{i+1} | γ_1, ..., γ_i] = 0.   (4)

Then we can rewrite X_{i+1} in the following way.

Claim IV.4. Let D_{γ_{≤i+1}} denote the coupling distribution between Z_{≥i+2} and Y_{γ_{i+1}} such that Eq. (3) holds. Then

    X_{i+1} = Y_{γ_{i+1}} − E_{(Z_{≥i+2}, Y_{γ_{i+1}}) ∼ D_{γ_{≤i+1}}}[Y_{γ_{i+1}} | γ_{≤i+1}].

We provide the proof of the above claim in the full version.

Fact IV.5. Suppose we condition on γ_{≤i+1}, so that it is already fixed. Let D_{γ_{≤i+1}} denote the coupling distribution such that Z_{≥i+2} + Y_{γ_{i+1}} = Z_{≥i+1} holds. We define U_{γ_{i+1}} as

    U_{γ_{i+1}} = E_{(Z_{≥i+2}, Y_{γ_{i+1}}) ∼ D_{γ_{≤i+1}}}[Y_{γ_{i+1}} | γ_{≤i+1}].

Then we have the following four properties:

(I) E_{γ_{i+1}}[U_{γ_{i+1}} | γ_{≤i}] = E_{γ_{i+1}}[Y_{γ_{i+1}} | γ_{≤i}],
(II) ||Y_{γ_{i+1}}|| ≤ R and ||U_{γ_{i+1}}|| ≤ R,
(III) ||Y_{γ_{i+1}} − U_{γ_{i+1}}|| ≤ R,
(IV) Y²_{γ_{i+1}} ⪯ R·Y_{γ_{i+1}} and U²_{γ_{i+1}} ⪯ R·U_{γ_{i+1}}.

We provide the proof of the above fact in the full version.
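The objects defined above can be checked by brute force on a small example. The sketch below enumerates all 16 spanning trees of K_4 (so k = 3), takes the illustrative choice Y_e = b_e b_eᵀ, and verifies the martingale identity E_{γ_1}[M_1] = M_0 together with the bounded-difference property ||X_1|| ≤ R that Claim IV.4 and Fact IV.5 combine to give. It relies on the observation above that conditioning on γ_1 = e is equivalent to conditioning on ξ_e = 1.

```python
import numpy as np
from itertools import combinations

n = 4
edges = list(combinations(range(n), 2))

def spans(S):  # n-1 edges that connect all vertices form a spanning tree
    reach, grew = {0}, True
    while grew:
        grew = False
        for u, v in S:
            if (u in reach) != (v in reach):
                reach |= {u, v}; grew = True
    return len(reach) == n

trees = [S for S in combinations(edges, n - 1) if spans(S)]   # 16 trees in K_4

Y = {}
for (u, v) in edges:                     # illustrative PSD choice: Y_e = b_e b_e^T
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    Y[(u, v)] = np.outer(b, b)
R = max(np.linalg.norm(Y[e], 2) for e in edges)

Z = lambda T: sum(Y[e] for e in T)
M0 = sum(Z(T) for T in trees) / len(trees)                    # M_0 = E[Z]

# Since gamma is a uniformly random ordering of the tree edges,
# conditioning on gamma_1 = e is conditioning on e in T,
# and Pr[gamma_1 = e] = Pr[e in T] / k.
k, tot, diffs = n - 1, np.zeros((n, n)), []
for e in edges:
    Te = [T for T in trees if e in T]
    M1 = sum(Z(T) for T in Te) / len(Te)                      # M_1 given gamma_1 = e
    tot += (len(Te) / len(trees) / k) * M1
    diffs.append(np.linalg.norm(M1 - M0, 2))

print("martingale check ||E[M_1] - M_0|| =", np.linalg.norm(tot - M0, 2))  # ~0
print("max ||X_1|| =", max(diffs), "<= R =", R)               # bounded differences
```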

We can show:

Claim IV.6.

    E_{γ_{i+1}}[(Y_{γ_{i+1}} − U_{γ_{i+1}})² | γ_{≤i}] ⪯ 4R·E_{γ_{i+1}}[Y_{γ_{i+1}} | γ_{≤i}].

Proof: We have

    E_{γ_{i+1}}[(Y_{γ_{i+1}} − U_{γ_{i+1}})² | γ_{≤i}]
        ⪯ E_{γ_{i+1}}[2·Y²_{γ_{i+1}} + 2·U²_{γ_{i+1}} | γ_{≤i}]
        ⪯ E_{γ_{i+1}}[2R·Y_{γ_{i+1}} + 2R·U_{γ_{i+1}} | γ_{≤i}]
        = 4R·E_{γ_{i+1}}[Y_{γ_{i+1}} | γ_{≤i}],

where the first step follows by Fact III.1, the second step follows by Y²_{γ_{i+1}} ⪯ R·Y_{γ_{i+1}} and U²_{γ_{i+1}} ⪯ R·U_{γ_{i+1}} (Property (IV) of Fact IV.5), and the last step follows by Property (I) of Fact IV.5.

Lemma IV.7. Let E[Σ_{e∈Γ} ξ_e Y_e] ⪯ μI. For each i ∈ {1, 2, ..., k}, we have

    E_{γ_i}[Y_{γ_i} | γ_{≤i−1}] ⪯ (1/(k + 1 − i))·μI.

Proof: We use 1 to denote a length-(i − 1) vector where each entry is one. We can think of γ_{≤i−1} as already chosen to be some fixed indices in Γ, say γ_1 = e_1, ..., γ_{i−1} = e_{i−1}. Note that all of e_1, ..., e_{i−1} are different. We use Γ\γ_{≤i−1} to denote Γ\{e_1, ..., e_{i−1}}. Then

    E_{γ_i}[Y_{γ_i} | γ_{≤i−1}]
        = Σ_{e∈Γ\γ_{≤i−1}} Pr[γ_i = e | γ_{≤i−1}]·Y_e
        = Σ_{e∈Γ\γ_{≤i−1}} (Pr[ξ_e = 1 | γ_{≤i−1}] / (k − (i − 1)))·Y_e
        = Σ_{e∈Γ\γ_{≤i−1}} (Pr[ξ_e = 1 | ξ_{γ_{≤i−1}} = 1] / (k − (i − 1)))·Y_e
        ⪯ Σ_{e∈Γ\γ_{≤i−1}} (Pr[ξ_e = 1] / (k − (i − 1)))·Y_e
        ⪯ Σ_{e∈Γ} (Pr[ξ_e = 1] / (k − (i − 1)))·Y_e
        = (1/(k − (i − 1)))·E[Σ_e ξ_e Y_e]
        ⪯ (1/(k + 1 − i))·μI,

where the first step follows by the definition of expectation, the second step follows by Pr[γ_i = e | γ_{≤i−1}] = Pr[ξ_e = 1 | γ_{≤i−1}]/(k − (i − 1)), the third step follows because conditioning on γ_{≤i−1} is equivalent to conditioning on ξ_{γ_{≤i−1}} = 1, the fourth step follows by Pr[ξ_e = 1 | ξ_{γ_{≤i−1}} = 1] ≤ Pr[ξ_e = 1] from the Shrinking Marginals Lemma I.10, the fifth step follows by relaxing the sum from Γ\γ_{≤i−1} to Γ (all terms are PSD), the sixth step follows by Pr[ξ_e = 1] = E[ξ_e] and linearity of expectation, and the last step follows by E[Σ_{e∈Γ} ξ_e Y_e] ⪯ μI.

Lemma IV.8. For each i ∈ {1, 2, ..., k}, we have

    E_{γ_i}[X_i² | γ_{≤i−1}] ⪯ 4μR·(1/(k + 1 − i))·I.

Proof: It follows by combining Claim IV.6 and Lemma IV.7 directly.

The above lemma directly implies the following corollary.

Corollary IV.9.

    Σ_{i=1}^k E_{γ_i}[X_i² | γ_{≤i−1}] ⪯ 10·μR·log k·I.

A. Main result

Before finally proving our main theorem, Theorem I.1, we state a useful tool: Freedman's inequality for matrices. We state a version from [Tro11b]; another version can be found in [Oli09].

Theorem IV.10 (Matrix Freedman). Consider a matrix martingale {Y_i : i = 0, 1, 2, ...} whose values are self-adjoint matrices with dimension n, and let {X_i : i = 1, 2, 3, ...} be the difference sequence. Assume that the difference sequence is uniformly bounded in the sense that

    λ_max(X_i) ≤ R   almost surely, for i = 1, 2, 3, ....

Define the predictable quadratic variation process of the martingale:

    W_i = Σ_{j=1}^i E_{j−1}[X_j²],   for i = 1, 2, 3, ....

Then, for all t ≥ 0 and σ² > 0,

    Pr[∃ i ≥ 0 : λ_max(Y_i) ≥ t and ||W_i|| ≤ σ²] ≤ n·exp( −(t²/2)/(σ² + Rt/3) ).

Now we are ready to prove our main theorem.

Theorem I.1 (First main result, a matrix Chernoff bound for k-homogeneous Strongly Rayleigh distributions). Suppose (ξ_1, ..., ξ_m) ∈ {0,1}^m is a random vector of {0,1} variables whose distribution is k-homogeneous and Strongly Rayleigh. Given a collection of PSD matrices A_1, ..., A_m ∈ R^{n×n} s.t. for all e ∈ [m] we have ||A_e|| ≤ R and ||E[Σ_e ξ_e A_e]|| ≤ μ. Then for any ε > 0,

    Pr[ ||Σ_e ξ_e A_e − E[Σ_e ξ_e A_e]|| ≥ εμ ] ≤ n·exp( −ε²μ / (R(log k + Θ(1))) ).

Proof: We use Y to denote A and Γ to denote [m]. In order to use Theorem IV.10, we first define W_i as

    W_i = Σ_{j=1}^i E_{γ_j}[X_j² | γ_{≤j−1}].

According to the definition of M_i, {M_0, M_1, M_2, ...} is a matrix martingale, and M_k − M_0 = Σ_e ξ_e A_e − E[Σ_e ξ_e A_e]. We have proved the following facts. The first one is

    E_{γ_i}[X_i | γ_{≤i−1}] = 0,

which follows by Eq. (4). The second one is

    λ_max(X_i) ≤ R,

which follows by combining Property (III) of Fact IV.5 and Claim IV.4. The third one is

    ||W_i|| ≤ σ² for all i ∈ [k], where σ² = 10·μR·log k,

which follows by Corollary IV.9. Thus,

    Pr[λ_max(M_k − M_0) ≥ εμ] ≤ n·exp( −((εμ)²/2)/(σ² + R·εμ/3) ).

We have, choosing t = εμ,

    (t²/2)/(σ² + Rt/3) = ((εμ)²/2)/(10·μR·log k + R·εμ/3) = 3ε²μ/((60·log k + 2ε)·R).

Thus we prove one side of the bound. Since E_{γ_i}[−X_i | γ_{≤i−1}] = 0 and E_{γ_i}[(−X_i)² | γ_{≤i−1}] = E_{γ_i}[X_i² | γ_{≤i−1}], following the same procedure as for λ_max, we obtain a bound for λ_min:

    Pr[λ_min(M_k − M_0) ≤ −εμ] ≤ n·exp( −3ε²μ/((60·log k + 2ε)·R) ).

Putting the two sides of the bound together, we complete the proof.

V. APPLICATIONS TO RANDOM SPANNING TREES

In this section, we show how to use Theorem I.1 to prove the bound for one random spanning tree and also for a sum of random spanning trees.

Theorem I.5 (Third main result, upper bound for the concentration of one random spanning tree). Given a graph G, let T be a random spanning tree. Then with probability at least 1 − 1/poly(n),

    L_T ⪯ O(log n)·L_G.

Proof: Let G = (V, E, w) be a connected undirected weighted graph with w : E → R. The Laplacian of G is L_G = Σ_{e∈E} w(e)·b_e b_eᵀ. Let T ⊆ E be a random spanning tree of G in the sense of Definition III.4. Let the weights of the edges in T be given by w' : T → R, where w'(e) = w(e)/l_e and l_e is the leverage score of e in G. Thus the Laplacian of the tree is

    L_T = Σ_{e∈T} w'(e)·b_e b_eᵀ = Σ_{e∈T} (w(e)/l_e)·b_e b_eᵀ.

Then by Fact III.8, Pr[e ∈ T] = l_e, and hence E[L_T] = L_G. Note also that for all e ∈ E,

    ||(L_G†)^{1/2}·w(e)·b_e b_eᵀ·(L_G†)^{1/2}|| = l_e.

Consider the random matrix (L_G†)^{1/2} L_T (L_G†)^{1/2}. The distribution of edges in the spanning tree can be seen as an (n − 1)-homogeneous vector in {0,1}^m, where m = |E|. To apply Theorem I.1, let ξ_e be the e-th entry of this random vector, and let

    A_e = (L_G†)^{1/2}·w'(e)·b_e b_eᵀ·(L_G†)^{1/2}.

Note that A_e ⪰ 0. Now ||A_e|| = 1, and

    E[Σ_e ξ_e A_e] = E[(L_G†)^{1/2} L_T (L_G†)^{1/2}] = (L_G†)^{1/2} L_G (L_G†)^{1/2} = Π = I − (1/n)·11ᵀ,

where we used in the last equality that the null space of the Laplacian of a connected graph is the span of the all-ones vector. Thus we get ||E[Σ_e ξ_e A_e]|| = 1. This means we can apply Theorem I.1 with R = 1, μ = 1 and ε = 100·log n to conclude that whp.

    ||(L_G†)^{1/2} L_T (L_G†)^{1/2} − Π|| ≤ 100·log n.

As L_T is a Laplacian, it has 1 in its null space, so we can conclude that (L_G†)^{1/2} L_T (L_G†)^{1/2} ⪯ 100·log n·Π. Hence L_T ⪯ 100·log n·L_G.
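The normalization used in this proof is easy to verify numerically. The sketch below checks, on K_5, that ||A_e|| = 1 for every edge and that E[Σ_e ξ_e A_e] = Σ_e l_e A_e = Π; the code layout and example graph are ours.

```python
import numpy as np
from itertools import combinations

n = 5
edges = list(combinations(range(n), 2))              # K_5, unit weights
LG, B = np.zeros((n, n)), {}
for u, v in edges:
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    B[(u, v)] = b
    LG += np.outer(b, b)

# (L_G^+)^{1/2} via the eigendecomposition of the pseudoinverse.
Lp = np.linalg.pinv(LG)
w, U = np.linalg.eigh(Lp)
Lp_half = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T

lev = {e: B[e] @ Lp @ B[e] for e in edges}           # leverage scores
A = {e: Lp_half @ np.outer(B[e], B[e]) @ Lp_half / lev[e] for e in edges}

# Every ||A_e|| = 1, and sum_e l_e A_e = E[sum_e xi_e A_e] = Pi.
Pi = np.eye(n) - np.ones((n, n)) / n
print("max ||A_e||:", max(np.linalg.norm(A[e], 2) for e in edges))
print("||sum_e l_e A_e - Pi||:",
      np.linalg.norm(sum(lev[e] * A[e] for e in edges) - Pi, 2))
```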
Theorem I.3 (Second main result, concentration bound for a batch of independent random spanning trees). Given as input a weighted graph G with n vertices and a parameter ε > 0, let T_1, T_2, ..., T_t denote t independent inverse leverage score weighted random spanning trees. If we choose t = O(ε^{-2} log² n), then with probability 1 − 1/poly(n),

    (1 − ε)·L_G ⪯ (1/t)·Σ_{i=1}^t L_{T_i} ⪯ (1 + ε)·L_G.

Proof: The proof is similar to the proof of Theorem I.5. We now view the edges of t = O(ε^{-2} log² n) independent random spanning trees as a t(n − 1)-homogeneous Strongly Rayleigh distribution on vectors in {0,1}^{t|E|}. Note that the product of independent Strongly Rayleigh distributions is Strongly Rayleigh [BBL09]. Again we get ||E[Σ_e ξ_e A_e]|| = 1, but now we can take R = 1/t, and hence we obtain the desired result by plugging into Theorem I.1.

ACKNOWLEDGMENTS

We thank Yin Tat Lee, Jelani Nelson, Daniel Spielman, Aviad Rubinstein, and Zhengyu Wang for helpful discussions regarding lower bound examples.

We want to acknowledge Michael Cohen, who recently passed away. In personal communication, Michael told Rasmus that he could prove a concentration result for averaging spanning trees, although losing several log factors. Unfortunately, Michael did not relate the proof to anyone, but did say it was not based on Strongly Rayleigh distributions. We hope he would appreciate our approach.

We also want to acknowledge the Simons program "Bridging Continuous and Discrete Optimization" and Nikhil Srivastava, whose talks during the program inspired us to study Strongly Rayleigh questions.

This work was done while Rasmus was a postdoc at Harvard, and Zhao was a visiting student at Harvard, both hosted by Jelani Nelson. Rasmus was supported by ONR grant N00014-17-1-2127.

REFERENCES

[AG09] Kook Jin Ahn and Sudipto Guha. Graph sparsification in the semi-streaming model. In International Colloquium on Automata, Languages, and Programming, pages 328–338. Springer, 2009.

[AG15] Nima Anari and Shayan Oveis Gharan. Effective-resistance-reducing flows, spectrally thin trees, and asymmetric TSP. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 20–39. IEEE, 2015.

[AGM+10] Arash Asadpour, Michel X. Goemans, Aleksander Mądry, Shayan Oveis Gharan, and Amin Saberi. An O(log n/log log n)-approximation algorithm for the asymmetric traveling salesman problem. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '10, pages 379–389. SIAM, 2010.

[AGR16] Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In Conference on Learning Theory, pages 103–115, 2016.

[Ald90] David Aldous. The random walk construction of uniform spanning trees and uniform labelled trees. SIAM Journal on Discrete Mathematics, pages 450–465, 1990.

[AW02] Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569–579, 2002.

[BBL09] Julius Borcea, Petter Brändén, and Thomas Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521–567, 2009.

[Ber24] Sergei Bernstein. On a modification of Chebyshev's inequality and of the error formula of Laplace. Ann. Sci. Inst. Sav. Ukraine, Sect. Math, 1(4):38–49, 1924.

[BK96] András A. Benczúr and David R. Karger. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pages 47–55. ACM, 1996.

[Bro89] Andrei Broder. Generating random spanning trees. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, FOCS 1989, pages 442–447, 1989.

[BSS12] Joshua Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. SIAM Journal on Computing, 41(6):1704–1721, 2012.

[BSST13] Joshua Batson, Daniel A. Spielman, Nikhil Srivastava, and Shang-Hua Teng. Spectral sparsification of graphs: theory and algorithms. Communications of the ACM, 56(8):87–94, 2013.

[Che52] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, pages 493–507, 1952.

[CKP+17] Michael B. Cohen, Jonathan Kelner, John Peebles, Richard Peng, Anup B. Rao, Aaron Sidford, and Adrian Vladu. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 410–419. ACM, 2017.

[CMN96] Charles J. Colbourn, Wendy J. Myrvold, and Eugene Neufeld. Two algorithms for unranking arborescences. Journal of Algorithms, 20(2):268–281, 1996.

[CMP16] Michael B. Cohen, Cameron Musco, and Jakub Pachocki. Online row sampling. arXiv preprint arXiv:1604.05448, 2016.

[Coh16] Michael B. Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 278–287. SIAM, 2016.

[DKP+17] David Durfee, Rasmus Kyng, John Peebles, Anup B. Rao, and Sushant Sachdeva. Sampling random spanning trees faster than matrix multiplication. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 730–742. ACM, 2017.

[DPPR17] David Durfee, John Peebles, Richard Peng, and Anup B. Rao. Determinant-preserving sparsification of SDDM matrices with applications to counting and sampling spanning trees. In Proceedings of the 2017 IEEE 58th Annual Symposium on Foundations of Computer Science, FOCS '17, 2017.

[FHHP11] Wai Shing Fung, Ramesh Hariharan, Nicholas J. A. Harvey, and Debmalya Panigrahi. A general framework for graph sparsification. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pages 71–80. ACM, 2011.

[FKG71] Cees M. Fortuin, Pieter W. Kasteleyn, and Jean Ginibre. Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22(2):89–103, 1971.

[FM92] Tomás Feder and Milena Mihail. Balanced matroids. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, pages 26–38. ACM, 1992.

[GLSS18] Ankit Garg, Yin-Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander Chernoff bound. In STOC 2018. arXiv preprint arXiv:1704.03864, 2018.

[GRV09] Navin Goyal, Luis Rademacher, and Santosh Vempala. Expanders via random spanning trees. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 576–585. SIAM, 2009.

[GSS11] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. A randomized rounding approach to the traveling salesman problem. In Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS '11, pages 550–559. IEEE Computer Society, 2011.

[Gue83] Alain Guenoche. Random spanning tree. Journal of Algorithms, 4(3):214–220, 1983.

[Hea08] Alexander D. Healy. Randomness-efficient sampling within NC¹. Computational Complexity, 17(1):3–37, 2008.

[HO14] Nicholas J. A. Harvey and Neil Olver. Pipage rounding, pessimistic estimators and matrix concentration. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 926–945. SIAM, 2014.

[HSSS16] Samuel B. Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 178–191. ACM, 2016.

[HX16] Nicholas J. A. Harvey and Keyulu Xu. Generating random spanning trees via fast matrix multiplication. In LATIN 2016: Theoretical Informatics, volume 9644, pages 522–535, 2016.

[Kar93] David R. Karger. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In SODA, volume 93, pages 21–30, 1993.

[Kir47] Gustav Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird. Poggendorffs Ann. Phys. Chem., pages 497–508, 1847.

[KM09] Jonathan Kelner and Aleksander Madry. Faster generation of random spanning trees. In Proceedings of the 50th Annual Symposium on Foundations of Computer Science, FOCS 2009, pages 13–21, 2009. Available at https://arxiv.org/abs/0908.1448.

[KPPS17] Rasmus Kyng, Jakub Pachocki, Richard Peng, and Sushant Sachdeva. A framework for analyzing resparsification algorithms. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2032–2043. SIAM, 2017.

[KS96] David R. Karger and Clifford Stein. A new approach to the minimum cut problem. Journal of the ACM (JACM), 43(4):601–640, 1996.

[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate Gaussian elimination for Laplacians: fast, sparse, and simple. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 573–582. IEEE, 2016.

[Kul90] Vidyadhar G. Kulkarni. Generating random combinatorial objects. Journal of Algorithms, 11(2):185–207, 1990.

[MSS13] Adam Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families II: Mixed characteristic polynomials and the Kadison-Singer problem. arXiv preprint arXiv:1306.3969, 2013.

[MST15] Aleksander Madry, Damian Straszak, and Jakub Tarnawski. Fast generation of random spanning trees and the effective resistance metric. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 2019–2036, 2015. Available at http://arxiv.org/pdf/1501.00267v1.pdf.

[Oli09] Roberto Imbuzeiro Oliveira. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. arXiv preprint arXiv:0911.0600, 2009.

[PP14] Robin Pemantle and Yuval Peres. Concentration of Lipschitz functionals of determinantal and other strong Rayleigh measures. Combinatorics, Probability and Computing, 23(1):140–160, 2014.

[RS67] A. Rényi and G. Szekeres. On the height of trees. J. Austral. Math. Soc., 7(4):497–507, 1967.

[Rud99] Mark Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60–72, 1999.

[Sch18] Aaron Schild. An almost-linear time algorithm for uniform random spanning tree generation. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, 2018.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.

[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC '04, pages 81–90. ACM, 2004.

[Tij] H. Tijms. Understanding Probability: Chance Rules in Everyday Life. 2004.

[Tro11a] Joel A. Tropp. Freedman's inequality for matrix martingales. Electronic Communications in Probability, 16:262–270, 2011.

[Tro11b] Joel A. Tropp. User-friendly tail bounds for matrix martingales. Technical report, California Institute of Technology, Pasadena, 2011.

[Tro12] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

[Wil96] David Bruce Wilson. Generating random spanning trees more quickly than the cover time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pages 296–303. ACM, 1996.

[ZSD17] Kai Zhong, Zhao Song, and Inderjit S. Dhillon. Learning non-overlapping convolutional neural networks with multiple kernels. arXiv preprint arXiv:1711.03440, 2017.

[ZSJ+17] Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, and Inderjit S. Dhillon. Recovery guarantees for one-hidden-layer neural networks. In ICML. https://arxiv.org/pdf/1706.03175.pdf, 2017.
