2018 IEEE 59th Annual Symposium on Foundations of Computer Science

A Matrix Chernoff Bound for Strongly Rayleigh Distributions and Spectral Sparsifiers from a few Random Spanning Trees

Rasmus Kyng and Zhao Song
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
[email protected], [email protected]

Abstract—Strongly Rayleigh distributions are a class of negatively dependent distributions of binary-valued random variables [Borcea, Brändén, Liggett JAMS 09]. Recently, these distributions have played a crucial role in the analysis of algorithms for fundamental graph problems, e.g. the Traveling Salesman Problem [Gharan, Saberi, Singh FOCS 11]. We prove a new matrix Chernoff bound for Strongly Rayleigh distributions.

As an immediate application, we show that adding together the Laplacians of ε^{-2} log² n random spanning trees gives a (1 ± ε) spectral sparsifier of a graph Laplacian with high probability. Thus, we positively answer an open question posed in [Batson, Spielman, Srivastava, Teng JACM 13]. Our number of spanning trees for a spectral sparsifier matches the number of spanning trees required to obtain a cut sparsifier in [Fung, Hariharan, Harvey, Panigrahi STOC 11]. The previous best result came from naively applying a classical matrix Chernoff bound, which requires ε^{-2} n log n spanning trees. For the tree averaging procedure to agree with the original graph Laplacian in expectation, each edge of the tree should be reweighted by the inverse of the edge's leverage score in the original graph. We also show that when using this reweighting of the edges, the Laplacian of a single random tree is bounded above in the PSD order by the original graph Laplacian times a factor log n with high probability, i.e. L_T ⪯ O(log n)·L_G. We show a lower bound that almost matches our last result, namely that in some graphs, with high probability, the random spanning tree is not bounded above in the spectral order by (log n / log log n) times the original graph Laplacian. We also show a lower bound that ε^{-2} log n spanning trees are necessary to get a (1 ± ε) spectral sparsifier.

Keywords: Matrix Chernoff bounds; random spanning trees; Strongly Rayleigh; spectral sparsifiers

I. INTRODUCTION

The idea of concentration of sums of random variables dates back to Central Limit Theorems, and hence to de Moivre and Laplace [Tij], while modern concentration bounds for sums of random variables were perhaps first established by Bernstein [Ber24], and a popular variant now known as Chernoff bounds was introduced by Rubin and published by Chernoff [Che52].

Concentration of measure for matrix-valued random variables is the phenomenon that many matrix-valued distributions are close to their mean with high probability, with closeness usually measured by the spectral norm. Modern quantitative bounds of the form often used in theoretical computer science were derived, for example, by Rudelson [Rud99], while Ahlswede and Winter [AW02] established a useful matrix version of the Laplace transform, which plays a central role in scalar concentration results such as those of Bernstein. [AW02] combined this with the Golden-Thompson trace inequality to prove matrix concentration results. Tropp refined this approach, and by replacing the use of Golden-Thompson with a deep theorem on concavity of certain trace functions due to Lieb, Tropp was able to recover strong versions of a wide range of scalar concentration results, including matrix Chernoff bounds and Azuma and Freedman's inequalities for matrix martingales [Tro12].

Matrix concentration results have had an enormous range of applications in computer science, and are ubiquitous throughout spectral graph theory [ST04], [SS11], [CKP+17], sketching [Coh16], approximation algorithms [HSSS16], and deep learning [ZSJ+17], [ZSD17]. Most applications are based on results for independent random matrices, but more flexible bounds, such as Tropp's Matrix Freedman Inequality [Tro11a], have been used to greatly simplify algorithms, e.g. for solving Laplacian linear equations [KS16] and for semi-streaming graph sparsification [AG09], [KPPS17]. Matrix concentration results are also closely related to other popular sampling tools, such as Karger's techniques for generating sparse graphs that approximately preserve the cuts of denser graphs [BK96].

Negative dependence of random variables is an appealing property that intuition suggests should help with concentration of measure. Notions of negative dependence can be formalized in many ways. Roughly speaking, these notions characterize distributions where some event occurring ensures that other events of interest become less likely. A simple example is the distribution of a sequence of coin flips, conditioned on the total number of heads in the outcome. In this distribution, conditioning on some coin coming out heads makes all other coins less likely to come out heads. Unfortunately, negative dependence phenomena are not as robust as positive association, which can be established from local conditions using the powerful FKG theorem [FKG71].

Strongly Rayleigh distributions were introduced recently by Borcea, Brändén, and Liggett [BBL09] as a class of negatively dependent distributions of binary-valued random variables with many useful properties. Strongly Rayleigh distributions satisfy useful negative dependence properties, and retain these properties under natural conditioning operations. Strongly Rayleigh distributions also satisfy a powerful stability property under conditioning known as Stochastic Covering [PP14], which is useful for analyzing them through martingale techniques. A measure on {0,1}^n is said to be Strongly Rayleigh if its generating polynomial is real stable [BBL09]. There are many interesting examples of Strongly Rayleigh distributions [PP14]: the example mentioned earlier of heads of independent coin flips conditioned on the total number of heads in the outcome; symmetric exclusion processes; and determinantal point processes and determinantal measures on a boolean lattice. An example of particular interest to us is the set of edges of a uniform or weighted random spanning tree, which forms a Strongly Rayleigh distribution.

We prove a matrix Chernoff bound for the case of k-homogeneous Strongly Rayleigh distributions. Our bound is slightly weaker than the bound for independent variables, but we can show that it is tight in some regimes. We use our bound to show new concentration results related to random spanning trees of graphs. An open question is to find other interesting applications of our concentration result, e.g. by analyzing concentration for matrices generated by exclusion processes.

Random spanning trees are among the most well-studied probabilistic objects in graph theory, going back to the work of Kirchhoff [Kir47] in 1847, who gave a formula relating the number of spanning trees in a graph to the determinant of the Laplacian of the graph.

Algorithms for sampling random spanning trees have been studied extensively [Gue83], [Bro89], [Ald90], [Kul90], [Wil96], [CMN96], [KM09], [MST15], [HX16], [DKP+17], [DPPR17], [Sch18], and a random spanning tree can now be sampled in almost linear time [Sch18].

In theoretical computer science, random spanning trees have found a number of applications, most notably in breakthrough results on approximating the traveling salesperson problem with symmetric [GSS11] and asymmetric costs [AGM+10]. Goyal et al. [GRV09] demonstrated that adding just two random spanning trees sampled from a bounded degree graph gives an O(log n) cut sparsifier with probability 1 − o(1). Later, it was shown by Fung, Hariharan, Harvey, and Panigrahi [FHHP11] that if we sample O(ε^{-2} log² n) random spanning trees from a graph, reweight the tree edges by the inverse of their leverage scores in the original graph, and average them together, then whp. we get a graph where the weight of edges crossing every cut is approximately the same as in the original graph, up to a factor (1 ± ε). We refer to this as an ε-cut sparsifier.
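To make the averaging procedure concrete, the following is a minimal numerical sketch of it on a small unweighted graph. It is illustrative only: it samples uniform spanning trees with the Aldous-Broder random walk construction [Ald90], [Bro89] (a real implementation would use the fast samplers cited above), reweights tree edges by inverse leverage scores, averages the tree Laplacians, and reports the spectral range of the average measured against L_G. All function names are ours.

```python
import itertools
import random
import numpy as np

def laplacian(n, edges, weights=None):
    # L = sum_e w_e b_e b_e^T, where b_e has +1/-1 at the endpoints of e.
    L = np.zeros((n, n))
    for i, (u, v) in enumerate(edges):
        w = 1.0 if weights is None else weights[i]
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def leverage_scores(n, edges):
    # l_e = b_e^T L^+ b_e; for a random spanning tree this equals Pr[e in T].
    Lp = np.linalg.pinv(laplacian(n, edges))
    return {e: Lp[e[0], e[0]] - 2 * Lp[e[0], e[1]] + Lp[e[1], e[1]] for e in edges}

def random_spanning_tree(n, adj):
    # Aldous-Broder: first-entry edges of a random walk form a uniform spanning tree.
    cur, seen, tree = 0, {0}, []
    while len(seen) < n:
        nxt = random.choice(adj[cur])
        if nxt not in seen:
            seen.add(nxt)
            tree.append((min(cur, nxt), max(cur, nxt)))
        cur = nxt
    return tree

n = 30
edges = list(itertools.combinations(range(n), 2))          # K_n as a test graph
adj = {u: [v for v in range(n) if v != u] for u in range(n)}
lev = leverage_scores(n, edges)

t = 400                                                    # number of trees averaged
avg = np.zeros((n, n))
for _ in range(t):
    T = random_spanning_tree(n, adj)
    avg += laplacian(n, T, [1.0 / lev[e] for e in T]) / t  # inverse leverage reweighting

# Spectral error: eigenvalues of the average measured against L_G on im(L_G).
LG = laplacian(n, edges)
w, U = np.linalg.eigh(LG)
S = U[:, w > 1e-9] / np.sqrt(w[w > 1e-9])                  # whitening basis for im(L_G)
eigs = np.linalg.eigvalsh(S.T @ avg @ S)
print("eigenvalue range of the averaged sparsifier:", eigs.min(), eigs.max())
```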
The techniques of Fung et al. unfortunately do not extend to proving spectral sparsifiers.

Spectral graph sparsifiers were introduced by Spielman and Teng [ST04], who for any graph G showed how to construct another graph H with ε^{-2} n poly log n edges s.t. (1 − ε)·L_G ⪯ L_H ⪯ (1 + ε)·L_G, which we refer to as an ε-spectral sparsifier. The construction was refined by Spielman and Srivastava [SS11], who suggested sampling edges independently with probability proportional to their leverage scores, and brought the number of required samples down to ε^{-2} n log n. ([SS11] analyzed sampling with replacement, but based on [Tro12], a folklore result shows the same behavior can be obtained by doing independent coin flips for every edge with low leverage score, again with inclusion probabilities proportional to leverage scores.) This analysis is tight in the sense that if fewer than o(ε^{-2} n log n) samples are used, there will be at least a 1/poly(n) probability of failure. Meanwhile, ε^{-2} n (log n / log log n) independent samples in a union of cliques can be shown whp. to fail to give a cut sparsifier. This can be observed directly from the degree distribution of a single vertex in the complete graph. For a variant of [SS11] based on flipping a single coin for each edge to decide whether to keep it or not, it can also be shown that when the expected number of edges is ε^{-2} n (log n / log log n), whp. the procedure fails to give a cut sparsifier. For arbitrary sparsification schemes, bounds in [BSS12] show that Θ(ε^{-2} n) edges are necessary and sufficient to give an ε-spectral sparsifier.

The marginal probability of an edge being present in a random spanning tree is exactly the leverage score of the edge. This seems to suggest that combining ε^{-2} poly log n spanning trees might give a spectral sparsifier, but the lack of independence between the sampled edges means the process cannot be analyzed using existing techniques. Observing this, Batson, Spielman, Srivastava, and Teng [BSST13] noted in their excellent 2013 survey on sparsification that "it remains to be seen if the union of a small number of random spanning trees can produce a spectral sparsifier." We answer this question in the affirmative. In particular, we show that adding together O(ε^{-2} log² n) spanning trees with edges scaled proportional to inverse leverage scores in the original graph leads to an ε-spectral sparsifier. This matches the bound obtained for cut sparsifiers in [FHHP11]. Our result also implies their earlier bound, since a spectral sparsifier is always a cut sparsifier with the same approximation quality. Before our result, only a trivial bound on the number of spanning trees required to build a spectral sparsifier was known. In particular, standard matrix concentration arguments like those in [SS11] prove that O(ε^{-2} n log n) spanning trees suffice. Lower bounds in [FHHP11] show that whp. Ω(log n) random spanning trees are required to give a constant factor spectral sparsifier.

We show that whp. ε^{-2} (log n / log log n) random spanning trees do not give an ε-spectral sparsifier. We also show that the Laplacian of a single random tree with edges weighted as above satisfies L_T ⪯ O(log n)·L_G whp., and we give an almost matching lower bound, showing that in some graphs, whp., L_T ⪯ O(log n / log log n)·L_G fails.

Before our work, the main result known about approximating graphs using O(1) random spanning trees was due to Goyal, Rademacher, and Vempala [GRV09], who showed that, surprisingly, when the original graph has bounded degree, adding two random spanning trees gives a graph whose cuts approximate the cuts in the original graph up to a factor O(log n) with good probability. Our result for a single tree establishes only a one-sided bound; an interesting open question remains: does sampling O(1) random spanning trees give a log n-factor spectral sparsifier with, say, constant probability?

A. Previous work

Chernoff-type bounds for matrices: Chernoff-like bounds for matrices appear in Rudelson [Rud99] and Ahlswede and Winter [AW02]. The latter introduced a useful matrix variant of the Laplace transform that is central in concentration bounds for scalar-valued random variables. Their bounds are restricted to iid random matrices, an artifact of their use of the Golden-Thompson inequality for bounding traces. In contrast, Tropp obtained more flexible concentration bounds for random matrices by using a result of Lieb to bound the expected trace of various operators [Tro12], including bounds for matrix martingales [Tro11b].

In a recent work, Garg, Lee, Song, and Srivastava [GLSS18] show a Chernoff bound for sums of matrix-valued random variables sampled via a random walk on an expander graph. This work confirms a conjecture due to Wigderson and Xiao. The proof of Garg et al. is also concerned with matrices that are not fully independent; in this case the matrices are generated from random walks on an expander graph. The main idea for dealing with the dependence issue is a new multi-matrix extension of the Golden-Thompson inequality and an adaptation of Healy's proof of the expander Chernoff bound in the scalar case [Hea08] to the matrix case. Their techniques deal with fairly generic types of dependence, and cannot leverage the very strong stability properties that arise from the negative dependence and stochastic covering properties of Strongly Rayleigh distributions. Harvey and Olver [HO14] proved a matrix concentration result for randomized pipage rounding, which can be used to show concentration results for random spanning trees obtained from pipage rounding, but not for (weighted) uniformly random spanning trees. The central technical element of their proof is a new variant of a theorem of Lieb on concavity of certain matrix trace functions. Matrix martingales have played a central role in a number of algorithmic results in theoretical computer science [KS16], [CMP16], [KPPS17], but beyond a reliance on Tropp's Matrix Freedman Inequality, these works have little in common with our approach. However, our bound does share a technical similarity with [KS16], namely that a sequence of increasingly restricted random choices in a martingale process leads to a log n factor in a variance bound.

Strongly Rayleigh distributions in theoretical computer science: Perhaps the most prominent result on Strongly Rayleigh distributions in theoretical computer science is the generalization of [MSS13] to Strongly Rayleigh distributions. The central technical result of [MSS13] essentially shows that given a collection of independent random vectors v_1, ..., v_m with finite support in C^n s.t. Σ_{i=1}^m E[v_i v_i^*] = I and ||v_i||² ≤ ε for all i, then

    Pr[ ||Σ_{i=1}^m v_i v_i^*|| ≤ (1 + √ε)² ] > 0.

[AG15] establishes a related result for k-homogeneous Strongly Rayleigh distributions, though they require an additional constraint, namely that the marginal probability that any given coordinate is non-zero is bounded above by δ; they then establish

    Pr[ ||Σ_{i=1}^m v_i v_i^*|| ≤ 4(δ + ε) + 2(δ + ε)² ] > 0.

Based on this, [AG15] shows (their full statement is more general, see [AG15] Corollary 1.9) that given an unweighted k-edge connected graph G where every edge has leverage score at most ε, there exists an unweighted spanning tree s.t. L_T ⪯ O(1/k + ε)·L_G. This is referred to as a spectrally thin tree with parameter O(1/k + ε).

[AGR16] showed how to algorithmically sample from a k-homogeneous Determinantal Point Process in time poly(k)·n·log(n/ε), where n is the dimension of the matrix giving rise to the determinantal point process and ε is the allowed total variation distance. Their techniques are based on generalizing proofs of expansion in the base graph associated with a balanced matroid, a result first established by [FM92].

Random spanning trees: Algorithms for sampling random spanning trees have a long history, but only recently have they explicitly used matrix concentration [DKP+17], [DPPR17], [Sch18]. The matrix concentration arguments in these papers, however, deal mostly with how modifying a graph results in changes to the distribution of random spanning trees in the graph. We instead study how closely random spanning trees resemble the graph they were initially sampled from. Whether our result in turn has applications for improving sampling algorithms for random spanning trees is unclear.

The fact that spanning tree edges exhibit negative dependence has been used strikingly in concentration arguments by Goyal et al. [GRV09] to show that two random spanning trees give an O(log n)-factor approximate cut sparsifier in bounded degree graphs, with good probability. This is clearly false when sampling the same number of edges independently, because such a graph has a large probability of having isolated vertices. Goyal et al. improve over independent sampling by leveraging the fact that, for a fixed tree, in some sense very few cuts of a given size exist. This is a variant of Karger's famous cut-counting techniques [Kar93], [KS96] specialized to unweighted trees.

Uses of negatively dependent Chernoff bounds applied to tree edges also appeared in works on approximation algorithms for TSP problems [GSS11], [AGM+10], where additionally the connectivity properties of the tree play an important role.

In contrast, the techniques of Fung et al. [FHHP11] show that O(ε^{-2} log² n) spanning trees suffice to give a (1 ± ε)-cut sparsifier, but they do not show that tree-based sparsifiers improve over independent sampling. The focus of their paper is to establish that a wide range of different techniques for choosing sampling probabilities all give cut sparsifiers, by establishing a more flexible framework than the original cut-sparsifier results of Benczúr-Karger [BK96], using related cut-counting techniques (see [Kar93], [KS96]). To extend their results to spanning trees, they simply observe that the (scalar-valued) Chernoff bounds they use immediately apply to negatively dependent variables, and hence to edges in spanning trees.

Fung et al. [FHHP11] also establish a lower bound, showing that for any constant c, there exists a graph for which obtaining a factor-c cut sparsifier by averaging trees requires using at least Ω(log n) trees to succeed with constant probability.

B. Our results and techniques

Theorem I.1 (First main result, a matrix Chernoff bound for k-homogeneous Strongly Rayleigh distributions). Suppose (ξ_1, ..., ξ_m) ∈ {0,1}^m is a random vector of {0,1} variables whose distribution is k-homogeneous and Strongly Rayleigh. Given a collection of PSD matrices A_1, ..., A_m ∈ R^{n×n} s.t. for all e ∈ [m] we have ||A_e|| ≤ R and ||E[Σ_e ξ_e A_e]|| ≤ μ. Then for any ε > 0,

    Pr[ ||Σ_e ξ_e A_e − E[Σ_e ξ_e A_e]|| ≥ εμ ] ≤ n·exp( −ε²μ / (R(log k + Θ(1))) ).

This matrix Chernoff bound matches the bounds due to Tropp [Tro12], up to the log k factor in the exponent. We know that the bound is essentially tight in some cases, when ε ≈ log k / log log k, since if we could replace the log k factor in Theorem I.1 with a factor log log k, then applying this strengthened version to improve the concentration in Theorem I.5 would lead to a contradiction of the lower bound in Theorem I.6. We do not know if the theorem is tight in other regimes, and it seems plausible that it should be possible to improve Theorem I.3 to show that O(ε^{-2} log n) trees suffice to get an ε-spectral sparsifier. This perhaps suggests there are regimes where our Theorem I.1 is not tight.
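As a sanity check on the quantities appearing in Theorem I.1, the sketch below samples from one of the simplest k-homogeneous Strongly Rayleigh distributions, i.i.d. coins conditioned on exactly k heads (equivalently, a uniformly random k-subset of [m]), and compares the empirical deviation probability against the shape of the bound, with the Θ(1) term arbitrarily set to 1. This is an illustration of the statement, not part of the proof; all parameter choices are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. coins conditioned on exactly k heads: a k-homogeneous Strongly
# Rayleigh distribution. A_e are diagonal PSD matrices with ||A_e|| <= R = 1.
m, k, n, R = 200, 40, 5, 1.0
A = [np.diag(rng.uniform(0, 1, n)) for _ in range(m)]
mean = (k / m) * sum(A)                    # E[sum_e xi_e A_e], since E[xi_e] = k/m
mu = np.linalg.norm(mean, 2)

eps, trials, hits = 0.75, 5000, 0
for _ in range(trials):
    S = rng.choice(m, size=k, replace=False)               # one sample of xi
    dev = np.linalg.norm(sum(A[e] for e in S) - mean, 2)
    hits += dev >= eps * mu

print("empirical Pr[deviation >= eps*mu]:", hits / trials)
# Shape of the Theorem I.1 bound, with the Theta(1) term set to 1
# (an arbitrary illustrative constant):
print("Chernoff-style bound:", n * np.exp(-eps**2 * mu / (R * (np.log(k) + 1))))
```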
Remark I.2. When the Strongly Rayleigh distribution is in fact a product distribution on a collection of l Strongly Rayleigh distributions that are each t-homogeneous, the joint distribution is lt-homogeneous Strongly Rayleigh, but in Theorem I.1 the factor log(lt) can be replaced by a factor log t. This applies to the case of independent random spanning trees, but only gives a constant factor improvement in the number of trees required.

Our work is related to the work of Peres and Pemantle [PP14], who showed a concentration result for scalar-valued Lipschitz functions of Strongly Rayleigh distributions. They used Doob martingales (martingales constructed from sequences of conditional expectations) to prove their result. We use a similar approach for matrices, constructing Doob matrix martingales from our Strongly Rayleigh distributions. In addition, we use the stochastic covering property of Strongly Rayleigh distributions observed by Peres and Pemantle, and implicitly derived in [BBL09]. This property leads to bounded differences in Doob martingale sequences for scalars. As in the scalar setting, it is possible to show concentration results for matrix-valued martingales. We use the Matrix Freedman inequality of Tropp. (Note, however, that we are able to prove deterministic bounds on the predictable quadratic variation process, which means the bound we use is more analogous to a matrix version of Bernstein's inequality, adapted to martingales. We resort to the more complicated Freedman inequality only because it gives a directly applicable statement that is known in the literature.) This inequality makes it possible to establish strong concentration bounds based on control of sample norms and control of the predictable quadratic variation process of the martingale, a matrix-valued object that is used to measure variance (see [Tro11b]). We show that, as in the scalar setting, the stochastic covering property of Strongly Rayleigh distributions leads to bounded differences for Doob matrix martingales. But we also combine the stochastic covering property with deceptively simple matrix martingale properties and a negative dependence condition to derive additional bounds on the predictable quadratic variation process of the martingale.

The key negative dependence property we use is a simple observation that generalizes, to k-homogeneous Strongly Rayleigh distributions, the fact that in a random spanning tree, conditioning on the presence of a set of edges lowers the marginal probability of every other graph edge being in the tree (see Lemma I.10). While we frame it differently, it is essentially an immediate consequence of statements in [BBL09]. The surprise here is how useful this simple observation is for removing issues with characterizing conditional k-homogeneous Strongly Rayleigh distributions. As a corollary we get our second main result.

Theorem I.3 (Second main result, concentration bound for a batch of independent random spanning trees). Given as input a weighted graph G with n vertices and a parameter ε > 0, let T_1, T_2, ..., T_t denote t independent inverse leverage score weighted random spanning trees. If we choose t = O(ε^{-2} log² n), then with probability 1 − 1/poly(n),

    (1 − ε)·L_G ⪯ (1/t)·Σ_{i=1}^t L_{T_i} ⪯ (1 + ε)·L_G.

Prior to our work, only a trivial bound on the number of spanning trees required to build a spectral sparsifier was known, namely that standard matrix concentration arguments like those in [SS11] prove that O(ε^{-2} n log n) spanning trees suffice. Note that the number of spanning trees required to build a spectral sparsifier in our Theorem I.3 matches the number of spanning trees required to construct a cut sparsifier in the previous best result [FHHP11]. The total edge count we require is Θ(ε^{-2} n log² n), worse by a factor log n than the bound for independent edge sampling obtained in [SS11]. It is not clear whether this factor is necessary.

Remark I.4. Suppose we apply our Theorem I.5 to show that any single random spanning tree satisfies L_T ⪯ O(log n)·L_G whp.; this is tight up to a log log n factor. Then one can use this to derive Theorem I.3 based on a standard (and tight) matrix Chernoff bound and a (laborious) combination of Doob martingales and stopping time arguments similar to those found in [Tro12], [KS16]. This line of reasoning leads to the same bounds as our Theorem I.3. Thus, unless one proves more than just a norm bound for each individual tree, it is not possible to improve over our result, except for log log n factors.

Like the work of Fung et al., our results for spanning trees do not improve over the independent case. Fung et al. achieved their result by combining cut counting techniques with Chernoff bounds for scalar-valued negatively dependent variables. In our random matrix setting, there are no clear candidates for a Chernoff bound for negatively dependent random matrices that we can adopt, and this type of bound is exactly what we develop in the Strongly Rayleigh case.

We establish a one-sided concentration result for a single tree, namely that whp. L_T ⪯ O(log n)·L_G. Again, this is a direct application of Theorem I.1.

Theorem I.5 (Third main result, upper bound for the concentration of one random spanning tree). Given a graph G, let T be a random spanning tree. Then with probability at least 1 − 1/poly(n),

    L_T ⪯ O(log n)·L_G.

This upper bound is tight up to a factor log log n, as shown by our almost matching lower bound stated below.

Theorem I.6 (Lower bound for the concentration of one random spanning tree). There is an unweighted graph G s.t. if we sample an inverse leverage score weighted random spanning tree T, then with probability at least 1 − e^{−n^{0.99}},

    L_T ⋠ O(log n / log log n)·L_G.

Trivially, the presence of degree one nodes in T means that in a complete graph, L_G ⋠ c·L_T for any c < 1. So choosing any other scaling of the tree will make at least one of the inequalities L_T ⪯ O(log n / log log n)·L_G and L_G ⪯ 0.99·L_T fail with a larger gap. Note that in the complete unweighted graph, the trees we consider have weight Θ(n) on each edge. A random spanning tree in the complete graph has diameter about √n [RS67]. This can be shown to imply a spectral bound of the form L_G ⪯ α·L_T̃ for the corresponding unweighted random tree T̃. But once we scale up every edge of the tree by a factor Θ(n), the diameter bound no longer directly implies a spectral gap of the form L_G ⪯ α·L_T for some α. In a ring graph, we get L_G ⪯ (n − 2)·L_T and L_T ⪯ L_G.

We can also show that in general L_T ⋠ Ω(log n)·L_G can occur, but with a much smaller probability.

Theorem I.7 (Lower bound for the concentration of one random spanning tree). There is an unweighted graph G s.t. if we sample an inverse leverage score weighted random spanning tree T, then with probability at least 2^{−log n·log log n},

    L_T ⋠ Ω(log n)·L_G.

And we show a lower bound for ε-spectral sparsifiers from random spanning trees.

Theorem I.8 (Lower bound for the concentration of multiple random spanning trees). There is an unweighted graph G s.t. for any accuracy parameter ε, if we sample t = O(ε^{-2} log n) independent random spanning trees with edges weighted by inverse leverage score, then with probability at least 1 − e^{−n^{0.99}},

    (1 − ε)·L_G ⋠ (1/t)·Σ_{i=1}^t L_{T_i}   and   (1/t)·Σ_{i=1}^t L_{T_i} ⋠ (1 + ε)·L_G.

Our lower bound is incomparable with that of Fung et al. [FHHP11], who showed that for any constant c, there exists a graph for which obtaining a factor-c cut sparsifier by averaging trees requires using at least Ω(log n) trees to succeed with constant probability. Where Fung et al. [FHHP11] used triangles in their lower bound construction, our bad examples are based on collections of small cliques, which lets us ensure cut differences in even a single tree, by giving longer-tailed degree distributions. All of our lower bounds are based on simple constructions from collections of edge disjoint cliques, and use the fact that the exact distribution of degrees in a random spanning tree of the complete graph is known. Note that a lower bound for cut approximation implies a lower bound for spectral approximation, because the contrapositive statement is true: spectral approximation implies cut approximation.

Remark I.9. In fact, all our lower bounds also directly apply for cut approximation, which is a strictly stronger result. For example, there is an unweighted graph G s.t. if we sample an inverse leverage score weighted random spanning tree T, then with probability at least 1 − e^{−n^{0.99}}, T has a cut which is larger than the corresponding cut in G by a factor Ω(log n / log log n).

Connection to spectrally thin trees: Using their MSS-type existence proof for "small norm outcomes" of homogeneous Strongly Rayleigh distributions, [AG15] showed that in an unweighted k-edge connected graph G where every edge has leverage score at most ε, there exists an unweighted spanning tree T s.t. L_T ⪯ O(1/k + ε)·L_G. This is referred to as a spectrally thin tree with parameter O(1/k + ε).

In contrast, applying our k-homogeneous Strongly Rayleigh matrix Chernoff bound to an unweighted graph G where every edge has leverage score at most ε, we can show that an unweighted random spanning tree satisfies L_T̃ ⪯ O(ε log n)·L_G with high probability. This follows immediately from our Theorem I.5, because if we let T̃ denote the unweighted spanning tree and T the corresponding spanning tree with edges weighted by inverse leverage scores, then, whp.,

    L_T̃ = Σ_{e∈T} w(e)·b_e b_eᵀ ⪯ ε·Σ_{e∈T} (1/l_e)·w(e)·b_e b_eᵀ = ε·L_T ⪯ O(ε log n)·L_G.

The proof in [AG15] is based on an adaptation of the [MSS13] proof, and does not have clear parallels with our approach. Whereas the key properties of Strongly Rayleigh distributions that we use are stochastic covering (a property that limits change in a distribution under conditioning) and conditional negative dependence, the central element of their approach is a proof that certain mixed characteristic polynomials associated with k-homogeneous distributions are real stable when the original distribution is Strongly Rayleigh.

The following lemma captures a simple but crucial property of Strongly Rayleigh distributions.

Lemma I.10 (Shrinking Marginals). Suppose (ξ_1, ..., ξ_m) ∈ {0,1}^m is a random vector of {0,1} variables whose distribution is k-homogeneous and Strongly Rayleigh. Then for any set S ⊆ [m] with |S| ≤ k and for all j ∈ [m] \ S,

    Pr[ξ_j = 1 | ξ_S = 1] ≤ Pr[ξ_j = 1].

We provide the proof of this lemma in the full version. In the remaining sections, we prove Theorems I.1, I.3 and I.5. We defer the proofs of our other results to the full version.

II. NOTATION

We use [n] to denote the set {1, 2, ..., n}. Given a vector x, we use ||x||_0 to denote the number of non-zero entries in the vector.

Matrix and norms: For a matrix A, we use Aᵀ to denote the transpose of A. We say a matrix A is positive semidefinite (PSD) if A = Aᵀ and xᵀAx ≥ 0 for all x ∈ R^n. We use ⪯ to denote the semidefinite ordering, e.g. A ⪰ 0 means that A is PSD. For a matrix A ∈ R^{n×n}, we define ||A|| to be the spectral norm of A, i.e.

    ||A|| = max_{||x||_2 = 1, x ∈ R^n} ||Ax||_2.

Let tr(A) denote the trace of a square matrix A. We use λ_max(A) to denote the largest eigenvalue of the matrix A. For a symmetric PSD matrix A ∈ R^{n×n}, λ_max(A) ≤ tr(A) and tr(A) ≤ n·||A||.

Laplacian matrix related definitions: Let G = (V, E, w) be a connected weighted undirected graph with n vertices, m edges, and edge weights w_e > 0. If we orient the edges of G arbitrarily, we can write its Laplacian as L = BᵀWB, where B ∈ R^{m×n} is the signed edge-vertex incidence matrix, defined as

    B(e, v) = 1 if v is e's head; −1 if v is e's tail; 0 otherwise,

and W ∈ R^{m×m} is the diagonal matrix with W(e, e) = w_e.
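The following small sketch illustrates this notation; the example graph, orientation, and weights are ours.

```python
import numpy as np

# B is the m x n signed edge-vertex incidence matrix, W the diagonal weight
# matrix, and L = B^T W B.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
weights = [1.0, 2.0, 1.0, 3.0]
n, m = 4, len(edges)

B = np.zeros((m, n))
for e, (head, tail) in enumerate(edges):   # the orientation is arbitrary
    B[e, head], B[e, tail] = 1.0, -1.0
W = np.diag(weights)
L = B.T @ W @ B

# L agrees with the edge-by-edge sum of w_e b_e b_e^T:
assert np.allclose(L, sum(w * np.outer(B[e], B[e]) for e, w in enumerate(weights)))
print(L)
```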
III. PRELIMINARIES

A. Useful facts and tools

In this section, we provide some useful tools. For completeness, we prove the following statement in the full version.

Fact III.1. For any two symmetric matrices A and B,

    (A − B)² ⪯ 2A² + 2B².

B. Strongly Rayleigh distributions

This section provides the definition of Strongly Rayleigh distributions. For more details, we refer the reader to [BBL09], [PP14].

Let μ : 2^{[n]} → R_{≥0} denote a probability distribution over 2^{[n]}, i.e. Σ_{S⊆[n]} μ(S) = 1. Let x_1, x_2, ..., x_n denote n variables; we use x to denote (x_1, x_2, ..., x_n). For each set S ⊆ [n], we define x_S = Π_{i∈S} x_i. We define the generating polynomial of μ as

    f_μ(x) = Σ_{S⊆[n]} μ(S)·x_S.

We say the distribution μ is k-homogeneous if the polynomial f_μ is a homogeneous polynomial of degree k; in other words, for each S ∈ supp(μ), |S| = k. We say a polynomial p(x_1, x_2, ..., x_n) is stable if Im(x_i) > 0 for all i ∈ [n] implies p(x_1, ..., x_n) ≠ 0. We say a polynomial p is real stable if it is stable and all of its coefficients are real. We say μ is a Strongly Rayleigh distribution if f_μ is a real stable polynomial.

Fact III.2 (Conditioning on a subset of coordinates). Consider a random vector (ξ_1, ..., ξ_m) ∈ {0,1}^m whose distribution is k-homogeneous Strongly Rayleigh. Suppose we are given a binary vector b = (b_1, ..., b_t) ∈ {0,1}^t with ||b||_0 = l ≤ k, and a set S ⊂ [m] with |S| = t. Then, conditional on ξ_S = b, the distribution of ξ_{[m]\S} is (k − l)-homogeneous Strongly Rayleigh.

This fact tells us that if we condition on the value of some entries in the vector, the remaining coordinates still have a Strongly Rayleigh distribution.

Fact III.3 (Stochastic Covering Property). Consider a random vector (ξ_1, ..., ξ_m) ∈ {0,1}^m whose distribution is k-homogeneous Strongly Rayleigh. Suppose we are given an index i ∈ [m]. Let ξ' = ξ_{[m]\{i}} be the distribution on the entries of ξ except i. Let ξ'' be the distribution of ξ_{[m]\{i}} conditional on ξ_i = 1. Then there exists a coupling between ξ' and ξ'' (i.e. a joint distribution of the two vectors) s.t. in every outcome of the coupling, the value of ξ' can be obtained from the value of ξ'' by either changing a single entry from 0 to 1 or by leaving all entries unchanged.

This fact is known as the Stochastic Covering Property (see [PP14]). It gives us a convenient tool for relating the conditional distribution of a subset of the coordinates of the vector to the unconditional distribution. Note that by Fact III.2, the distribution of ξ'' used in Fact III.3 is (k − 1)-homogeneous. In contrast, the outcomes of ξ' may have k or k − 1 ones. Fact III.3 tells us that we can pair up all the outcomes of the conditional distribution ξ'' with outcomes of the unconditional distribution ξ' s.t. only a small change is required to make them equal. This tells us that the distribution is in some sense not changing too quickly under conditioning.

C. Random spanning trees

We provide the formal definition of random spanning trees in this section. We use the same definitions about spanning trees as [DKP+17]. Let T_G denote the set of all spanning subtrees of G. We now define a probability distribution on these trees.

Definition III.4 (w-uniform distribution on trees). Let D_G be a probability distribution on T_G such that

    Pr_{X∼D_G}[X = T] ∝ Π_{e∈T} w_e.

We refer to D_G as the w-uniform distribution on T_G. When the graph G is unweighted, this corresponds to the uniform distribution on T_G. Crucially, random spanning tree distributions are Strongly Rayleigh, as shown in [BBL09].

Fact III.5 (Spanning Trees are Strongly Rayleigh). In a connected weighted graph G, the w-uniform distribution on spanning trees is (n − 1)-homogeneous Strongly Rayleigh.

Definition III.6 (Effective Resistance). The effective resistance of a pair of vertices u, v ∈ V_G is defined as

    R_eff(u, v) = b_{u,v}ᵀ L† b_{u,v},

where b_{u,v} is an all-zero vector indexed by V_G, except for an entry of 1 at u and an entry of −1 at v.

A reference for the following standard facts about random spanning trees can be found in [DKP+17].

Definition III.7 (Leverage Score). The statistical leverage score, which we will abbreviate to leverage score, of an edge e = (u, v) ∈ E_G is defined as

    l_e = w_e·R_eff(u, v).

Fact III.8 (Spanning Tree Marginals). The probability Pr[e] that an edge e ∈ E_G appears in a tree sampled w-uniformly at random from T_G is given by

    Pr[e] = l_e,

where l_e is the leverage score of the edge e.
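The sketch below checks Fact III.8 numerically on K_6: it computes leverage scores from the pseudoinverse as in Definitions III.6 and III.7, and estimates spanning tree marginals with an Aldous-Broder sampler (an illustrative choice; any exact uniform spanning tree sampler would do).

```python
import random
import numpy as np

n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]   # K_6, unit weights
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1; L[u, v] -= 1; L[v, u] -= 1

# Effective resistances / leverage scores (Definitions III.6 and III.7).
Lp = np.linalg.pinv(L)
lev = {(u, v): Lp[u, u] - 2 * Lp[u, v] + Lp[v, v] for (u, v) in edges}

# Estimate Pr[e in T] with Aldous-Broder samples of uniform spanning trees.
adj = {u: [v for v in range(n) if v != u] for u in range(n)}
count = {e: 0 for e in edges}
trials = 20000
for _ in range(trials):
    cur, seen = 0, {0}
    while len(seen) < n:
        nxt = random.choice(adj[cur])
        if nxt not in seen:
            seen.add(nxt)
            count[(min(cur, nxt), max(cur, nxt))] += 1
        cur = nxt

# By symmetry, every edge of K_n has leverage (n-1)/binom(n,2) = 2/n.
e = edges[0]
print("leverage score:", lev[e], "empirical marginal:", count[e] / trials)
```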
IV. A MATRIX CHERNOFF BOUND FOR STRONGLY RAYLEIGH DISTRIBUTIONS

We first define a mapping which maps each element to a PSD matrix.

Definition IV.1 (Y-operator). We use Γ to denote [m]; we define a mapping Y : Γ → R^{n×n} such that each Y_e is a PSD matrix and ||Y_e|| ≤ R.

Throughout this section, we will use ξ ∈ {0,1}^m to denote a random length-m boolean vector whose distribution is k-homogeneous Strongly Rayleigh. For any set S ⊆ [m], we use ξ_S to denote the length-|S| vector that keeps only the entries with indices in S.

We will frequently need to work with a different representation of the random variable ξ. We use γ to denote this second representation. The random variable γ is composed of a sequence of k random indices γ_1, γ_2, ..., γ_k, each of which takes a value e_1, e_2, ..., e_k ∈ [m]. The indices give the locations of the ones in ξ, i.e. in an outcome of the two variables (ξ, γ), we always have ξ_{{γ_1, γ_2, ..., γ_k}} = 1 and ξ_{[m]\{γ_1, γ_2, ..., γ_k}} = 0. Additionally, we want to ensure that the distribution of γ is invariant under permutation: this can clearly be achieved by starting with any distribution for γ that satisfies the coupling with ξ and then applying a uniformly random permutation to reorder the k indices of γ (see [PP14] for a further discussion).

For convenience, for each i ∈ [k], we define γ_{≤i} and γ_{≥i} as abbreviated notation for γ_1, γ_2, ..., γ_i and γ_i, γ_{i+1}, ..., γ_k respectively. Let S = {e_1, e_2, ..., e_i} ⊂ [m] be one possible assignment for the indices of a subset of the ones in ξ (we require i ≤ k). Then the distribution of ξ_{[m]\S} conditional on ξ_S = 1 is the same as the distribution of ξ_{[m]\S} conditional on (γ_1, γ_2, ..., γ_i) = (e_1, e_2, ..., e_i). In other words, in terms of the resulting distribution of ξ_{[m]\S}, it is equivalent to condition on either γ_{≤i} or ξ_{γ_{≤i}} = 1. We define the matrix Z ∈ R^{n×n} as follows.

Definition IV.2 (Z). Let Z denote Σ_{e∈Γ} ξ_e·Y_e, where ||ξ||_0 = k. Due to the relationship between ξ and γ, we can also write Z as

    Z = Σ_{i=1}^k Y_{γ_i}.

For simplicity, for each i ∈ [k], we define Z_{≤i} and Z_{≥i} as follows:

    Z_{≤i} = Σ_{j=1}^i Y_{γ_j}   and   Z_{≥i} = Σ_{j=i}^k Y_{γ_j}.

We define a series of matrices M_i ∈ R^{n×n} as follows.

Definition IV.3 (M_i, martingale). We define M_0 = E[Z]. For each i ∈ {1, 2, ..., k − 1}, we define M_i as

    M_i = E_{γ_{≥i+1}}[Z | γ_1, ..., γ_i].

It is easy to see that E_{γ_{i+1}}[M_{i+1}] = M_i, which implies

    E_{γ_{i+1}}[M_{i+1} − M_i | γ_1, ..., γ_i] = 0.

We can rewrite the Z in M_{i+1} in the following way:

    Z = Σ_{j=i+2}^k Y_{γ_j} + Y_{γ_{i+1}} + Σ_{j=1}^i Y_{γ_j} = Z_{≥i+2} + Y_{γ_{i+1}} + Z_{≤i}.   (1)

For the Z in M_i, we can write it as

    Z = Σ_{j=i+1}^k Y_{γ_j} + Σ_{j=1}^i Y_{γ_j} = Z_{≥i+1} + Z_{≤i}.   (2)

Note that k-homogeneous Strongly Rayleigh implies the stochastic covering property. By [BBL09], [PP14], the stochastic covering property implies that a coupling exists between Z_{≥i+2} and Y_{γ_{i+1}}, thus

    Z_{≥i+2} + Y_{γ_{i+1}} = Z_{≥i+1}.   (3)

We define X_{i+1} = M_{i+1} − M_i, and then

    E_{γ_{i+1}}[X_{i+1} | γ_1, ..., γ_i] = 0.   (4)

Then we can rewrite X_{i+1} in the following way.

Claim IV.4. Let D_{γ_{≤i+1}} denote the coupling distribution between Z_{≥i+2} and Y_{γ_{i+1}} such that Eq. (3) holds. Then

    X_{i+1} = Y_{γ_{i+1}} − E_{(Z_{≥i+2}, Y_{γ_{i+1}}) ∼ D_{γ_{≤i+1}}}[Y_{γ_{i+1}} | γ_{≤i+1}].

We provide the proof of the above claim in the full version.

Fact IV.5. Suppose we condition on γ_{≤i+1}, so that it is already fixed. Let D_{γ_{≤i+1}} denote the coupling distribution such that Z_{≥i+2} + Y_{γ_{i+1}} = Z_{≥i+1} holds. We define U_{γ_{i+1}} as

    U_{γ_{i+1}} = E_{(Z_{≥i+2}, Y_{γ_{i+1}}) ∼ D_{γ_{≤i+1}}}[Y_{γ_{i+1}} | γ_{≤i+1}].

Then we have the following four properties:

(I) E_{γ_{i+1}}[U_{γ_{i+1}} | γ_{≤i}] = E_{γ_{i+1}}[Y_{γ_{i+1}} | γ_{≤i}],
(II) ||Y_{γ_{i+1}}|| ≤ R and ||U_{γ_{i+1}}|| ≤ R,
(III) ||Y_{γ_{i+1}} − U_{γ_{i+1}}|| ≤ R,
(IV) Y²_{γ_{i+1}} ⪯ R·Y_{γ_{i+1}} and U²_{γ_{i+1}} ⪯ R·U_{γ_{i+1}}.

We provide the proof of the above fact in the full version.
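The objects defined above can be checked by brute force on a small example. The sketch below enumerates all 16 spanning trees of K_4 (so k = 3), takes the illustrative choice Y_e = b_e b_eᵀ, and verifies the martingale identity E_{γ_1}[M_1] = M_0 together with the bounded-difference property ||X_1|| ≤ R that Claim IV.4 and Fact IV.5 combine to give. It relies on the observation above that conditioning on γ_1 = e is equivalent to conditioning on ξ_e = 1.

```python
import numpy as np
from itertools import combinations

n = 4
edges = list(combinations(range(n), 2))

def spans(S):  # n-1 edges that connect all vertices form a spanning tree
    reach, grew = {0}, True
    while grew:
        grew = False
        for u, v in S:
            if (u in reach) != (v in reach):
                reach |= {u, v}; grew = True
    return len(reach) == n

trees = [S for S in combinations(edges, n - 1) if spans(S)]   # 16 trees in K_4

Y = {}
for (u, v) in edges:                     # illustrative PSD choice: Y_e = b_e b_e^T
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    Y[(u, v)] = np.outer(b, b)
R = max(np.linalg.norm(Y[e], 2) for e in edges)

Z = lambda T: sum(Y[e] for e in T)
M0 = sum(Z(T) for T in trees) / len(trees)                    # M_0 = E[Z]

# Since gamma is a uniformly random ordering of the tree edges,
# conditioning on gamma_1 = e is conditioning on e in T,
# and Pr[gamma_1 = e] = Pr[e in T] / k.
k, tot, diffs = n - 1, np.zeros((n, n)), []
for e in edges:
    Te = [T for T in trees if e in T]
    M1 = sum(Z(T) for T in Te) / len(Te)                      # M_1 given gamma_1 = e
    tot += (len(Te) / len(trees) / k) * M1
    diffs.append(np.linalg.norm(M1 - M0, 2))

print("martingale check ||E[M_1] - M_0|| =", np.linalg.norm(tot - M0, 2))  # ~0
print("max ||X_1|| =", max(diffs), "<= R =", R)               # bounded differences
```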

We can show:

Claim IV.6.

    E_{γ_{i+1}}[(Y_{γ_{i+1}} − U_{γ_{i+1}})² | γ_{≤i}] ⪯ 4R·E_{γ_{i+1}}[Y_{γ_{i+1}} | γ_{≤i}].

Proof: We have

    E_{γ_{i+1}}[(Y_{γ_{i+1}} − U_{γ_{i+1}})² | γ_{≤i}]
        ⪯ E_{γ_{i+1}}[2·Y²_{γ_{i+1}} + 2·U²_{γ_{i+1}} | γ_{≤i}]
        ⪯ E_{γ_{i+1}}[2R·Y_{γ_{i+1}} + 2R·U_{γ_{i+1}} | γ_{≤i}]
        = 4R·E_{γ_{i+1}}[Y_{γ_{i+1}} | γ_{≤i}],

where the first step follows by Fact III.1, the second step follows by Y²_{γ_{i+1}} ⪯ R·Y_{γ_{i+1}} and U²_{γ_{i+1}} ⪯ R·U_{γ_{i+1}} (Property (IV) of Fact IV.5), and the last step follows by Property (I) of Fact IV.5.

Lemma IV.7. Let E[Σ_{e∈Γ} ξ_e Y_e] ⪯ μI. For each i ∈ {1, 2, ..., k}, we have

    E_{γ_i}[Y_{γ_i} | γ_{≤i−1}] ⪯ (1/(k + 1 − i))·μI.

Proof: We use 1 to denote a length-(i − 1) vector where each entry is one. We can think of γ_{≤i−1} as already chosen to be some fixed indices in Γ, say γ_1 = e_1, ..., γ_{i−1} = e_{i−1}. Note that all of e_1, ..., e_{i−1} are different. We use Γ\γ_{≤i−1} to denote Γ\{e_1, ..., e_{i−1}}. Then

    E_{γ_i}[Y_{γ_i} | γ_{≤i−1}]
        = Σ_{e∈Γ\γ_{≤i−1}} Pr[γ_i = e | γ_{≤i−1}]·Y_e
        = Σ_{e∈Γ\γ_{≤i−1}} (Pr[ξ_e = 1 | γ_{≤i−1}] / (k − (i − 1)))·Y_e
        = Σ_{e∈Γ\γ_{≤i−1}} (Pr[ξ_e = 1 | ξ_{γ_{≤i−1}} = 1] / (k − (i − 1)))·Y_e
        ⪯ Σ_{e∈Γ\γ_{≤i−1}} (Pr[ξ_e = 1] / (k − (i − 1)))·Y_e
        ⪯ Σ_{e∈Γ} (Pr[ξ_e = 1] / (k − (i − 1)))·Y_e
        = (1/(k − (i − 1)))·E[Σ_e ξ_e Y_e]
        ⪯ (1/(k + 1 − i))·μI,

where the first step follows by the definition of expectation, the second step follows by Pr[γ_i = e | γ_{≤i−1}] = Pr[ξ_e = 1 | γ_{≤i−1}]/(k − (i − 1)), the third step follows because conditioning on γ_{≤i−1} is equivalent to conditioning on ξ_{γ_{≤i−1}} = 1, the fourth step follows by Pr[ξ_e = 1 | ξ_{γ_{≤i−1}} = 1] ≤ Pr[ξ_e = 1] from the Shrinking Marginals Lemma I.10, the fifth step follows by relaxing the sum from Γ\γ_{≤i−1} to Γ (all terms are PSD), the sixth step follows by Pr[ξ_e = 1] = E[ξ_e] and linearity of expectation, and the last step follows by E[Σ_{e∈Γ} ξ_e Y_e] ⪯ μI.

Lemma IV.8. For each i ∈ {1, 2, ..., k}, we have

    E_{γ_i}[X_i² | γ_{≤i−1}] ⪯ 4μR·(1/(k + 1 − i))·I.

Proof: It follows by combining Claim IV.6 and Lemma IV.7 directly.

The above lemma directly implies the following corollary.

Corollary IV.9.

    Σ_{i=1}^k E_{γ_i}[X_i² | γ_{≤i−1}] ⪯ 10·μR·log k·I.

A. Main result

Before finally proving our main theorem, Theorem I.1, we state a useful tool: Freedman's inequality for matrices. We state a version from [Tro11b]; another version can be found in [Oli09].

Theorem IV.10 (Matrix Freedman). Consider a matrix martingale {Y_i : i = 0, 1, 2, ...} whose values are self-adjoint matrices with dimension n, and let {X_i : i = 1, 2, 3, ...} be the difference sequence. Assume that the difference sequence is uniformly bounded in the sense that

    λ_max(X_i) ≤ R   almost surely, for i = 1, 2, 3, ....

Define the predictable quadratic variation process of the martingale:

    W_i = Σ_{j=1}^i E_{j−1}[X_j²],   for i = 1, 2, 3, ....

Then, for all t ≥ 0 and σ² > 0,

    Pr[∃ i ≥ 0 : λ_max(Y_i) ≥ t and ||W_i|| ≤ σ²] ≤ n·exp( −(t²/2)/(σ² + Rt/3) ).

Now we are ready to prove our main theorem.

Theorem I.1 (First main result, a matrix Chernoff bound for k-homogeneous Strongly Rayleigh distributions). Suppose (ξ_1, ..., ξ_m) ∈ {0,1}^m is a random vector of {0,1} variables whose distribution is k-homogeneous and Strongly Rayleigh. Given a collection of PSD matrices A_1, ..., A_m ∈ R^{n×n} s.t. for all e ∈ [m] we have ||A_e|| ≤ R and ||E[Σ_e ξ_e A_e]|| ≤ μ. Then for any ε > 0,

    Pr[ ||Σ_e ξ_e A_e − E[Σ_e ξ_e A_e]|| ≥ εμ ] ≤ n·exp( −ε²μ / (R(log k + Θ(1))) ).

Proof: We use Y to denote A and Γ to denote [m]. In order to use Theorem IV.10, we first define W_i as

    W_i = Σ_{j=1}^i E_{γ_j}[X_j² | γ_{≤j−1}].

According to the definition of M_i, {M_0, M_1, M_2, ...} is a matrix martingale, and M_k − M_0 = Σ_e ξ_e A_e − E[Σ_e ξ_e A_e]. We have proved the following facts. The first one is

    E_{γ_i}[X_i | γ_{≤i−1}] = 0,

which follows by Eq. (4). The second one is

    λ_max(X_i) ≤ R,

which follows by combining Property (III) of Fact IV.5 and Claim IV.4. The third one is

    ||W_i|| ≤ σ² for all i ∈ [k], where σ² = 10·μR·log k,

which follows by Corollary IV.9. Thus,

    Pr[λ_max(M_k − M_0) ≥ εμ] ≤ n·exp( −((εμ)²/2)/(σ² + R·εμ/3) ).

We have, choosing t = εμ,

    (t²/2)/(σ² + Rt/3) = ((εμ)²/2)/(10·μR·log k + R·εμ/3) = 3ε²μ/((60·log k + 2ε)·R).

Thus we prove one side of the bound. Since E_{γ_i}[−X_i | γ_{≤i−1}] = 0 and E_{γ_i}[(−X_i)² | γ_{≤i−1}] = E_{γ_i}[X_i² | γ_{≤i−1}], following the same procedure as for λ_max, we obtain a bound for λ_min:

    Pr[λ_min(M_k − M_0) ≤ −εμ] ≤ n·exp( −3ε²μ/((60·log k + 2ε)·R) ).

Putting the two sides of the bound together, we complete the proof.

V. APPLICATIONS TO RANDOM SPANNING TREES

In this section, we show how to use Theorem I.1 to prove the bound for one random spanning tree and also for a sum of random spanning trees.

Theorem I.5 (Third main result, upper bound for the concentration of one random spanning tree). Given a graph G, let T be a random spanning tree. Then with probability at least 1 − 1/poly(n),

    L_T ⪯ O(log n)·L_G.

Proof: Let G = (V, E, w) be a connected undirected weighted graph with w : E → R. The Laplacian of G is L_G = Σ_{e∈E} w(e)·b_e b_eᵀ. Let T ⊆ E be a random spanning tree of G in the sense of Definition III.4. Let the weights of the edges in T be given by w' : T → R, where w'(e) = w(e)/l_e and l_e is the leverage score of e in G. Thus the Laplacian of the tree is

    L_T = Σ_{e∈T} w'(e)·b_e b_eᵀ = Σ_{e∈T} (w(e)/l_e)·b_e b_eᵀ.

Then by Fact III.8, Pr[e ∈ T] = l_e, and hence E[L_T] = L_G. Note also that for all e ∈ E,

    ||(L_G†)^{1/2}·w(e)·b_e b_eᵀ·(L_G†)^{1/2}|| = l_e.

Consider the random matrix (L_G†)^{1/2} L_T (L_G†)^{1/2}. The distribution of edges in the spanning tree can be seen as an (n − 1)-homogeneous vector in {0,1}^m, where m = |E|. To apply Theorem I.1, let ξ_e be the e-th entry of this random vector, and let

    A_e = (L_G†)^{1/2}·w'(e)·b_e b_eᵀ·(L_G†)^{1/2}.

Note that A_e ⪰ 0. Now ||A_e|| = 1, and

    E[Σ_e ξ_e A_e] = E[(L_G†)^{1/2} L_T (L_G†)^{1/2}] = (L_G†)^{1/2} L_G (L_G†)^{1/2} = Π = I − (1/n)·11ᵀ,

where we used in the last equality that the null space of the Laplacian of a connected graph is the span of the all-ones vector. Thus we get ||E[Σ_e ξ_e A_e]|| = 1. This means we can apply Theorem I.1 with R = 1, μ = 1 and ε = 100·log n to conclude that whp.

    ||(L_G†)^{1/2} L_T (L_G†)^{1/2} − Π|| ≤ 100·log n.

As L_T is a Laplacian, it has 1 in its null space, so we can conclude that (L_G†)^{1/2} L_T (L_G†)^{1/2} ⪯ 100·log n·Π. Hence L_T ⪯ 100·log n·L_G.
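The normalization used in this proof is easy to verify numerically. The sketch below checks, on K_5, that ||A_e|| = 1 for every edge and that E[Σ_e ξ_e A_e] = Σ_e l_e A_e = Π; the code layout and example graph are ours.

```python
import numpy as np
from itertools import combinations

n = 5
edges = list(combinations(range(n), 2))              # K_5, unit weights
LG, B = np.zeros((n, n)), {}
for u, v in edges:
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    B[(u, v)] = b
    LG += np.outer(b, b)

# (L_G^+)^{1/2} via the eigendecomposition of the pseudoinverse.
Lp = np.linalg.pinv(LG)
w, U = np.linalg.eigh(Lp)
Lp_half = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T

lev = {e: B[e] @ Lp @ B[e] for e in edges}           # leverage scores
A = {e: Lp_half @ np.outer(B[e], B[e]) @ Lp_half / lev[e] for e in edges}

# Every ||A_e|| = 1, and sum_e l_e A_e = E[sum_e xi_e A_e] = Pi.
Pi = np.eye(n) - np.ones((n, n)) / n
print("max ||A_e||:", max(np.linalg.norm(A[e], 2) for e in edges))
print("||sum_e l_e A_e - Pi||:",
      np.linalg.norm(sum(lev[e] * A[e] for e in edges) - Pi, 2))
```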
Theorem I.3 (Second main result, concentration bound for a batch of independent random spanning trees). Given as input a weighted graph G with n vertices and a parameter ε > 0, let T_1, T_2, ..., T_t denote t independent inverse leverage score weighted random spanning trees. If we choose t = O(ε^{-2} log² n), then with probability 1 − 1/poly(n),

    (1 − ε)·L_G ⪯ (1/t)·Σ_{i=1}^t L_{T_i} ⪯ (1 + ε)·L_G.

Proof: The proof is similar to the proof of Theorem I.5. We now view the edges of t = O(ε^{-2} log² n) independent random spanning trees as a t(n − 1)-homogeneous Strongly Rayleigh distribution on vectors in {0,1}^{t|E|}. Note that the product of independent Strongly Rayleigh distributions is Strongly Rayleigh [BBL09]. Again we get ||E[Σ_e ξ_e A_e]|| = 1, but now we can take R = 1/t, and hence we obtain the desired result by plugging into Theorem I.1.

ACKNOWLEDGMENTS

We thank Yin Tat Lee, Jelani Nelson, Daniel Spielman, Aviad Rubinstein, and Zhengyu Wang for helpful discussions regarding lower bound examples.

We want to acknowledge Michael Cohen, who recently passed away. In personal communication, Michael told Rasmus that he could prove a concentration result for averaging spanning trees, although losing several log factors. Unfortunately, Michael did not relate the proof to anyone, but did say it was not based on Strongly Rayleigh distributions. We hope he would appreciate our approach.

We also want to acknowledge the Simons program "Bridging Continuous and Discrete Optimization" and Nikhil Srivastava, whose talks during the program inspired us to study Strongly Rayleigh questions.

This work was done while Rasmus was a postdoc at Harvard, and Zhao was a visiting student at Harvard, both hosted by Jelani Nelson. Rasmus was supported by ONR grant N00014-17-1-2127.

REFERENCES

[AG09] Kook Jin Ahn and Sudipto Guha. Graph sparsification in the semi-streaming model. In International Colloquium on Automata, Languages, and Programming, pages 328–338. Springer, 2009.

[AG15] Nima Anari and Shayan Oveis Gharan. Effective-resistance-reducing flows, spectrally thin trees, and asymmetric TSP. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 20–39. IEEE, 2015.

[AGM+10] Arash Asadpour, Michel X. Goemans, Aleksander Mądry, Shayan Oveis Gharan, and Amin Saberi. An O(log n/log log n)-approximation algorithm for the asymmetric traveling salesman problem. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '10, pages 379–389. SIAM, 2010.

[AGR16] Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In Conference on Learning Theory, pages 103–115, 2016.

[Ald90] David Aldous. The random walk construction of uniform spanning trees and uniform labelled trees. SIAM Journal on Discrete Mathematics, pages 450–465, 1990.

[AW02] Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569–579, 2002.

[BBL09] Julius Borcea, Petter Brändén, and Thomas Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521–567, 2009.

[Ber24] Sergei Bernstein. On a modification of Chebyshev's inequality and of the error formula of Laplace. Ann. Sci. Inst. Sav. Ukraine, Sect. Math, 1(4):38–49, 1924.

[BK96] András A. Benczúr and David R. Karger. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pages 47–55. ACM, 1996.

[Bro89] Andrei Broder. Generating random spanning trees. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, FOCS 1989, pages 442–447, 1989.

[BSS12] Joshua Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. SIAM Journal on Computing, 41(6):1704–1721, 2012.

[BSST13] Joshua Batson, Daniel A. Spielman, Nikhil Srivastava, and Shang-Hua Teng. Spectral sparsification of graphs: theory and algorithms. Communications of the ACM, 56(8):87–94, 2013.

[Che52] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, pages 493–507, 1952.

[CKP+17] Michael B. Cohen, Jonathan Kelner, John Peebles, Richard Peng, Anup B. Rao, Aaron Sidford, and Adrian Vladu. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 410–419. ACM, 2017.

[CMN96] Charles J. Colbourn, Wendy J. Myrvold, and Eugene Neufeld. Two algorithms for unranking arborescences. Journal of Algorithms, 20(2):268–281, 1996.

[CMP16] Michael B. Cohen, Cameron Musco, and Jakub Pachocki. Online row sampling. arXiv preprint arXiv:1604.05448, 2016.

[Coh16] Michael B. Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 278–287. SIAM, 2016.

[DKP+17] David Durfee, Rasmus Kyng, John Peebles, Anup B. Rao, and Sushant Sachdeva. Sampling random spanning trees faster than matrix multiplication. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 730–742. ACM, 2017.

[DPPR17] David Durfee, John Peebles, Richard Peng, and Anup B. Rao. Determinant-preserving sparsification of SDDM matrices with applications to counting and sampling spanning trees. In Proceedings of the 2017 IEEE 58th Annual Symposium on Foundations of Computer Science, FOCS '17, 2017.

[FHHP11] Wai Shing Fung, Ramesh Hariharan, Nicholas J. A. Harvey, and Debmalya Panigrahi. A general framework for graph sparsification. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pages 71–80. ACM, 2011.

[FKG71] Cees M. Fortuin, Pieter W. Kasteleyn, and Jean Ginibre. Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22(2):89–103, 1971.

[FM92] Tomás Feder and Milena Mihail. Balanced matroids. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, pages 26–38. ACM, 1992.

[GLSS18] Ankit Garg, Yin-Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander Chernoff bound. In STOC 2018. arXiv preprint arXiv:1704.03864, 2018.

[GRV09] Navin Goyal, Luis Rademacher, and Santosh Vempala. Expanders via random spanning trees. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 576–585. SIAM, 2009.

[GSS11] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. A randomized rounding approach to the traveling salesman problem. In Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS '11, pages 550–559. IEEE Computer Society, 2011.

[Gue83] Alain Guenoche. Random spanning tree. Journal of Algorithms, 4(3):214–220, 1983.

[Hea08] Alexander D. Healy. Randomness-efficient sampling within NC¹. Computational Complexity, 17(1):3–37, 2008.

[HO14] Nicholas J. A. Harvey and Neil Olver. Pipage rounding, pessimistic estimators and matrix concentration. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 926–945. SIAM, 2014.

[HSSS16] Samuel B. Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 178–191. ACM, 2016.

[HX16] Nicholas J. A. Harvey and Keyulu Xu. Generating random spanning trees via fast matrix multiplication. In LATIN 2016: Theoretical Informatics, volume 9644, pages 522–535, 2016.

[Kar93] David R. Karger. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In SODA, volume 93, pages 21–30, 1993.

[Kir47] Gustav Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird. Poggendorffs Ann. Phys. Chem., pages 497–508, 1847.

[KM09] Jonathan Kelner and Aleksander Madry. Faster generation of random spanning trees. In Proceedings of the 50th Annual Symposium on Foundations of Computer Science, FOCS 2009, pages 13–21, 2009. Available at https://arxiv.org/abs/0908.1448.

[KPPS17] Rasmus Kyng, Jakub Pachocki, Richard Peng, and Sushant Sachdeva. A framework for analyzing resparsification algorithms. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2032–2043. SIAM, 2017.

[KS96] David R. Karger and Clifford Stein. A new approach to the minimum cut problem. Journal of the ACM (JACM), 43(4):601–640, 1996.

[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate Gaussian elimination for Laplacians: fast, sparse, and simple. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 573–582. IEEE, 2016.

[Kul90] Vidyadhar G. Kulkarni. Generating random combinatorial objects. Journal of Algorithms, 11(2):185–207, 1990.

[MSS13] Adam Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families II: Mixed characteristic polynomials and the Kadison-Singer problem. arXiv preprint arXiv:1306.3969, 2013.

[MST15] Aleksander Madry, Damian Straszak, and Jakub Tarnawski. Fast generation of random spanning trees and the effective resistance metric. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 2019–2036, 2015. Available at http://arxiv.org/pdf/1501.00267v1.pdf.

[Oli09] Roberto Imbuzeiro Oliveira. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. arXiv preprint arXiv:0911.0600, 2009.

[PP14] Robin Pemantle and Yuval Peres. Concentration of Lipschitz functionals of determinantal and other strong Rayleigh measures. Combinatorics, Probability and Computing, 23(1):140–160, 2014.

[RS67] A. Rényi and G. Szekeres. On the height of trees. J. Austral. Math. Soc., 7(4):497–507, 1967.

[Rud99] Mark Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60–72, 1999.

[Sch18] Aaron Schild. An almost-linear time algorithm for uniform random spanning tree generation. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, 2018.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.

[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC '04, pages 81–90. ACM, 2004.

[Tij] H. Tijms. Understanding Probability: Chance Rules in Everyday Life. 2004.

[Tro11a] Joel A. Tropp. Freedman's inequality for matrix martingales. Electronic Communications in Probability, 16:262–270, 2011.

[Tro11b] Joel A. Tropp. User-friendly tail bounds for matrix martingales. Technical report, California Institute of Technology, Pasadena, 2011.

[Tro12] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

[Wil96] David Bruce Wilson. Generating random spanning trees more quickly than the cover time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pages 296–303. ACM, 1996.

[ZSD17] Kai Zhong, Zhao Song, and Inderjit S. Dhillon. Learning non-overlapping convolutional neural networks with multiple kernels. arXiv preprint arXiv:1711.03440, 2017.

[ZSJ+17] Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, and Inderjit S. Dhillon. Recovery guarantees for one-hidden-layer neural networks. In ICML. https://arxiv.org/pdf/1706.03175.pdf, 2017.
