A Matrix Chernoff Bound for Strongly Rayleigh Distributions and Spectral Sparsifiers from a Few Random Spanning Trees


2018 IEEE 59th Annual Symposium on Foundations of Computer Science

Rasmus Kyng, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA ([email protected])
Zhao Song, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA ([email protected])

Abstract—Strongly Rayleigh distributions are a class of negatively dependent distributions of binary-valued random variables [Borcea, Brändén, Liggett JAMS 09]. Recently, these distributions have played a crucial role in the analysis of algorithms for fundamental graph problems, e.g. the Traveling Salesman Problem [Gharan, Saberi, Singh FOCS 11]. We prove a new matrix Chernoff bound for Strongly Rayleigh distributions.

As an immediate application, we show that adding together the Laplacians of ε^{-2} log^2 n random spanning trees gives a (1 ± ε) spectral sparsifier of a graph Laplacian with high probability. Thus, we positively answer an open question posed in [Batson, Spielman, Srivastava, Teng JACM 13]. Our number of spanning trees for a spectral sparsifier matches the number of spanning trees required to obtain a cut sparsifier in [Fung, Hariharan, Harvey, Panigrahi STOC 11]. The previous best result followed by naively applying a classical matrix Chernoff bound, which requires ε^{-2} n log n spanning trees. For the tree averaging procedure to agree with the original graph Laplacian in expectation, each edge of the tree should be reweighted by the inverse of the edge's leverage score in the original graph. We also show that, with this reweighting of the edges, the Laplacian of a single random tree is bounded above in the PSD order by the original graph Laplacian times a factor log n with high probability, i.e. L_T ⪯ O(log n) L_G.

We show a lower bound that almost matches our last result, namely that in some graphs, with high probability, the random spanning tree is not bounded above in the spectral order by (log n)/(log log n) times the original graph Laplacian. We also show a lower bound that ε^{-2} log n spanning trees are necessary to get a (1 ± ε) spectral sparsifier.

Keywords-matrix Chernoff bounds; random spanning trees; Strongly Rayleigh; spectral sparsifier

2575-8454/18/$31.00 ©2018 IEEE    DOI 10.1109/FOCS.2018.00043

I. INTRODUCTION

The idea of concentration of sums of random variables dates back to central limit theorems, and hence to de Moivre and Laplace [Tij], while modern concentration bounds for sums of random variables were perhaps first established by Bernstein [Ber24], and a popular variant now known as Chernoff bounds was introduced by Rubin and published by Chernoff [Che52].

Concentration of measure for matrix-valued random variables is the phenomenon that many matrix-valued distributions are close to their mean with high probability, with closeness usually measured in spectral norm. Modern quantitative bounds of the form often used in theoretical computer science were derived, for example, by Rudelson [Rud99], while Ahlswede and Winter [AW02] established a useful matrix version of the Laplace transform, which plays a central role in scalar concentration results such as those of Bernstein. [AW02] combined this with the Golden-Thompson trace inequality to prove matrix concentration results. Tropp refined this approach and, by replacing the use of Golden-Thompson with a deep theorem on concavity of certain trace functions due to Lieb, was able to recover strong versions of a wide range of scalar concentration results, including matrix Chernoff bounds and Azuma's and Freedman's inequalities for matrix martingales [Tro12].

Matrix concentration results have had an enormous range of applications in computer science, and are ubiquitous throughout spectral graph theory [ST04], [SS11], [CKP+17], sketching [Coh16], approximation algorithms [HSSS16], and deep learning [ZSJ+17], [ZSD17]. Most applications are based on results for independent random matrices, but more flexible bounds, such as Tropp's Matrix Freedman Inequality [Tro11a], have been used to greatly simplify algorithms, e.g. for solving Laplacian linear equations [KS16] and for semi-streaming graph sparsification [AG09], [KPPS17]. Matrix concentration results are also closely related to other popular sampling tools, such as Karger's techniques for generating sparse graphs that approximately preserve the cuts of denser graphs [BK96].

Negative dependence of random variables is an appealing property that intuition suggests should help with concentration of measure. Notions of negative dependence can be formalized in many ways. Roughly speaking, these notions characterize distributions where some event occurring ensures that other events of interest become less likely. A simple example is the distribution of a sequence of coin flips, conditioned on the total number of heads in the outcome. In this distribution, conditioning on some coin coming out heads makes all other coins less likely to come out heads. Unfortunately, negative dependence phenomena are not as robust as positive association, which can be established from local conditions using the powerful FKG theorem [FKG71].

Strongly Rayleigh distributions were introduced recently by Borcea, Brändén, and Liggett [BBL09] as a class of negatively dependent distributions of binary-valued random variables with many useful properties. Strongly Rayleigh distributions satisfy useful negative dependence properties, and retain these properties under natural conditioning operations. They also satisfy a powerful stability property under conditioning known as Stochastic Covering [PP14], which is useful for analyzing them through martingale techniques. A measure on {0, 1}^n is said to be Strongly Rayleigh if its generating polynomial is real stable [BBL09]. There are many interesting examples of Strongly Rayleigh distributions [PP14]: the example mentioned earlier of heads of independent coin flips conditioned on the total number of heads in the outcome; symmetric exclusion processes; and determinantal point processes and determinantal measures on a boolean lattice. An example of particular interest to us is the set of edges of a uniform or weighted random spanning tree, which forms a Strongly Rayleigh distribution.

We prove a matrix Chernoff bound for the case of k-homogeneous Strongly Rayleigh distributions. Our bound is slightly weaker than the bound for independent variables, but we can show that it is tight in some regimes. We use our bound to show new concentration results related to random spanning trees of graphs. An open question is to find other interesting applications of our concentration result, e.g. by analyzing concentration for matrices generated by exclusion processes.

Random spanning trees are among the most well-studied probabilistic objects in graph theory, going back to the work of Kirchhoff [Kir47] in 1847, who gave a formula relating the number of spanning trees in a graph to the determinant of the Laplacian of the graph. Algorithms for sampling random spanning trees have been studied extensively [Gue83], [Bro89], [Ald90], [Kul90], [Wil96], [CMN96], [KM09], [MST15], [HX16], [DKP+17], [DPPR17], [Sch18], and a random spanning tree can now be sampled in almost linear time [Sch18]. In theoretical computer science, random spanning trees have found a number of applications, most notably in breakthrough results on approximating the traveling salesperson problem with symmetric [GSS11] and asymmetric costs [AGM+10]. Goyal et al. [GRV09] demonstrated that adding just two random spanning trees sampled from a …

… The techniques of Fung et al. unfortunately do not extend to proving spectral sparsifiers.

Spectral graph sparsifiers were introduced by Spielman and Teng [ST04], who for any graph G showed how to construct another graph H with ε^{-2} n poly log n edges such that (1 − ε) L_G ⪯ L_H ⪯ (1 + ε) L_G, which we refer to as an ε-spectral sparsifier. The construction was refined by Spielman and Srivastava [SS11], who suggested sampling edges independently with probability proportional to their leverage scores, and brought the number of required samples down to ε^{-2} n log n. This analysis is tight in the sense that if fewer than o(ε^{-2} n log n) samples are used, there will be at least a 1/poly(n) probability of failure. Meanwhile, (ε^{-2} n log n)/(log log n) independent samples in a union of cliques can be shown whp. to fail to give a cut sparsifier. This can be observed directly from the degree distribution of a single vertex in the complete graph. For a variant of [SS11] based on flipping a single coin for each edge to decide whether to keep it or not, it can also be shown that when the expected number of edges is (ε^{-2} n log n)/(log log n), whp. the procedure fails to give a cut sparsifier. For arbitrary sparsification schemes, bounds in [BSS12] show that Θ(ε^{-2} n) edges are necessary and sufficient to give an ε-spectral sparsifier.

The marginal probability of an edge being present in a random spanning tree is exactly the leverage score of the edge. This seems to suggest that combining ε^{-2} poly log n spanning trees might give a spectral sparsifier, but the lack of independence between the sampled edges means the process cannot be analyzed using existing techniques. Observing this, Batson, Spielman, Srivastava, and Teng [BSST13] noted in their excellent 2013 survey on sparsification that "it remains to be seen if the union of a small number of random spanning trees can produce a spectral sparsifier." We answer this question in the affirmative.

In particular, we show that adding together O(ε^{-2} log^2 n) spanning trees with edges scaled proportionally to inverse leverage scores in the original graph leads to an ε-spectral sparsifier. This matches the bound obtained for cut sparsifiers in [FHHP11]. Our result also implies their earlier bound, since a spectral sparsifier is always a cut sparsifier with the same approximation quality. Before our result, only a trivial bound on the number of spanning trees required to build a spectral sparsifier was known. In particular, standard matrix concentration arguments like those in [SS11] prove that O(ε^{-2} n log n) spanning trees suffice.
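Kirchhoff's formula is easy to check on a small instance: the number of spanning trees equals the determinant of the Laplacian with any one row and the matching column deleted. A minimal sketch (ours, not from the paper; Python with numpy assumed, the graph K_4 chosen for its known tree count):

```python
import numpy as np

# Laplacian of the complete graph K_4: degree n-1 on the diagonal, -1 elsewhere.
n = 4
L = n * np.eye(n) - np.ones((n, n))

# Kirchhoff's matrix-tree theorem: the spanning tree count is the determinant
# of any (n-1) x (n-1) principal minor of the Laplacian.
num_trees = round(np.linalg.det(L[1:, 1:]))
print(num_trees)  # 16, matching Cayley's formula n^(n-2) = 4^2
```

For complete graphs, Cayley's formula n^{n-2} provides an independent check of the determinant computation.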
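The ordering (1 − ε) L_G ⪯ L_H ⪯ (1 + ε) L_G can be tested numerically: whiten L_H by the square root of the pseudoinverse of L_G and check that the eigenvalues on the range of L_G lie in [1 − ε, 1 + ε]. A sketch under the assumption that both Laplacians come from connected graphs on the same vertex set, so they share the all-ones kernel (the function name is ours):

```python
import numpy as np

def is_eps_spectral_sparsifier(L_G, L_H, eps, tol=1e-9):
    """Check (1 - eps) L_G <= L_H <= (1 + eps) L_G in the PSD order.

    Assumes connected graphs on a common vertex set; the shared
    all-ones kernel direction is projected away before comparing.
    """
    vals, vecs = np.linalg.eigh(L_G)
    inv_sqrt = np.array([0.0 if v < tol else v ** -0.5 for v in vals])
    W = vecs @ np.diag(inv_sqrt) @ vecs.T               # W = L_G^{+1/2}
    rel = np.sort(np.linalg.eigvalsh(W @ L_H @ W))[1:]  # drop kernel eigenvalue
    return bool(rel.min() >= 1 - eps - tol and rel.max() <= 1 + eps + tol)

# Toy check: scaling every edge weight by 1.05 gives a 0.1-spectral
# sparsifier of the original graph, but not a 0.01-spectral sparsifier.
L_G = np.array([[ 2., -1.,  0., -1.],
                [-1.,  2., -1.,  0.],
                [ 0., -1.,  2., -1.],
                [-1.,  0., -1.,  2.]])  # Laplacian of the 4-cycle
print(is_eps_spectral_sparsifier(L_G, 1.05 * L_G, 0.10))  # True
print(is_eps_spectral_sparsifier(L_G, 1.05 * L_G, 0.01))  # False
```

Whitening reduces the two-sided PSD comparison to a scalar interval test on eigenvalues, which is the standard way such guarantees are verified in practice.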
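Two facts used above can be verified exhaustively on a small graph: the marginal probability of an edge under the uniform spanning tree distribution equals its leverage score, and reweighting tree edges by inverse leverage scores makes the expected tree Laplacian equal L_G exactly. A brute-force sketch (ours; the test graph, a 4-cycle plus one chord, is an arbitrary choice):

```python
import itertools
import numpy as np

# Unweighted test graph on 4 vertices: a 4-cycle plus the chord (0, 2).
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

def laplacian(edge_list, weights):
    L = np.zeros((n, n))
    for (u, v), w in zip(edge_list, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

L_G = laplacian(edges, [1.0] * len(edges))

# Leverage score of an edge = its effective resistance b_e^T L^+ b_e
# (unit weights), computed via the pseudoinverse of the Laplacian.
Lpinv = np.linalg.pinv(L_G)
def leverage(u, v):
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    return b @ Lpinv @ b

lev = {e: leverage(*e) for e in edges}

def is_spanning_tree(subset):
    # n-1 edges with no cycle <=> spanning tree (union-find check).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]; x = parent[x]
        return x
    for u, v in subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

trees = [t for t in itertools.combinations(edges, n - 1) if is_spanning_tree(t)]
print(len(trees))  # 8 spanning trees

# Marginal of each edge under the uniform spanning tree distribution
# equals its leverage score.
for e in edges:
    marginal = sum(e in t for t in trees) / len(trees)
    assert abs(marginal - lev[e]) < 1e-9

# Reweighting each tree edge by 1 / leverage makes the average tree
# Laplacian agree exactly with L_G, as the averaging procedure requires.
avg = sum(laplacian(t, [1.0 / lev[e] for e in t]) for t in trees) / len(trees)
assert np.allclose(avg, L_G)
```

The exact agreement of the average is immediate from linearity: each edge contributes P[e ∈ T] · (1/ℓ_e) · b_e b_e^T = b_e b_e^T in expectation; the paper's concentration question is how far a sum of a few sampled trees deviates from this mean.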