Concentration of Measure Inequalities in Information Theory, Communications and Coding

MONOGRAPH. Foundations and Trends in Communications and Information Theory, Second Edition, 2014. Last updated: September 29, 2014. arXiv:1212.4663v8 [cs.IT], 24 Feb 2015.

Maxim Raginsky, Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. E-mail: [email protected]

Igal Sason, Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa 32000, Israel. E-mail: [email protected]

Abstract

During the last two decades, concentration inequalities have been the subject of exciting developments in various areas, including convex geometry, functional analysis, statistical physics, high-dimensional statistics, pure and applied probability theory (e.g., concentration of measure phenomena in random graphs, random matrices, and percolation), information theory, theoretical computer science, and learning theory. This monograph focuses on some of the key modern mathematical tools used to derive concentration inequalities, on their links to information theory, and on their various applications to communications and coding. In addition to being a survey, the monograph also includes several new results derived by the authors.

The first part of the monograph introduces classical concentration inequalities for martingales, as well as some recent refinements and extensions. The power and versatility of the martingale approach are exemplified in the context of codes defined on graphs and iterative decoding algorithms, as well as codes for wireless communication.

The second part of the monograph introduces the entropy method, an information-theoretic technique for deriving concentration inequalities. The basic ingredients of the entropy method are discussed first in the context of logarithmic Sobolev inequalities, which underlie the so-called functional approach to concentration of measure, and then from a complementary information-theoretic viewpoint based on transportation-cost inequalities and probability in metric spaces. Some representative results on concentration for dependent random variables are briefly summarized, with emphasis on their connections to the entropy method. Finally, we discuss several applications of the entropy method to problems in communications and coding, including strong converses, empirical distributions of good channel codes, and an information-theoretic converse for concentration of measure.

Acknowledgments

It is a pleasure to thank several individuals who have carefully read parts of the manuscript at various stages and provided constructive comments, suggestions, and corrections. These include Ronen Eshel, Peter Harremoës, Eran Hof, Nicholas Kalouptsidis, Leor Kehaty, Aryeh Kontorovich, Ioannis Kontoyiannis, Mokshay Madiman, Daniel Paulin, Yury Polyanskiy, Boaz Shuval, Emre Telatar, Tim van Erven, Sergio Verdú, Yihong Wu and Kostis Xenoulis. Among these people, Leor Kehaty is gratefully acknowledged for a very detailed report on the initial draft of this manuscript, and Boaz Shuval is acknowledged for helpful comments on the first edition. The authors are thankful to the three anonymous reviewers and the Editor-in-Chief, Sergio Verdú, for very constructive and detailed suggestions, which greatly improved the presentation of the first edition of this manuscript.
The authors accept full responsibility for any remaining omissions or errors.

The work of M. Raginsky was supported in part by the U.S. National Science Foundation (NSF) under CAREER award no. CCF-1254041. The work of I. Sason was supported by the Israeli Science Foundation (ISF), grant number 12/12. The hospitality of the Bernoulli interfaculty center at EPFL, the Swiss Federal Institute of Technology in Lausanne, during the summer of 2011 is acknowledged by I. Sason. We would like to thank the organizers of the Information Theory and Applications Workshop in San Diego, California; our collaboration on this project was initiated during this successful workshop in Feb. 2012. Finally, we are grateful to the publishers of Foundations and Trends (FnT) in Communications and Information Theory: Mike Casey, James Finlay and Alet Heezemans, for their assistance in both the first and second editions of this monograph (dated Oct. 2013 and Sept. 2014, respectively).

Contents

1 Introduction
  1.1 An overview and a brief history
  1.2 A reader's guide

2 Concentration Inequalities via the Martingale Approach
  2.1 Discrete-time martingales
  2.2 Basic concentration inequalities
    2.2.1 The Chernoff bounding technique and the Hoeffding lemma
    2.2.2 The Azuma–Hoeffding inequality
    2.2.3 McDiarmid's inequality
    2.2.4 Hoeffding's inequality and its improved versions
  2.3 Refined versions of the Azuma–Hoeffding inequality
    2.3.1 A generalization of the Azuma–Hoeffding inequality
    2.3.2 On martingales with uniformly bounded differences
    2.3.3 Inequalities for sub- and super-martingales
  2.4 Relations to classical results in probability theory
    2.4.1 The martingale central limit theorem
    2.4.2 The moderate deviations principle
    2.4.3 Functions of discrete-time Markov chains
  2.5 Applications in information theory and coding
    2.5.1 Minimum distance of binary linear block codes
    2.5.2 Expansion properties of random regular bipartite graphs
    2.5.3 Concentration of the crest factor for OFDM signals
    2.5.4 Concentration of the cardinality of the fundamental system of cycles for LDPC code ensembles
    2.5.5 Concentration theorems for LDPC code ensembles over ISI channels
    2.5.6 On the concentration of the conditional entropy for LDPC code ensembles
  2.6 Summary
  2.A Proof of Bennett's inequality
  2.B On the moderate deviations principle in Section 2.4.2
  2.C Proof of the properties in (2.5.9) for OFDM signals
  2.D Proof of Theorem 2.5.5
  2.E Proof of Lemma 2.5.1

3 The Entropy Method, Log-Sobolev and Transportation-Cost Inequalities
  3.1 The main ingredients of the entropy method
    3.1.1 The Chernoff bounding technique revisited
    3.1.2 The Herbst argument
    3.1.3 Tensorization of the (relative) entropy
    3.1.4 Preview: logarithmic Sobolev inequalities
  3.2 The Gaussian logarithmic Sobolev inequality
    3.2.1 An information-theoretic proof of Gross's log-Sobolev inequality
    3.2.2 From Gaussian log-Sobolev inequality to Gaussian concentration inequalities
    3.2.3 Hypercontractivity, Gaussian log-Sobolev inequality, and Rényi divergence
  3.3 Logarithmic Sobolev inequalities: the general scheme
    3.3.1 Tensorization of the logarithmic Sobolev inequality
    3.3.2 Maurer's thermodynamic method
    3.3.3 Discrete logarithmic Sobolev inequalities on the Hamming cube
    3.3.4 The method of bounded differences revisited
    3.3.5 Log-Sobolev inequalities for Poisson and compound Poisson measures
    3.3.6 Bounds on the variance: Efron–Stein–Steele and Poincaré inequalities
  3.4 Transportation-cost inequalities
    3.4.1 Concentration and isoperimetry
    3.4.2 Marton's argument: from transportation to concentration
    3.4.3 Gaussian concentration and T1 inequalities
    3.4.4 Dimension-free Gaussian concentration and T2 inequalities
    3.4.5 A grand unification: the HWI inequality
  3.5 Extension to non-product distributions
    3.5.1 Samson's transportation-cost inequalities for dependent random variables
    3.5.2 Marton's transportation-cost inequalities for L2 Wasserstein distance
  3.6 Applications in information theory and related topics
    3.6.1 The blowing-up lemma
    3.6.2 Strong converse for the degraded broadcast channel
    3.6.3 The empirical distribution of good channel codes with non-vanishing error probability
    3.6.4 An information-theoretic converse for concentration of measure
  3.7 Summary
  3.A Van Trees inequality
  3.B The proof of Theorem 3.2.3
  3.C Details on the Ornstein–Uhlenbeck semigroup
  3.D LSI for Bernoulli and Gaussian measures
  3.E Fano's inequality for list decoding
  3.F Details for the derivation of (3.6.22)

Chapter 1

Introduction

1.1 An overview and a brief history

Concentration-of-measure inequalities provide bounds on the probability that a random variable X deviates from its mean, median or other typical value x by a given amount. These inequalities have been studied for several decades, with some fundamental and substantial contributions during the last two decades. Very roughly speaking, the concentration of measure phenomenon can be stated in the following simple way: "A random variable that depends in a smooth way on many independent random variables (but not too much on any of them) is essentially constant" [1]. The exact meaning of such a statement clearly needs to be clarified rigorously, but it often means that such a random variable X concentrates around x in a way that