The Martingale Approach for Concentration and Applications in Information Theory, Communications and Coding

Abstract—This chapter introduces some concentration inequalities for discrete-time martingales with bounded increments, and it exemplifies some of their potential applications in information theory and related topics. The first part of this chapter introduces some concentration inequalities for martingales, including the Azuma-Hoeffding, Bennett, Freedman and McDiarmid inequalities. These inequalities are also specialized to sums of independent and bounded random variables, which include the inequalities by Bernstein, Bennett, Hoeffding, and Kearns & Saul. An improvement of the martingale inequalities for some sub-classes of martingales (e.g., the conditionally symmetric martingales) is discussed in detail, and some new refined inequalities are derived. The first part of this chapter also considers a geometric interpretation of some of these inequalities, providing insight into the interconnections between them. The second part of this chapter exemplifies the potential applications of the considered martingale inequalities in the context of information theory and related topics. The considered applications include binary hypothesis testing, concentration for codes defined on graphs, concentration for OFDM signals, and a use of some martingale inequalities for the derivation of achievable rates under ML decoding and lower bounds on the error exponents for random coding over some linear or non-linear communication channels.

Index Terms Concentration of measures, error exponents, Fisher information, information divergence, large deviations, martingales.

I. INTRODUCTION

Inequalities providing upper bounds on probabilities of the type $\mathbb{P}(|X - \bar{x}| \geq t)$ (or $\mathbb{P}(X - \bar{x} \geq t)$) for a random variable (RV) $X$, where $\bar{x}$ denotes the expectation or median of $X$, have been among the main tools of probability theory. These inequalities are known as concentration inequalities, and they have been subject to interesting developments during the last two decades. Very roughly speaking, the concentration of measure phenomenon can be stated in the following simple way: "A random variable that depends in a smooth way on many independent random variables (but not too much on any of them) is essentially constant" [77]. The exact meaning of such a statement clearly needs to be clarified rigorously, but it will often mean that such a random variable $X$ concentrates around $\bar{x}$ in a way that the probability of the event $\{|X - \bar{x}| > t\}$ decays exponentially in $t$ (for $t \geq 0$). The foundations of concentration of measures have been introduced, e.g., in [41], [44], [49], [53] and [76].

In recent years, concentration inequalities have been studied and used as a powerful tool in various areas such as convex geometry, functional analysis, statistical physics, dynamical systems, probability (random matrices, Markov processes, random graphs, percolation), information theory and statistics, learning theory and randomized algorithms. Several techniques have been developed so far to prove concentration of measures. These include:
• Talagrand's inequalities for product measures (see, e.g., [53, Chapter 4], [74, Chapter 6], [76] and [77], with some information-theoretic applications in [39] and [40]).
• The entropy method and logarithmic Sobolev inequalities (see, e.g., [2], [18, Chapter 14], [41, Chapter 5], [44]–[51], with information-theoretic aspects in, e.g., [34], [35], [37] and [38]). This methodology and its remarkable information-theoretic links will be considered in the next chapter.
• Transportation-cost inequalities that originated from information theory (see, e.g., [18, Chapters 12, 13], [27], [41, Chapter 6], [46], [47], [48]). This methodology and its information-theoretic aspects will be considered in the next chapter, with a discussion on the relation of transportation-cost inequalities to the entropy method and logarithmic Sobolev inequalities.
• Stein's method, which is used to prove concentration inequalities, a.k.a. concentration inequalities with exchangeable pairs (see, e.g., [10], [11] and [12]). This relatively recent framework is not addressed in this work.
• The martingale approach for the derivation of concentration inequalities (see, e.g., [1, Chapter 7], [13], [14], [53], with information-theoretic aspects, e.g., in [20], [32], [43], [62], [63], [67]–[70], [83] and [84]).

This chapter mainly considers this last methodology, focusing on concentration inequalities that are derived for discrete-time martingales with bounded jumps.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function that has bounded differences whenever the $n$-dimensional input vectors differ in only one coordinate. A common method for proving concentration of such a function of $n$ independent RVs around its expected value $\mathbb{E}[f]$ is called McDiarmid's inequality or the 'independent bounded-differences inequality' (see [53, Theorem 3.1]). This inequality was proved (with some possible extensions) via the martingale approach (see [53, Section 3.5]). Although the proof of this inequality has some similarity to the proof of the Azuma-Hoeffding inequality, the former inequality is stated under a condition which provides an improvement by a factor of 4 in the exponent. Some of its nice applications to algorithmic discrete mathematics were exemplified in, e.g., [53, Section 3].

The Azuma-Hoeffding inequality is by now a well-known tool that has often been used to prove concentration phenomena for discrete-time martingales whose jumps are bounded almost surely. It is due to Hoeffding [30], who proved this inequality for a sum of independent and bounded random variables, and Azuma [3], who later extended it to bounded-difference martingales. It is noted that the Azuma-Hoeffding inequality for a bounded martingale-difference sequence was extended to centering sequences with bounded differences [54]; this extension provides sharper concentration results for, e.g., sequences that are related to sampling without replacement.

The use of the Azuma-Hoeffding inequality was introduced to the computer science literature in [71] in order to prove concentration, around the expected value, of the chromatic number for random graphs. The chromatic number of a graph is defined to be the minimal number of colors that is required to color all the vertices of this graph so that no two vertices which are connected by an edge have the same color, and the ensemble for which concentration was demonstrated in [71] was the ensemble of random graphs with $n$ vertices such that any ordered pair of vertices in the graph is connected by an edge with a fixed probability $p$ for some $p \in (0, 1)$. It is noted that the concentration result in [71] was established without knowing the expected value over this ensemble. The migration of this bounding inequality into coding theory, especially for exploring some concentration phenomena that are related to the analysis of codes defined on graphs and iterative message-passing decoding algorithms, was initiated in [43], [62] and [73]. During the last decade, the Azuma-Hoeffding inequality has been extensively used for proving concentration of measures in coding theory (see, e.g., [63] and references therein). In general, all these concentration inequalities serve to justify theoretically the ensemble approach of codes defined on graphs. However, much stronger concentration phenomena are observed in practice. The Azuma-Hoeffding inequality was also recently used in [80] for the analysis of probability estimation in the rare-events regime, where it was assumed that an observed string is drawn i.i.d. from an unknown distribution, but the alphabet size and the source distribution both scale with the block length (so the empirical distribution does not converge to the true distribution as the block length tends to infinity). In [82]–[84], the martingale approach was used to derive achievable rates and random coding error exponents for linear and non-linear additive white Gaussian noise channels (with or without memory).

This chapter is structured as follows: Section II briefly presents discrete-time (sub/super) martingales, and Section III presents some basic inequalities that are widely used for proving concentration inequalities via the martingale approach. Section IV derives some refined versions of the Azuma-Hoeffding inequality, and it considers interconnections between these concentration inequalities. Section V introduces Freedman's inequality with a refined version of this inequality, and these inequalities are specialized to get concentration inequalities for sums of independent and bounded random variables. Section VI considers some connections between the concentration inequalities that are introduced in Section IV and the method of types, a central limit theorem for martingales, the law of the iterated logarithm, the moderate deviations principle for i.i.d. real-valued random variables, and some previously-reported concentration inequalities for discrete-parameter martingales with bounded jumps. Section VII forms the second part of this work, applying the concentration inequalities from Section IV to information theory and some related topics. This chapter is summarized briefly in Section VIII.

In addition to the presentation in this chapter, the reader is also referred to very nice surveys on concentration inequalities via the martingale approach in [1, Chapter 11], [13, Chapter 2], [14], [52] and [53]. It is noted that the main focus of this chapter is on information-theoretic aspects, which makes the presentation in this chapter different from these other excellent sources for studying concentration inequalities via the martingale approach. The presentation in this chapter is aimed to be self-contained.

II. DISCRETE-TIME MARTINGALES

A. Martingales

This subsection provides a brief review of martingales to set definitions and notation. We will not need for this chapter any result about martingales beyond the definition and the few basic properties mentioned in the following.

Definition 1 (Discrete-time martingales): Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $n \in \mathbb{N}$. A sequence $\{X_i, \mathcal{F}_i\}_{i=0}^n$, where the $X_i$'s are random variables and the $\mathcal{F}_i$'s are $\sigma$-algebras, is a martingale if the following conditions are satisfied:
1) $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_n$ is a sequence of sub $\sigma$-algebras of $\mathcal{F}$ (the sequence $\{\mathcal{F}_i\}_{i=0}^n$ is called a filtration); usually, $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_n = \mathcal{F}$.
2) $X_i \in L^1(\Omega, \mathcal{F}_i, \mathbb{P})$ for every $i \in \{0, \ldots, n\}$; this means that each $X_i$ is defined on the same sample space $\Omega$, it is $\mathcal{F}_i$-measurable, and $\mathbb{E}[|X_i|] = \int_\Omega |X_i(\omega)| \, d\mathbb{P}(\omega) < \infty$.
3) For all $i \in \{1, \ldots, n\}$, the equality $X_{i-1} = \mathbb{E}[X_i \,|\, \mathcal{F}_{i-1}]$ holds almost surely (a.s.).

Remark 1: Since $\{\mathcal{F}_i\}_{i=0}^n$ forms a filtration, it follows from the tower principle for conditional expectations that, a.s.,
$$X_j = \mathbb{E}[X_i \,|\, \mathcal{F}_j], \quad \forall\, i > j.$$
Also, for every $i \in \mathbb{N}$, $\mathbb{E}[X_i] = \mathbb{E}\bigl[\mathbb{E}[X_i \,|\, \mathcal{F}_{i-1}]\bigr] = \mathbb{E}[X_{i-1}]$, so the expectation of a martingale sequence is fixed.

Remark 2: One can generate martingale sequences by the following procedure: Given a RV $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ and an arbitrary filtration of sub $\sigma$-algebras $\{\mathcal{F}_i\}_{i=0}^n$, let
$$X_i = \mathbb{E}[X \,|\, \mathcal{F}_i], \quad \forall\, i \in \{0, 1, \ldots, n\}.$$
Then, the sequence $X_0, X_1, \ldots, X_n$ forms a martingale (w.r.t. the above filtration) since
1) The RV $X_i = \mathbb{E}[X \,|\, \mathcal{F}_i]$ is $\mathcal{F}_i$-measurable, and also $\mathbb{E}[|X_i|] \leq \mathbb{E}[|X|] < \infty$.
2) By construction, $\{\mathcal{F}_i\}_{i=0}^n$ is a filtration.
3) For every $i \in \{1, \ldots, n\}$,
$$\mathbb{E}[X_i \,|\, \mathcal{F}_{i-1}] = \mathbb{E}\bigl[\mathbb{E}[X \,|\, \mathcal{F}_i] \,\big|\, \mathcal{F}_{i-1}\bigr] = \mathbb{E}[X \,|\, \mathcal{F}_{i-1}] \ \ (\text{since } \mathcal{F}_{i-1} \subseteq \mathcal{F}_i) = X_{i-1} \quad \text{a.s.}$$

Remark 3: In continuation to Remark 2, the setting where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_n = \mathcal{F}$ gives that $X_0, X_1, \ldots, X_n$ is a martingale sequence with
$$X_0 = \mathbb{E}[X \,|\, \mathcal{F}_0] = \mathbb{E}[X], \quad X_n = \mathbb{E}[X \,|\, \mathcal{F}_n] = X \quad \text{a.s.}$$
In this case, one gets a martingale sequence where the first element is the expected value of $X$, and the last element is $X$ itself (a.s.). This has the following interpretation: at the beginning, one does not know anything about $X$, so it is initially estimated by its expected value. At each step, more and more information about the random variable $X$ is revealed until its value is known almost surely.

Example 1: Let $\{U_k\}_{k=1}^n$ be independent random variables on a joint probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and assume that $\mathbb{E}[U_k] = 0$ and $\mathbb{E}[|U_k|] < \infty$ for every $k$. Let us define
$$X_k = \sum_{j=1}^k U_j, \quad \forall\, k \in \{1, \ldots, n\}$$
with $X_0 = 0$. Define the natural filtration where $\mathcal{F}_0 = \{\emptyset, \Omega\}$, and
$$\mathcal{F}_k = \sigma(X_1, \ldots, X_k) = \sigma(U_1, \ldots, U_k), \quad \forall\, k \in \{1, \ldots, n\}.$$
Note that $\mathcal{F}_k = \sigma(X_1, \ldots, X_k)$ denotes the minimal $\sigma$-algebra that includes all the sets of the form $\{\omega \in \Omega : X_1(\omega) \leq \alpha_1, \ldots, X_k(\omega) \leq \alpha_k\}$, where $\alpha_j \in \mathbb{R} \cup \{-\infty, +\infty\}$ for $j \in \{1, \ldots, k\}$. It is easy to verify that $\{X_k, \mathcal{F}_k\}_{k=0}^n$ is a martingale sequence; this simply implies that all the concentration inequalities that apply to discrete-time martingales (like those introduced in this chapter) can be particularized to concentration inequalities for sums of independent random variables.
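The construction in Example 1 is straightforward to simulate. The following minimal Python sketch (the uniform law of the $U_k$'s and all numerical parameters are illustrative assumptions, not part of the text) builds the partial-sum martingale of Example 1 and numerically checks two consequences of the definitions above: the expectation of the martingale stays fixed (Remark 1), and the jumps are bounded, which is the setting of the concentration inequalities derived below.

```python
# A minimal sketch of Example 1 (hypothetical parameters): partial sums of independent zero-mean
# random variables form a martingale w.r.t. the natural filtration.  Two consequences checked
# numerically: E[X_k] stays fixed (Remark 1), and the jumps |X_k - X_{k-1}| = |U_k| are bounded.
import numpy as np

rng = np.random.default_rng(0)
n, num_paths = 50, 100_000

U = rng.uniform(-1.0, 1.0, size=(num_paths, n))          # independent, zero mean, |U_k| <= 1
X = np.concatenate([np.zeros((num_paths, 1)), np.cumsum(U, axis=1)], axis=1)   # X_0 = 0

print("max_k |E[X_k]| ~", np.max(np.abs(X.mean(axis=0))))       # should be close to E[X_0] = 0
print("max jump       =", np.max(np.abs(np.diff(X, axis=1))))   # <= 1 by construction
```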

B. Sub/Super Martingales

Sub- and super-martingales require the first two conditions in Definition 1, and the equality in the third condition of Definition 1 is relaxed to one of the following inequalities:
• $\mathbb{E}[X_i \,|\, \mathcal{F}_{i-1}] \geq X_{i-1}$ holds a.s. for sub-martingales.
• $\mathbb{E}[X_i \,|\, \mathcal{F}_{i-1}] \leq X_{i-1}$ holds a.s. for super-martingales.

Clearly, every random process that is both a sub- and a super-martingale is a martingale, and vice versa. Furthermore, $\{X_i, \mathcal{F}_i\}$ is a sub-martingale if and only if $\{-X_i, \mathcal{F}_i\}$ is a super-martingale. The following properties are direct consequences of Jensen's inequality for conditional expectations:
• If $\{X_i, \mathcal{F}_i\}$ is a martingale, $h$ is a convex (concave) function and $\mathbb{E}[|h(X_i)|] < \infty$, then $\{h(X_i), \mathcal{F}_i\}$ is a sub- (super-) martingale.
• If $\{X_i, \mathcal{F}_i\}$ is a super-martingale, $h$ is monotonically increasing and concave, and $\mathbb{E}[|h(X_i)|] < \infty$, then $\{h(X_i), \mathcal{F}_i\}$ is a super-martingale. Similarly, if $\{X_i, \mathcal{F}_i\}$ is a sub-martingale, $h$ is monotonically increasing and convex, and $\mathbb{E}[|h(X_i)|] < \infty$, then $\{h(X_i), \mathcal{F}_i\}$ is a sub-martingale.

Example 2: If $\{X_i, \mathcal{F}_i\}$ is a martingale, then $\{|X_i|, \mathcal{F}_i\}$ is a sub-martingale. Furthermore, if $X_i \in L^2(\Omega, \mathcal{F}_i, \mathbb{P})$ then also $\{X_i^2, \mathcal{F}_i\}$ is a sub-martingale. Finally, if $\{X_i, \mathcal{F}_i\}$ is a non-negative sub-martingale and $X_i \in L^2(\Omega, \mathcal{F}_i, \mathbb{P})$, then also $\{X_i^2, \mathcal{F}_i\}$ is a sub-martingale.

III. BASIC CONCENTRATION INEQUALITIES VIA THE MARTINGALE APPROACH

In the following section, some basic inequalities that are widely used for proving concentration results are presented, whose derivation relies on the martingale approach. Their proofs convey the main concepts of the martingale approach for proving concentration. Their presentation also motivates some further refinements that are considered in the continuation of this chapter.

A. The Azuma-Hoeffding Inequality

The Azuma-Hoeffding inequality¹ is a useful concentration inequality for bounded-difference martingales. It was proved in [30] for independent bounded random variables, followed by a discussion on sums of dependent random variables; this inequality was later derived in [3] for the more general setting of bounded-difference martingales. In the following, this inequality is introduced.

Theorem 1 (Azuma-Hoeffding inequality): Let $\{X_k, \mathcal{F}_k\}_{k=0}^n$ be a discrete-parameter real-valued martingale sequence. Suppose that, for every $k \in \{1, \ldots, n\}$, the condition $|X_k - X_{k-1}| \leq d_k$ holds a.s. for a real-valued sequence $\{d_k\}_{k=1}^n$ of non-negative numbers. Then, for every $\alpha > 0$,
$$\mathbb{P}(|X_n - X_0| \geq \alpha) \leq 2 \exp\left(-\frac{\alpha^2}{2 \sum_{k=1}^n d_k^2}\right). \qquad (1)$$

The proof of the Azuma-Hoeffding inequality also serves to present the basic principles on which the martingale approach for proving concentration results is based. Therefore, we present in the following the proof of this inequality.

Proof: For an arbitrary $\alpha > 0$,
$$\mathbb{P}(|X_n - X_0| \geq \alpha) = \mathbb{P}(X_n - X_0 \geq \alpha) + \mathbb{P}(X_n - X_0 \leq -\alpha). \qquad (2)$$
Let $\xi_i \triangleq X_i - X_{i-1}$ for $i = 1, \ldots, n$ designate the jumps of the martingale sequence. Then, it follows by assumption that $|\xi_k| \leq d_k$ and $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$ a.s. for every $k \in \{1, \ldots, n\}$.
From Chernoff's inequality,

$$\mathbb{P}(X_n - X_0 \geq \alpha) = \mathbb{P}\left(\sum_{i=1}^n \xi_i \geq \alpha\right) \leq e^{-\alpha t}\, \mathbb{E}\left[\exp\left(t \sum_{i=1}^n \xi_i\right)\right], \quad \forall\, t \geq 0. \qquad (3)$$

¹The Azuma-Hoeffding inequality is also known as Azuma's inequality. Since it is referred to numerous times in this chapter, it will be named Azuma's inequality for the sake of brevity.

Furthermore,
$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] = \mathbb{E}\left[\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right) \,\Big|\, \mathcal{F}_{n-1}\right]\right] = \mathbb{E}\left[\exp\left(t \sum_{k=1}^{n-1} \xi_k\right) \mathbb{E}\bigl[\exp(t\xi_n) \,|\, \mathcal{F}_{n-1}\bigr]\right] \qquad (4)$$
where the last equality holds since $Y \triangleq \exp\bigl(t \sum_{k=1}^{n-1} \xi_k\bigr)$ is $\mathcal{F}_{n-1}$-measurable; this holds due to the fact that $\xi_k \triangleq X_k - X_{k-1}$ is $\mathcal{F}_k$-measurable for every $k \in \mathbb{N}$, and $\mathcal{F}_k \subseteq \mathcal{F}_{n-1}$ for $0 \leq k \leq n-1$ since $\{\mathcal{F}_k\}_{k=0}^n$ is a filtration. Hence, the RV $\sum_{k=1}^{n-1} \xi_k$ and $Y$ are both $\mathcal{F}_{n-1}$-measurable, and $\mathbb{E}[XY \,|\, \mathcal{F}_{n-1}] = Y\, \mathbb{E}[X \,|\, \mathcal{F}_{n-1}]$.

Due to the convexity of the exponential function, and since $|\xi_k| \leq d_k$, the straight line (chord) connecting the endpoints of the exponential function lies above this function over the interval $[-d_k, d_k]$. Hence, for every $k$ (note that $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$),
$$\mathbb{E}\bigl[e^{t\xi_k} \,|\, \mathcal{F}_{k-1}\bigr] \leq \mathbb{E}\left[\frac{(d_k + \xi_k)e^{td_k} + (d_k - \xi_k)e^{-td_k}}{2d_k} \,\Big|\, \mathcal{F}_{k-1}\right] = \frac{1}{2}\bigl(e^{td_k} + e^{-td_k}\bigr) = \cosh(t d_k). \qquad (5)$$
Since, for every integer $m \geq 0$,
$$(2m)! \geq (2m)(2m-2)\ldots 2 = 2^m\, m!$$
then, due to the power series expansions of the hyperbolic cosine and exponential functions,

$$\cosh(t d_k) = \sum_{m=0}^\infty \frac{(t d_k)^{2m}}{(2m)!} \leq \sum_{m=0}^\infty \frac{(t d_k)^{2m}}{2^m\, m!} = e^{\frac{t^2 d_k^2}{2}}$$
which therefore implies that
$$\mathbb{E}\bigl[e^{t\xi_k} \,|\, \mathcal{F}_{k-1}\bigr] \leq e^{\frac{t^2 d_k^2}{2}}.$$
Consequently, by repeatedly using the recursion in (4), it follows that

$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \prod_{k=1}^n \exp\left(\frac{t^2 d_k^2}{2}\right) = \exp\left(\frac{t^2}{2} \sum_{k=1}^n d_k^2\right)$$
which then gives (see (3)) that

$$\mathbb{P}(X_n - X_0 \geq \alpha) \leq \exp\left(-\alpha t + \frac{t^2}{2} \sum_{k=1}^n d_k^2\right), \quad \forall\, t \geq 0.$$
An optimization over the free parameter $t \geq 0$ gives $t = \alpha \bigl(\sum_{k=1}^n d_k^2\bigr)^{-1}$, and
$$\mathbb{P}(X_n - X_0 \geq \alpha) \leq \exp\left(-\frac{\alpha^2}{2 \sum_{k=1}^n d_k^2}\right). \qquad (6)$$
Since, by assumption, $\{X_k, \mathcal{F}_k\}$ is a martingale with bounded jumps, so is $\{-X_k, \mathcal{F}_k\}$ (with the same bounds on its jumps). This implies that the same bound is also valid for the probability $\mathbb{P}(X_n - X_0 \leq -\alpha)$, and together with (2) this completes the proof of Theorem 1.

The proof of this inequality will be revisited later in this chapter for the derivation of some refined versions, whose use and advantage will also be exemplified.
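As a numerical illustration of Theorem 1 (not part of the original text), the following Python sketch simulates a simple $\pm d$ random walk, which is a martingale with jumps bounded by $d_k = d$, and compares its empirical two-sided tail with the Azuma-Hoeffding bound (1); all parameter values are arbitrary choices for the illustration.

```python
# A small numerical check (not from the original text) of the Azuma-Hoeffding bound (1): for a
# +/- d random walk, i.e. a martingale with jumps bounded by d_k = d, the empirical two-sided
# tail P(|X_n - X_0| >= alpha) is compared with 2*exp(-alpha^2 / (2 n d^2)).
import numpy as np

rng = np.random.default_rng(1)
n, num_paths, d = 100, 100_000, 1.0

Xn = rng.choice([-d, d], size=(num_paths, n)).sum(axis=1)   # X_n - X_0 for each simulated path

for alpha in [10.0, 20.0, 30.0]:
    empirical = np.mean(np.abs(Xn) >= alpha)
    azuma = 2.0 * np.exp(-alpha**2 / (2.0 * n * d**2))
    print(f"alpha={alpha:4.0f}   empirical tail={empirical:.3e}   Azuma bound={azuma:.3e}")
```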

Remark 4: In [53, Theorem 3.13], Azuma's inequality is stated as follows: Let $\{Y_k, \mathcal{F}_k\}_{k=0}^n$ be a martingale-difference sequence with $Y_0 = 0$ (i.e., $Y_k$ is $\mathcal{F}_k$-measurable, $\mathbb{E}[|Y_k|] < \infty$ and $\mathbb{E}[Y_k \,|\, \mathcal{F}_{k-1}] = 0$ a.s. for every $k \in \{1, \ldots, n\}$). Assume that, for every $k$, there exist some numbers $a_k, b_k \in \mathbb{R}$ such that, a.s., $a_k \leq Y_k \leq b_k$. Then, for every $r \geq 0$,
$$\mathbb{P}\left(\left|\sum_{k=1}^n Y_k\right| \geq r\right) \leq 2\exp\left(-\frac{2r^2}{\sum_{k=1}^n (b_k - a_k)^2}\right). \qquad (7)$$
As a consequence of this inequality, consider a discrete-parameter real-valued martingale sequence $\{X_k, \mathcal{F}_k\}_{k=0}^n$ where $a_k \leq X_k - X_{k-1} \leq b_k$ a.s. for every $k$. Let $Y_k \triangleq X_k - X_{k-1}$ for every $k \in \{1, \ldots, n\}$; since $\{Y_k, \mathcal{F}_k\}_{k=0}^n$ is a martingale-difference sequence and $\sum_{k=1}^n Y_k = X_n - X_0$, then
$$\mathbb{P}(|X_n - X_0| \geq r) \leq 2\exp\left(-\frac{2r^2}{\sum_{k=1}^n (b_k - a_k)^2}\right), \quad \forall\, r > 0. \qquad (8)$$

Example 3: Let $\{Y_i\}_{i=0}^\infty$ be i.i.d. binary random variables which take the values $\pm d$, for some constant $d > 0$, with equal probability. Let $X_k = \sum_{i=0}^k Y_i$ for $k \in \{0, 1, \ldots\}$, and define the natural filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \ldots$ where
$$\mathcal{F}_k = \sigma(Y_0, \ldots, Y_k), \quad \forall\, k \in \{0, 1, \ldots\}$$
is the $\sigma$-algebra that is generated by the random variables $Y_0, \ldots, Y_k$. Note that $\{X_k, \mathcal{F}_k\}_{k=0}^\infty$ is a martingale sequence, and (a.s.) $|X_k - X_{k-1}| = |Y_k| = d$ for every $k \in \mathbb{N}$. It therefore follows from Azuma's inequality that
$$\mathbb{P}(|X_n - X_0| \geq \alpha\sqrt{n}) \leq 2\exp\left(-\frac{\alpha^2}{2d^2}\right) \qquad (9)$$
for every $\alpha \geq 0$ and $n \in \mathbb{N}$. From the central limit theorem (CLT), since the RVs $\{Y_i\}_{i=0}^\infty$ are i.i.d. with zero mean and variance $d^2$, then $\frac{1}{\sqrt{n}}(X_n - X_0) = \frac{1}{\sqrt{n}} \sum_{k=1}^n Y_k$ converges in distribution to $\mathcal{N}(0, d^2)$. Therefore, for every $\alpha \geq 0$,
$$\lim_{n \to \infty} \mathbb{P}(|X_n - X_0| \geq \alpha\sqrt{n}) = 2\, Q\!\left(\frac{\alpha}{d}\right) \qquad (10)$$
where
$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\left(-\frac{t^2}{2}\right) dt, \quad \forall\, x \in \mathbb{R} \qquad (11)$$
is the probability that a zero-mean and unit-variance Gaussian RV is larger than $x$. Since the following exponential upper and lower bounds on the Q-function hold

$$\frac{1}{\sqrt{2\pi}} \cdot \frac{x}{1+x^2}\, e^{-\frac{x^2}{2}} \leq Q(x) \leq \frac{1}{\sqrt{2\pi}\, x}\, e^{-\frac{x^2}{2}}, \quad \forall\, x > 0 \qquad (12)$$
it follows from (10) that the exponent on the right-hand side of (9) is the exact exponent in this example.

Example 4: In continuation to Example 3, let $\gamma \in (0, 1]$, and let us generalize this example by considering the case where the i.i.d. binary RVs $\{Y_i\}_{i=0}^\infty$ have the probability law
$$\mathbb{P}(Y_i = +d) = \frac{\gamma}{1+\gamma}, \quad \mathbb{P}(Y_i = -\gamma d) = \frac{1}{1+\gamma}.$$
Hence, it follows that the i.i.d. RVs $\{Y_i\}$ have zero mean, as in Example 3, and variance $\sigma^2 = \gamma d^2$. Let $\{X_k, \mathcal{F}_k\}_{k=0}^\infty$ be defined similarly to Example 3, so that it forms a martingale sequence. Based on the CLT, $\frac{1}{\sqrt{n}}(X_n - X_0) = \frac{1}{\sqrt{n}} \sum_{k=1}^n Y_k$ converges weakly to $\mathcal{N}(0, \gamma d^2)$, so for every $\alpha \geq 0$
$$\lim_{n \to \infty} \mathbb{P}(|X_n - X_0| \geq \alpha\sqrt{n}) = 2\, Q\!\left(\frac{\alpha}{\sqrt{\gamma}\, d}\right). \qquad (13)$$
From the exponential upper and lower bounds on the Q-function in (12), the right-hand side of (13) scales exponentially like $e^{-\frac{\alpha^2}{2\gamma d^2}}$. Hence, the exponent in this example is improved by a factor of $\frac{1}{\gamma}$ as compared to Azuma's inequality (which is the same as in Example 3 since $|X_k - X_{k-1}| \leq d$ for every $k \in \mathbb{N}$). This indicates the possible refinement of Azuma's inequality by introducing an additional constraint on the second moment. This route was studied extensively in the probability literature, and it is the focus of Section IV.
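The gap discussed in Example 4 can be visualized with a short Monte Carlo sketch (illustrative parameters; the code is not from the original text): it simulates the asymmetric-jump martingale of Example 4 and compares the empirical tail with the Gaussian limit $2Q(\alpha/(\sqrt{\gamma}d))$ in (13) and with the weaker Azuma bound $2\exp(-\alpha^2/(2d^2))$ in (9).

```python
# A minimal Monte Carlo sketch of Example 4 (all numerical choices here are illustrative): the
# jumps take the value +d w.p. gamma/(1+gamma) and -gamma*d w.p. 1/(1+gamma), so they have zero
# mean and variance gamma*d^2.  The empirical tail of (X_n - X_0)/sqrt(n) is compared with the
# Gaussian limit 2*Q(alpha/(sqrt(gamma)*d)) of (13) and with the Azuma bound 2*exp(-alpha^2/(2d^2)).
import numpy as np
from math import erfc, exp, sqrt

rng = np.random.default_rng(2)
d, gamma, n, num_paths = 1.0, 0.25, 1000, 100_000
p_plus = gamma / (1.0 + gamma)

Z = np.zeros(num_paths)
for _ in range(n):                       # accumulate the n martingale jumps path by path
    Z += np.where(rng.random(num_paths) < p_plus, d, -gamma * d)
Z /= sqrt(n)                             # (X_n - X_0)/sqrt(n)

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))  # Gaussian tail function, as in (11)
for alpha in [0.5, 1.0, 1.5]:
    print(f"alpha={alpha}: empirical={np.mean(np.abs(Z) >= alpha):.3e}   "
          f"CLT limit={2 * Q(alpha / (sqrt(gamma) * d)):.3e}   "
          f"Azuma bound={2 * exp(-alpha**2 / (2 * d**2)):.3e}")
```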

B. McDiarmid's Inequality

The following useful inequality is due to McDiarmid ([52] or [54, Theorem 3.1]), and its original derivation relies on the martingale approach. We will relate, in the following, the derivation of this inequality to the derivation of the Azuma-Hoeffding inequality (see the preceding subsection).

Theorem 2 (McDiarmid's inequality): Let $\{X_i\}$ be independent real-valued random variables (not necessarily i.i.d.), and assume that $X_i : \Omega \to \mathbb{R}$ for every $i$, and let $g : \mathbb{R}^n \to \mathbb{R}$ be a measurable function. Let $\{\hat{X}_i\}_{i=1}^n$ be independent copies of $\{X_i\}_{i=1}^n$, respectively, and suppose that, for every $k \in \{1, \ldots, n\}$,

$$\bigl| g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) - g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n) \bigr| \leq d_k \qquad (14)$$
holds a.s. (note that a stronger condition would be to require that the variation of $g$ w.r.t. the $k$-th coordinate of $x \in \mathbb{R}^n$ is upper bounded by $d_k$, i.e.,
$$\sup |g(x) - g(x')| \leq d_k$$
for every $x, x' \in \mathbb{R}^n$ that differ only in their $k$-th coordinate). Then, for every $\alpha \geq 0$,
$$\mathbb{P}\bigl(\bigl| g(X_1, \ldots, X_n) - \mathbb{E}\bigl[g(X_1, \ldots, X_n)\bigr] \bigr| \geq \alpha\bigr) \leq 2\exp\left(-\frac{2\alpha^2}{\sum_{k=1}^n d_k^2}\right). \qquad (15)$$

Remark 5: One can use the Azuma-Hoeffding inequality for a derivation of a concentration inequality in the considered setting. However, the following proof provides in this setting an improvement by a factor of 4 in the exponent of the bound.

Proof: For $k \in \{1, \ldots, n\}$, let $\mathcal{F}_k = \sigma(X_1, \ldots, X_k)$ be the $\sigma$-algebra that is generated by $X_1, \ldots, X_k$, with $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Define

$$\xi_k \triangleq \mathbb{E}\bigl[g(X_1, \ldots, X_n) \,|\, \mathcal{F}_k\bigr] - \mathbb{E}\bigl[g(X_1, \ldots, X_n) \,|\, \mathcal{F}_{k-1}\bigr], \quad \forall\, k \in \{1, \ldots, n\}. \qquad (16)$$
Note that $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_n$ is a filtration, and
$$\mathbb{E}\bigl[g(X_1, \ldots, X_n) \,|\, \mathcal{F}_0\bigr] = \mathbb{E}\bigl[g(X_1, \ldots, X_n)\bigr], \quad \mathbb{E}\bigl[g(X_1, \ldots, X_n) \,|\, \mathcal{F}_n\bigr] = g(X_1, \ldots, X_n). \qquad (17)$$
Hence, it follows from the last three equalities that

$$g(X_1, \ldots, X_n) - \mathbb{E}\bigl[g(X_1, \ldots, X_n)\bigr] = \sum_{k=1}^n \xi_k.$$
In the following, we need a lemma:

Lemma 1: For every $k \in \{1, \ldots, n\}$, the following properties hold a.s.:
1) $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$, so $\{\xi_k, \mathcal{F}_k\}$ is a martingale-difference sequence and $\xi_k$ is $\mathcal{F}_k$-measurable.
2) $|\xi_k| \leq d_k$.
3) $\xi_k \in [a_k, a_k + d_k]$ where $a_k$ is some non-positive $\mathcal{F}_{k-1}$-measurable random variable.

Proof: The random variable $\xi_k$ is $\mathcal{F}_k$-measurable since $\mathcal{F}_{k-1} \subseteq \mathcal{F}_k$, and $\xi_k$ is a difference of two functions where one is $\mathcal{F}_k$-measurable and the other is $\mathcal{F}_{k-1}$-measurable. Furthermore, it is easy to verify that $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$. This verifies the first item. The second item follows from the first and third items. To prove the third item, let

$$\xi_k = \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) \,|\, \mathcal{F}_k\bigr] - \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) \,|\, \mathcal{F}_{k-1}\bigr]$$
$$\hat{\xi}_k = \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n) \,|\, \hat{\mathcal{F}}_k\bigr] - \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n) \,|\, \mathcal{F}_{k-1}\bigr]$$
where $\{\hat{X}_i\}_{i=1}^n$ is an independent copy of $\{X_i\}_{i=1}^n$, and we define

$$\hat{\mathcal{F}}_k = \sigma(X_1, \ldots, X_{k-1}, \hat{X}_k).$$

Due to the independence of $X_k$ and $\hat{X}_k$, and since they are also independent of the other RVs, we have a.s.
$$\begin{aligned}
|\xi_k - \hat{\xi}_k| &= \Bigl| \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) \,|\, \mathcal{F}_k\bigr] - \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n) \,|\, \hat{\mathcal{F}}_k\bigr] \Bigr| \\
&= \Bigl| \mathbb{E}\bigl[g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) - g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n) \,\big|\, \sigma(X_1, \ldots, X_{k-1}, X_k, \hat{X}_k)\bigr] \Bigr| \\
&\leq \mathbb{E}\Bigl[ \bigl| g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) - g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n) \bigr| \,\Big|\, \sigma(X_1, \ldots, X_{k-1}, X_k, \hat{X}_k)\Bigr] \\
&\leq d_k. \qquad (18)
\end{aligned}$$
Therefore, $|\xi_k - \hat{\xi}_k| \leq d_k$ holds a.s. for every pair of independent copies $X_k$ and $\hat{X}_k$, which are also independent of the other random variables. This implies that $\xi_k$ is a.s. supported on an interval $[a_k, a_k + d_k]$ for some function $a_k = a_k(X_1, \ldots, X_{k-1})$ that is $\mathcal{F}_{k-1}$-measurable (since $X_k$ and $\hat{X}_k$ are independent copies, and $\xi_k - \hat{\xi}_k$ is a difference of terms involving $g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n)$ and $g(X_1, \ldots, X_{k-1}, \hat{X}_k, X_{k+1}, \ldots, X_n)$, this is in essence saying that if a set $\mathcal{S} \subseteq \mathbb{R}$ has the property that the distance between any two of its points is not larger than some $d > 0$, then the set is included in an interval whose length is $d$). Since also $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$, the $\mathcal{F}_{k-1}$-measurable function $a_k$ is a.s. non-positive. It is noted that the third item of the lemma is what makes it different from the proof of the Azuma-Hoeffding inequality (which, in that case, implies that $\xi_k \in [-d_k, d_k]$, an interval whose length is twice larger, i.e., $2d_k$).

Let $b_k \triangleq a_k + d_k$. Since $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$ and $\xi_k \in [a_k, b_k]$, where $a_k \leq 0$ and $b_k$ are $\mathcal{F}_{k-1}$-measurable, then
$$\mathrm{Var}(\xi_k \,|\, \mathcal{F}_{k-1}) \leq -a_k b_k \triangleq \sigma_k^2.$$
The convexity of the exponential function (similarly to the derivation of the Azuma-Hoeffding inequality, but this time w.r.t. the interval $[a_k, b_k]$ whose length is $d_k$) implies that for every $k \in \{1, \ldots, n\}$
$$\mathbb{E}\bigl[e^{t\xi_k} \,|\, \mathcal{F}_{k-1}\bigr] \leq \mathbb{E}\left[\frac{(\xi_k - a_k)e^{t b_k} + (b_k - \xi_k)e^{t a_k}}{d_k} \,\Big|\, \mathcal{F}_{k-1}\right] = \frac{b_k e^{t a_k} - a_k e^{t b_k}}{d_k}.$$

Let $p_k \triangleq -\frac{a_k}{d_k} \in [0, 1]$. Then
$$\mathbb{E}\bigl[e^{t\xi_k} \,|\, \mathcal{F}_{k-1}\bigr] \leq p_k e^{t b_k} + (1 - p_k)e^{t a_k} = e^{t a_k}\bigl(1 - p_k + p_k e^{t d_k}\bigr) = e^{f_k(t)} \qquad (19)$$
where
$$f_k(t) \triangleq t a_k + \ln\bigl(1 - p_k + p_k e^{t d_k}\bigr), \quad \forall\, t \in \mathbb{R}. \qquad (20)$$
Since $f_k(0) = f_k'(0) = 0$ and the geometric mean is less than or equal to the arithmetic mean then, for every $t$,

$$f_k''(t) = \frac{d_k^2\, p_k (1 - p_k)\, e^{t d_k}}{\bigl(1 - p_k + p_k e^{t d_k}\bigr)^2} \leq \frac{d_k^2}{4}$$
which implies by Taylor's theorem that
$$f_k(t) \leq \frac{t^2 d_k^2}{8} \qquad (21)$$
so, from (19),
$$\mathbb{E}\bigl[e^{t\xi_k} \,|\, \mathcal{F}_{k-1}\bigr] \leq e^{\frac{t^2 d_k^2}{8}}.$$
Similarly to the proof of the Azuma-Hoeffding inequality, by repeatedly using the recursion in (4), the last inequality implies that
$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \exp\left(\frac{t^2}{8} \sum_{k=1}^n d_k^2\right) \qquad (22)$$
which then gives from (3) that, for every $t \geq 0$,
$$\mathbb{P}\bigl(g(X_1, \ldots, X_n) - \mathbb{E}[g(X_1, \ldots, X_n)] \geq \alpha\bigr) = \mathbb{P}\left(\sum_{k=1}^n \xi_k \geq \alpha\right) \leq \exp\left(-\alpha t + \frac{t^2}{8} \sum_{k=1}^n d_k^2\right). \qquad (23)$$
An optimization over the free parameter $t \geq 0$ gives $t = 4\alpha\bigl(\sum_{k=1}^n d_k^2\bigr)^{-1}$, so
$$\mathbb{P}\bigl(g(X_1, \ldots, X_n) - \mathbb{E}[g(X_1, \ldots, X_n)] \geq \alpha\bigr) \leq \exp\left(-\frac{2\alpha^2}{\sum_{k=1}^n d_k^2}\right). \qquad (24)$$
By replacing $g$ with $-g$, it follows that this bound is also valid for the probability
$$\mathbb{P}\bigl(g(X_1, \ldots, X_n) - \mathbb{E}[g(X_1, \ldots, X_n)] \leq -\alpha\bigr)$$
which therefore gives the bound in (15). This completes the proof of Theorem 2.
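As a simple illustration of Theorem 2 (a hypothetical setup, not taken from the text), the following Python sketch applies McDiarmid's inequality to the empirical mean of $n$ independent Uniform$[0,1]$ random variables: changing one coordinate changes the mean by at most $d_k = 1/n$, so (15) gives the bound $2\exp(-2n\alpha^2)$, which is compared with the empirical deviation probability.

```python
# A minimal sketch (hypothetical setup) of McDiarmid's inequality (15): for the empirical mean
# g(x_1,...,x_n) = (1/n) sum_k x_k of n independent Uniform[0,1] variables, changing one
# coordinate changes g by at most d_k = 1/n, so (15) reads P(|g - E[g]| >= alpha) <= 2 exp(-2 n alpha^2).
import numpy as np

rng = np.random.default_rng(3)
n, num_trials = 200, 100_000

g = rng.uniform(0.0, 1.0, size=(num_trials, n)).mean(axis=1)   # realizations of g(X_1,...,X_n)
Eg = 0.5                                                        # E[g] for Uniform[0,1] coordinates

for alpha in [0.05, 0.10]:
    empirical = np.mean(np.abs(g - Eg) >= alpha)
    bound = 2.0 * np.exp(-2.0 * n * alpha**2)
    print(f"alpha={alpha:.2f}   empirical={empirical:.3e}   McDiarmid bound={bound:.3e}")
```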

C. Hoeffding’s Inequality, and its Improved Version (the Kearns-Saul Inequality) In the following, we derive a concentration inequality for sums of independent and bounded random variables as a consequence of McDiarmid’s inequality. This inequality is due to Hoeffding (see [30, Theorem 2]). An improved version of Hoeffding’s inequality, due to Kearns and Saul [33], is also introduced in the following.

Theorem 3 (Hoeffding): Let $\{U_k\}_{k=1}^n$ be a sequence of independent and bounded random variables such that, for every $k \in \{1, \ldots, n\}$, $U_k \in [a_k, b_k]$ holds a.s. for some constants $a_k, b_k \in \mathbb{R}$. Let $\mu_n \triangleq \sum_{k=1}^n \mathbb{E}[U_k]$. Then,
$$\mathbb{P}\left(\left|\sum_{k=1}^n U_k - \mu_n\right| \geq \alpha\sqrt{n}\right) \leq 2\exp\left(-\frac{2\alpha^2 n}{\sum_{k=1}^n (b_k - a_k)^2}\right), \quad \forall\, \alpha \geq 0. \qquad (25)$$

Proof: Let $g(x) \triangleq \sum_{k=1}^n x_k$ for every $x \in \mathbb{R}^n$. Furthermore, let $X_1, X_1', \ldots, X_n, X_n'$ be independent random variables such that $X_k$ and $X_k'$ are independent copies of $U_k$ for every $k \in \{1, \ldots, n\}$. By assumption, it follows that for every $k$

$$\bigl| g(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_n) - g(X_1, \ldots, X_{k-1}, X_k', X_{k+1}, \ldots, X_n) \bigr| = |X_k - X_k'| \leq b_k - a_k$$
holds a.s., where the last inequality is due to the fact that $X_k$ and $X_k'$ are both distributed like $U_k$, so they are a.s. in the interval $[a_k, b_k]$. It therefore follows from McDiarmid's inequality that
$$\mathbb{P}\bigl(\bigl| g(X_1, \ldots, X_n) - \mathbb{E}[g(X_1, \ldots, X_n)] \bigr| \geq \alpha\sqrt{n}\bigr) \leq 2\exp\left(-\frac{2\alpha^2 n}{\sum_{k=1}^n (b_k - a_k)^2}\right), \quad \forall\, \alpha \geq 0.$$
Since
$$\mathbb{E}\bigl[g(X_1, \ldots, X_n)\bigr] = \sum_{k=1}^n \mathbb{E}[X_k] = \sum_{k=1}^n \mathbb{E}[U_k] = \mu_n$$
and also $(X_1, \ldots, X_n)$ has the same distribution as $(U_1, \ldots, U_n)$ (note that the entries of each of these vectors are independent, and $X_k$ is distributed like $U_k$), then
$$\mathbb{P}\bigl(\bigl| g(U_1, \ldots, U_n) - \mu_n \bigr| \geq \alpha\sqrt{n}\bigr) \leq 2\exp\left(-\frac{2\alpha^2 n}{\sum_{k=1}^n (b_k - a_k)^2}\right), \quad \forall\, \alpha \geq 0$$
which is equivalent to (25).

An improved version of Hoeffding's inequality, due to Kearns and Saul [33], is introduced in the following. It is noted that a certain gap in the original proof of the improved inequality in [33] was recently resolved in [7] by some tedious calculus. A shorter information-theoretic proof of the same basic inequality that is required for the derivation of the improved concentration result follows from transportation-cost inequalities, as will be shown in the next chapter (see Section V-C of the next chapter). So, we only state the basic inequality, and use it to derive the improved version of Hoeffding's inequality.

To this end, let $\xi_k \triangleq U_k - \mathbb{E}[U_k]$ for every $k \in \{1, \ldots, n\}$, so $\sum_{k=1}^n U_k - \mu_n = \sum_{k=1}^n \xi_k$ with $\mathbb{E}[\xi_k] = 0$ and $\xi_k \in \bigl[a_k - \mathbb{E}[U_k],\, b_k - \mathbb{E}[U_k]\bigr]$. Following the argument that is used to derive inequality (19) gives
$$\mathbb{E}\bigl[\exp(t\xi_k)\bigr] \leq (1 - p_k)\exp\bigl(t(a_k - \mathbb{E}[U_k])\bigr) + p_k \exp\bigl(t(b_k - \mathbb{E}[U_k])\bigr) \triangleq \exp\bigl(f_k(t)\bigr) \qquad (26)$$
where $p_k \in [0, 1]$ is defined by
$$p_k \triangleq \frac{\mathbb{E}[U_k] - a_k}{b_k - a_k}, \quad \forall\, k \in \{1, \ldots, n\}. \qquad (27)$$
The derivation of McDiarmid's inequality (see (21)) gives that, for all $t \in \mathbb{R}$,
$$f_k(t) \leq \frac{t^2 (b_k - a_k)^2}{8}. \qquad (28)$$
The improvement of this bound (see [7, Theorem 4]) gives that, for all $t \in \mathbb{R}$,
$$f_k(t) \leq \begin{cases} \dfrac{(1 - 2p_k)(b_k - a_k)^2\, t^2}{4 \ln\bigl(\frac{1 - p_k}{p_k}\bigr)} & \text{if } p_k \neq \frac{1}{2} \\[2ex] \dfrac{(b_k - a_k)^2\, t^2}{8} & \text{if } p_k = \frac{1}{2}. \end{cases} \qquad (29)$$
Note that since
$$\lim_{p \to \frac{1}{2}} \frac{1 - 2p}{\ln\bigl(\frac{1 - p}{p}\bigr)} = \frac{1}{2}$$
the upper bound in (29) is continuous in $p_k$, and it also improves the bound on $f_k(t)$ in (28) unless $p_k = \frac{1}{2}$ (where the two bounds coincide). From (29), we have $f_k(t) \leq c_k t^2$ for every $k \in \{1, \ldots, n\}$ and $t \in \mathbb{R}$, where

$$c_k \triangleq \begin{cases} \dfrac{(1 - 2p_k)(b_k - a_k)^2}{4 \ln\bigl(\frac{1 - p_k}{p_k}\bigr)} & \text{if } p_k \neq \frac{1}{2} \\[2ex] \dfrac{(b_k - a_k)^2}{8} & \text{if } p_k = \frac{1}{2}. \end{cases} \qquad (30)$$
Hence, Chernoff's inequality and the similarity of the two one-sided tail bounds give
$$\mathbb{P}\left(\left|\sum_{k=1}^n U_k - \mu_n\right| \geq \alpha\sqrt{n}\right) \leq 2\exp(-\alpha\sqrt{n}\, t)\, \prod_{k=1}^n \mathbb{E}\bigl[\exp(t\xi_k)\bigr] \leq 2\exp(-\alpha t\sqrt{n}) \cdot \exp\left(t^2 \sum_{k=1}^n c_k\right), \quad \forall\, t \geq 0. \qquad (31)$$
Finally, an optimization over the non-negative free parameter $t$ leads to the following improved version of Hoeffding's inequality in [33] (with the recent follow-up in [7]).

Theorem 4 (Kearns-Saul inequality): Let $\{U_k\}_{k=1}^n$ be a sequence of independent and bounded random variables such that, for every $k \in \{1, \ldots, n\}$, $U_k \in [a_k, b_k]$ holds a.s. for some constants $a_k, b_k \in \mathbb{R}$. Let $\mu_n \triangleq \sum_{k=1}^n \mathbb{E}[U_k]$. Then,
$$\mathbb{P}\left(\left|\sum_{k=1}^n U_k - \mu_n\right| \geq \alpha\sqrt{n}\right) \leq 2\exp\left(-\frac{\alpha^2 n}{4 \sum_{k=1}^n c_k}\right), \quad \forall\, \alpha \geq 0 \qquad (32)$$
where $\{c_k\}_{k=1}^n$ is introduced in (30) with the $p_k$'s that are given in (27). Moreover, the exponential bound (32) improves Hoeffding's inequality, unless $p_k = \frac{1}{2}$ for every $k \in \{1, \ldots, n\}$.

The reader is referred to another recent refinement of Hoeffding's inequality in [25], followed by some numerical comparisons.
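The following short Python sketch (illustrative values only) compares the coefficient $(b_k - a_k)^2/8$ behind Hoeffding's bound (see (28)) with the Kearns-Saul coefficient $c_k$ in (30) for Bernoulli$(p)$ variables on $[a_k, b_k] = [0, 1]$; since the exponent in (32) is inversely proportional to $\sum_k c_k$, a smaller coefficient means a stronger bound, and the improvement grows as $p$ moves away from $\frac{1}{2}$.

```python
# A small sketch (illustrative values) comparing the coefficient behind Hoeffding's bound, namely
# (b_k - a_k)^2 / 8 = 1/8 (see (28)), with the Kearns-Saul coefficient c_k in (30) for
# Bernoulli(p) variables supported on [a_k, b_k] = [0, 1], where p_k = p by (27).
import numpy as np

def kearns_saul_coefficient(p, a=0.0, b=1.0):
    """The coefficient c_k of (30); reduces to (b-a)^2/8 when p = 1/2."""
    if np.isclose(p, 0.5):
        return (b - a) ** 2 / 8.0
    return (1.0 - 2.0 * p) * (b - a) ** 2 / (4.0 * np.log((1.0 - p) / p))

for p in [0.05, 0.1, 0.3, 0.5]:
    print(f"p={p:4.2f}   Hoeffding coefficient=0.1250   Kearns-Saul c_k={kearns_saul_coefficient(p):.4f}")
```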

IV. REFINED VERSIONS OF THE AZUMA-HOEFFDING INEQUALITY

Example 4 in the preceding section serves to motivate a derivation of an improved concentration inequality with an additional constraint on the conditional variance of a martingale sequence. In the following, assume that $|X_k - X_{k-1}| \leq d$ holds a.s. for every $k$ (note that $d$ does not depend on $k$, so it is a global bound on the jumps of the martingale). A new condition is added for the derivation of the next concentration inequality, where it is assumed that
$$\mathrm{Var}(X_k \,|\, \mathcal{F}_{k-1}) = \mathbb{E}\bigl[(X_k - X_{k-1})^2 \,|\, \mathcal{F}_{k-1}\bigr] \leq \gamma d^2$$
for some constant $\gamma \in (0, 1]$.

A. A Refinement of the Azuma-Hoeffding Inequality for Discrete-Time Martingales with Bounded Jumps

The following theorem appears in [52] (see also [17, Corollary 2.4.7]).

Theorem 5: Let $\{X_k, \mathcal{F}_k\}_{k=0}^n$ be a discrete-parameter real-valued martingale. Assume that, for some constants $d, \sigma > 0$, the following two requirements are satisfied a.s.:

$$|X_k - X_{k-1}| \leq d, \quad \mathrm{Var}(X_k \,|\, \mathcal{F}_{k-1}) = \mathbb{E}\bigl[(X_k - X_{k-1})^2 \,|\, \mathcal{F}_{k-1}\bigr] \leq \sigma^2$$
for every $k \in \{1, \ldots, n\}$. Then, for every $\alpha \geq 0$,
$$\mathbb{P}(|X_n - X_0| \geq \alpha n) \leq 2\exp\left(-n\, D\left(\frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma}\right)\right) \qquad (33)$$
where
$$\gamma \triangleq \frac{\sigma^2}{d^2}, \quad \delta \triangleq \frac{\alpha}{d} \qquad (34)$$
and
$$D(p \| q) \triangleq p \ln\left(\frac{p}{q}\right) + (1 - p)\ln\left(\frac{1 - p}{1 - q}\right), \quad \forall\, p, q \in [0, 1] \qquad (35)$$
is the divergence between the two probability distributions $(p, 1-p)$ and $(q, 1-q)$. If $\delta > 1$, then the probability on the left-hand side of (33) is equal to zero.

Proof: The proof of this bound starts similarly to the proof of the Azuma-Hoeffding inequality, up to (4). The new ingredient in this proof is Bennett's inequality, which replaces the argument of the convexity of the exponential function in the proof of the Azuma-Hoeffding inequality. We introduce in the following a lemma (see, e.g., [17, Lemma 2.4.1]) that is required for the proof of Theorem 5.

Lemma 2 (Bennett): Let $X$ be a real-valued random variable with $\bar{x} = \mathbb{E}[X]$ and $\mathbb{E}[(X - \bar{x})^2] \leq \sigma^2$ for some $\sigma > 0$. Furthermore, suppose that $X \leq b$ a.s. for some $b \in \mathbb{R}$. Then, for every $\lambda \geq 0$,
$$\mathbb{E}\bigl[e^{\lambda X}\bigr] \leq \frac{e^{\lambda \bar{x}}\left[ (b - \bar{x})^2 \exp\left(-\frac{\lambda \sigma^2}{b - \bar{x}}\right) + \sigma^2 e^{\lambda (b - \bar{x})} \right]}{(b - \bar{x})^2 + \sigma^2}. \qquad (36)$$

Proof: The lemma is trivial if $\lambda = 0$, so it is proved in the following for $\lambda > 0$. Let $Y \triangleq \lambda(X - \bar{x})$ for $\lambda > 0$. Then, by assumption, $Y \leq \lambda(b - \bar{x}) \triangleq b_Y$ a.s. and $\mathrm{Var}(Y) \leq \lambda^2 \sigma^2 \triangleq \sigma_Y^2$. It is therefore required to show that if $\mathbb{E}[Y] = 0$, $Y \leq b_Y$, and $\mathrm{Var}(Y) \leq \sigma_Y^2$, then
$$\mathbb{E}\bigl[e^Y\bigr] \leq \frac{b_Y^2}{b_Y^2 + \sigma_Y^2}\, e^{-\frac{\sigma_Y^2}{b_Y}} + \frac{\sigma_Y^2}{b_Y^2 + \sigma_Y^2}\, e^{b_Y}. \qquad (37)$$
Let $Y_0$ be a random variable that takes the two possible values $-\frac{\sigma_Y^2}{b_Y}$ and $b_Y$, where
$$\mathbb{P}\left(Y_0 = -\frac{\sigma_Y^2}{b_Y}\right) = \frac{b_Y^2}{b_Y^2 + \sigma_Y^2}, \quad \mathbb{P}(Y_0 = b_Y) = \frac{\sigma_Y^2}{b_Y^2 + \sigma_Y^2} \qquad (38)$$
so inequality (37) is equivalent to showing that

$$\mathbb{E}\bigl[e^Y\bigr] \leq \mathbb{E}\bigl[e^{Y_0}\bigr]. \qquad (39)$$

To that end, let $\phi$ be the unique parabola such that the function
$$f(y) \triangleq \phi(y) - e^y, \quad \forall\, y \in \mathbb{R}$$
is zero at $y = b_Y$, and $f(y) = f'(y) = 0$ at $y = -\frac{\sigma_Y^2}{b_Y}$. Since $\phi''$ is constant, $f''(y) = 0$ at exactly one value of $y$, call it $y_0$. Furthermore, since $f\bigl(-\frac{\sigma_Y^2}{b_Y}\bigr) = f(b_Y)$ (both are equal to zero), then $f'(y_1) = 0$ for some $y_1 \in \bigl(-\frac{\sigma_Y^2}{b_Y}, b_Y\bigr)$. By the same argument, applied to $f'$ on $\bigl[-\frac{\sigma_Y^2}{b_Y}, y_1\bigr]$, it follows that $y_0 \in \bigl(-\frac{\sigma_Y^2}{b_Y}, y_1\bigr)$. The function $f$ is convex on $(-\infty, y_0]$ (since, on this interval, $f''(y) = \phi''(y) - e^y > \phi''(y) - e^{y_0} = \phi''(y_0) - e^{y_0} = f''(y_0) = 0$), and its minimal value on this interval is at $y = -\frac{\sigma_Y^2}{b_Y}$ (since at this point, $f'$ is zero). Furthermore, $f$ is concave on $[y_0, \infty)$ and it attains its maximal value on this interval at $y = y_1$. This implies that $f \geq 0$ on the interval $(-\infty, b_Y]$, so $\mathbb{E}[f(Y)] \geq 0$ for any random variable $Y$ such that $Y \leq b_Y$ a.s., which therefore gives
$$\mathbb{E}\bigl[e^Y\bigr] \leq \mathbb{E}\bigl[\phi(Y)\bigr]$$
with equality if $\mathbb{P}\bigl(Y \in \{-\frac{\sigma_Y^2}{b_Y},\, b_Y\}\bigr) = 1$. Since $f''(y) \geq 0$ for $y \leq y_0$, it follows that $\phi'' \geq e^{y} > 0$ there (recall that $\phi''$ is constant since $\phi$ is a parabola). Hence, for any random variable $Y$ of zero mean, $\mathbb{E}[\phi(Y)]$, which only depends on $\mathbb{E}[Y^2]$, is a non-decreasing function of $\mathbb{E}[Y^2]$. The random variable $Y_0$, which takes values in $\{-\frac{\sigma_Y^2}{b_Y}, b_Y\}$ and whose distribution is given in (38), has zero mean and variance $\mathbb{E}[Y_0^2] = \sigma_Y^2$, so
$$\mathbb{E}\bigl[\phi(Y)\bigr] \leq \mathbb{E}\bigl[\phi(Y_0)\bigr].$$
Note also that
$$\mathbb{E}\bigl[\phi(Y_0)\bigr] = \mathbb{E}\bigl[e^{Y_0}\bigr]$$
since $f(y) = 0$ (i.e., $\phi(y) = e^y$) for $y = -\frac{\sigma_Y^2}{b_Y}$ or $y = b_Y$, and $Y_0$ only takes these two values. Combining the last two inequalities with the last equality gives inequality (39), which therefore completes the proof of the lemma.

Applying Bennett's inequality in Lemma 2 to the conditional law of $\xi_k$ given the $\sigma$-algebra $\mathcal{F}_{k-1}$, since $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$, $\mathrm{Var}[\xi_k \,|\, \mathcal{F}_{k-1}] \leq \sigma^2$ and $\xi_k \leq d$ a.s. for $k \in \mathbb{N}$, then a.s.
$$\mathbb{E}\bigl[\exp(t\xi_k) \,|\, \mathcal{F}_{k-1}\bigr] \leq \frac{\sigma^2 \exp(td) + d^2 \exp\left(-\frac{t\sigma^2}{d}\right)}{d^2 + \sigma^2}. \qquad (40)$$
Hence, it follows from (4) and (40) that, for every $t \geq 0$,
$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \left(\frac{\sigma^2 \exp(td) + d^2 \exp\left(-\frac{t\sigma^2}{d}\right)}{d^2 + \sigma^2}\right) \mathbb{E}\left[\exp\left(t \sum_{k=1}^{n-1} \xi_k\right)\right]$$
and, by induction, it follows that for every $t \geq 0$
$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \left(\frac{\sigma^2 \exp(td) + d^2 \exp\left(-\frac{t\sigma^2}{d}\right)}{d^2 + \sigma^2}\right)^n.$$
From the definition of $\gamma$ in (34), this inequality is rewritten as
$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \left(\frac{\gamma \exp(td) + \exp(-\gamma t d)}{1 + \gamma}\right)^n, \quad \forall\, t \geq 0. \qquad (41)$$
Let $x \triangleq td$ (so $x \geq 0$). Combining Chernoff's inequality with (41) gives that, for every $\alpha \geq 0$ (where, from the definition of $\delta$ in (34), $\alpha t = \delta x$),

$$\mathbb{P}(X_n - X_0 \geq \alpha n) \leq \exp(-\alpha n t)\, \mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \left(\frac{\gamma \exp\bigl((1 - \delta)x\bigr) + \exp\bigl(-(\gamma + \delta)x\bigr)}{1 + \gamma}\right)^n, \quad \forall\, x \geq 0. \qquad (42)$$

Consider first the case where $\delta = 1$ (i.e., $\alpha = d$); then (42) is particularized to
$$\mathbb{P}(X_n - X_0 \geq dn) \leq \left(\frac{\gamma + \exp\bigl(-(\gamma + 1)x\bigr)}{1 + \gamma}\right)^n, \quad \forall\, x \geq 0$$
and the tightest bound within this form is obtained in the limit where $x \to \infty$. This provides the inequality
$$\mathbb{P}(X_n - X_0 \geq dn) \leq \left(\frac{\gamma}{1 + \gamma}\right)^n. \qquad (43)$$
Otherwise, if $\delta \in [0, 1)$, the minimization of the base of the exponent on the right-hand side of (42) w.r.t. the free non-negative parameter $x$ yields that the optimized value is
$$x = \frac{1}{1 + \gamma} \ln\left(\frac{\gamma + \delta}{\gamma(1 - \delta)}\right) \qquad (44)$$
and its substitution into the right-hand side of (42) gives that, for every $\alpha \geq 0$,
$$\begin{aligned}
\mathbb{P}(X_n - X_0 \geq \alpha n) &\leq \left[ \left(\frac{\gamma + \delta}{\gamma}\right)^{-\frac{\gamma + \delta}{1 + \gamma}} (1 - \delta)^{-\frac{1 - \delta}{1 + \gamma}} \right]^n \\
&= \exp\left(-n \left[ \frac{\gamma + \delta}{1 + \gamma} \ln\left(\frac{\gamma + \delta}{\gamma}\right) + \frac{1 - \delta}{1 + \gamma} \ln(1 - \delta) \right]\right) \\
&= \exp\left(-n\, D\left(\frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma}\right)\right) \qquad (45)
\end{aligned}$$
and the exponent is equal to $+\infty$ if $\delta > 1$ (i.e., if $\alpha > d$). Applying inequality (45) to the martingale $\{-X_k, \mathcal{F}_k\}_{k=0}^\infty$ gives the same upper bound on the other tail probability $\mathbb{P}(X_n - X_0 \leq -\alpha n)$. The probability of the union of the two disjoint events $\{X_n - X_0 \geq \alpha n\}$ and $\{X_n - X_0 \leq -\alpha n\}$, which is equal to the sum of their probabilities, therefore satisfies the upper bound in (33). This completes the proof of Theorem 5.

Example 5: Let $d > 0$ and $\varepsilon \in (0, \frac{1}{2}]$ be some constants. Consider a discrete-time real-valued martingale $\{X_k, \mathcal{F}_k\}_{k=0}^\infty$ where a.s. $X_0 = 0$, and for every $m \in \mathbb{N}$
$$\mathbb{P}(X_m - X_{m-1} = d \,|\, \mathcal{F}_{m-1}) = \varepsilon, \quad \mathbb{P}\left(X_m - X_{m-1} = -\frac{\varepsilon d}{1 - \varepsilon} \,\Big|\, \mathcal{F}_{m-1}\right) = 1 - \varepsilon.$$
This indeed implies that a.s., for every $m \in \mathbb{N}$,
$$\mathbb{E}[X_m - X_{m-1} \,|\, \mathcal{F}_{m-1}] = \varepsilon d + (1 - \varepsilon)\left(-\frac{\varepsilon d}{1 - \varepsilon}\right) = 0$$
and since $X_{m-1}$ is $\mathcal{F}_{m-1}$-measurable, then a.s.
$$\mathbb{E}[X_m \,|\, \mathcal{F}_{m-1}] = X_{m-1}.$$
Since $\varepsilon \in (0, \frac{1}{2}]$, then a.s.
$$|X_m - X_{m-1}| \leq \max\left\{d, \frac{\varepsilon d}{1 - \varepsilon}\right\} = d.$$
From Azuma's inequality, for every $x \geq 0$,
$$\mathbb{P}(X_k \geq kx) \leq \exp\left(-\frac{kx^2}{2d^2}\right) \qquad (46)$$
independently of the value of $\varepsilon$ (note that $X_0 = 0$ a.s.). The concentration inequality in Theorem 5 enables one to get a better bound: since a.s., for every $m \in \mathbb{N}$,
$$\mathbb{E}\bigl[(X_m - X_{m-1})^2 \,|\, \mathcal{F}_{m-1}\bigr] = d^2 \varepsilon + \left(\frac{\varepsilon d}{1 - \varepsilon}\right)^2 (1 - \varepsilon) = \frac{d^2 \varepsilon}{1 - \varepsilon}$$
then, from (34),
$$\gamma = \frac{\varepsilon}{1 - \varepsilon}, \quad \delta = \frac{x}{d}$$
and from (45), for every $x \geq 0$,
$$\mathbb{P}(X_k \geq kx) \leq \exp\left(-k\, D\left(\frac{x(1 - \varepsilon)}{d} + \varepsilon \,\Big\|\, \varepsilon\right)\right). \qquad (47)$$
Consider the case where $\varepsilon \to 0$. Then, for arbitrary $x > 0$ and $k \in \mathbb{N}$, Azuma's inequality in (46) provides an upper bound that is strictly positive independently of $\varepsilon$, whereas the one-sided concentration inequality of Theorem 5 implies a bound in (47) that tends to zero. This exemplifies the improvement that is obtained by Theorem 5 in comparison to Azuma's inequality.

Remark 6: As was noted, e.g., in [53, Section 2], all the concentration inequalities for martingales whose derivation is based on Chernoff's bound can be strengthened to refer to maxima. The reason is that $\{X_k - X_0, \mathcal{F}_k\}_{k=0}^\infty$ is a martingale, and $h(x) = \exp(tx)$ is a convex function on $\mathbb{R}$ for every $t \geq 0$. Recall that a composition of a convex function with a martingale gives a sub-martingale w.r.t. the same filtration (see Section II-B), so it implies that $\{\exp(t(X_k - X_0)), \mathcal{F}_k\}_{k=0}^\infty$ is a sub-martingale for every $t \geq 0$. Hence, by applying Doob's maximal inequality for sub-martingales, it follows that for every $\alpha \geq 0$

$$\begin{aligned}
\mathbb{P}\left(\max_{1 \leq k \leq n} (X_k - X_0) \geq \alpha n\right) &= \mathbb{P}\left(\max_{1 \leq k \leq n} \exp\bigl(t(X_k - X_0)\bigr) \geq \exp(\alpha n t)\right), \quad \forall\, t \geq 0 \\
&\leq \exp(-\alpha n t)\, \mathbb{E}\bigl[\exp\bigl(t(X_n - X_0)\bigr)\bigr] \\
&= \exp(-\alpha n t)\, \mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right]
\end{aligned}$$
which coincides with the proof of Theorem 5 with the starting point in (3). This concept applies to all the concentration inequalities derived in this chapter.

Corollary 1: Let $\{X_k, \mathcal{F}_k\}_{k=0}^n$ be a discrete-parameter real-valued martingale, and assume that $|X_k - X_{k-1}| \leq d$ holds a.s. for some constant $d > 0$ and for every $k \in \{1, \ldots, n\}$. Then, for every $\alpha \geq 0$,
$$\mathbb{P}(|X_n - X_0| \geq \alpha n) \leq 2 \exp\bigl(-n f(\delta)\bigr) \qquad (48)$$
where
$$f(\delta) = \begin{cases} \ln(2)\left[1 - h_2\left(\frac{1 - \delta}{2}\right)\right] & 0 \leq \delta \leq 1 \\ +\infty & \delta > 1 \end{cases} \qquad (49)$$
and $h_2(x) \triangleq -x \log_2(x) - (1 - x)\log_2(1 - x)$ for $0 \leq x \leq 1$ denotes the binary entropy function on base 2.

Proof: By substituting $\gamma = 1$ in Theorem 5 (i.e., since there is no constraint on the conditional variance, one can take $\sigma^2 = d^2$), the corresponding exponent in (33) is equal to
$$D\left(\frac{1 + \delta}{2} \,\Big\|\, \frac{1}{2}\right) = f(\delta)$$
since $D(p \,\|\, \frac{1}{2}) = \ln 2\, [1 - h_2(p)]$ for every $p \in [0, 1]$.

Remark 7: Corollary 1, which is a special case of Theorem 5 when $\gamma = 1$, forms a tightened version of the Azuma-Hoeffding inequality when $d_k = d$. This can be verified by showing that $f(\delta) > \frac{\delta^2}{2}$ for every $\delta > 0$, which is a direct consequence of Pinsker's inequality. Figure 1 compares these two exponents, which nearly coincide for $\delta \leq 0.4$. Furthermore, the improvement in the exponent of the bound in Theorem 5 is shown in this figure as the value of $\gamma \in (0, 1)$ is reduced; this makes sense, since the additional constraint on the conditional variance in this theorem has a growing effect when the value of $\gamma$ is decreased.


Fig. 1. Plot of the lower bounds on the exponents from Azuma's inequality and the improved bounds in Theorem 5 and Corollary 1 (where $f$ is defined in (58)). The pointed line refers to the exponent in Corollary 1, and the three solid lines for $\gamma = \frac{1}{8}, \frac{1}{4}$ and $\frac{1}{2}$ refer to the exponents in Theorem 5.
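The comparison shown in Figure 1 can be reproduced numerically. The following Python sketch (values only, no plotting; it is an illustration added here rather than material from the original text) evaluates the Azuma exponent $\delta^2/2$, the exponent $f(\delta)$ of Corollary 1, and the exponent of Theorem 5 for $\gamma = \frac{1}{2}, \frac{1}{4}, \frac{1}{8}$.

```python
# A short numerical sketch reproducing the comparison of Fig. 1 (values only, no plotting): the
# Azuma exponent delta^2/2, the exponent f(delta) of Corollary 1 (the case gamma = 1), and the
# exponent D((delta+gamma)/(1+gamma) || gamma/(1+gamma)) of Theorem 5 for gamma = 1/2, 1/4, 1/8.
import numpy as np

def binary_divergence(p, q):
    """D(p||q) in nats, as in (35), with the convention 0*ln(0) = 0."""
    return sum(0.0 if x == 0.0 else x * np.log(x / y) for x, y in [(p, q), (1.0 - p, 1.0 - q)])

def theorem5_exponent(gamma, delta):
    return np.inf if delta > 1 else binary_divergence((delta + gamma) / (1 + gamma), gamma / (1 + gamma))

for delta in [0.2, 0.4, 0.6, 0.8, 1.0]:
    azuma = delta ** 2 / 2
    f_delta = theorem5_exponent(1.0, delta)                     # Corollary 1
    refined = [theorem5_exponent(g, delta) for g in (0.5, 0.25, 0.125)]
    print(f"delta={delta:.1f}  Azuma={azuma:.4f}  f(delta)={f_delta:.4f}  "
          f"gamma=1/2:{refined[0]:.4f}  1/4:{refined[1]:.4f}  1/8:{refined[2]:.4f}")
```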

B. Geometric Interpretation

A common ingredient in proving Azuma's inequality and Theorem 5 is a derivation of an upper bound on the conditional expectation $\mathbb{E}\bigl[e^{t\xi_k} \,|\, \mathcal{F}_{k-1}\bigr]$ for $t \geq 0$, where $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$, $\mathrm{Var}(\xi_k \,|\, \mathcal{F}_{k-1}) \leq \sigma^2$, and $|\xi_k| \leq d$ a.s. for some $\sigma, d > 0$ and for every $k \in \mathbb{N}$. The derivation of Azuma's inequality and Corollary 1 is based on the line segment that connects the curve of the exponential function $y(x) = e^{tx}$ at the endpoints of the interval $[-d, d]$; due to the convexity of $y$, this chord is above the curve of the exponential function $y$ over the interval $[-d, d]$. The derivation of Theorem 5 is based on Bennett's inequality, which is applied to the conditional expectation above. The proof of Bennett's inequality (see Lemma 2) is shortly reviewed, while adapting the notation for the continuation of this discussion.

Let $X$ be a random variable with zero mean and variance $\mathbb{E}[X^2] = \sigma^2$, and assume that $X \leq d$ a.s. for some $d > 0$. Let $\gamma \triangleq \frac{\sigma^2}{d^2}$. The geometric viewpoint of Bennett's inequality is based on the derivation of an upper bound on the exponential function $y$ over the interval $(-\infty, d]$; this upper bound on $y$ is a parabola that intersects $y$ at the right endpoint $(d, e^{td})$ and is tangent to the curve of $y$ at the point $(-\gamma d, e^{-t\gamma d})$. As is verified in the proof of Lemma 2, this leads to the inequality $y(x) \leq \phi(x)$ for every $x \in (-\infty, d]$, where $\phi$ is the parabola that satisfies the conditions
$$\phi(d) = y(d) = e^{td}, \quad \phi(-\gamma d) = y(-\gamma d) = e^{-t\gamma d}, \quad \phi'(-\gamma d) = y'(-\gamma d) = t e^{-t\gamma d}.$$
Calculation shows that this parabola admits the form
$$\phi(x) = \frac{(x + \gamma d)e^{td} + (d - x)e^{-t\gamma d}}{(1 + \gamma)d} + \frac{\alpha\bigl[\gamma d^2 + (1 - \gamma)dx - x^2\bigr]}{(1 + \gamma)^2 d^2}$$
where $\alpha \triangleq \bigl((1 + \gamma)td + 1\bigr)e^{-t\gamma d} - e^{td}$. Since $\mathbb{E}[X] = 0$, $\mathbb{E}[X^2] = \gamma d^2$ and $X \leq d$ (a.s.), then
$$\mathbb{E}\bigl[e^{tX}\bigr] \leq \mathbb{E}\bigl[\phi(X)\bigr] = \frac{\gamma e^{td} + e^{-\gamma t d}}{1 + \gamma} = \frac{\mathbb{E}[X^2]\, e^{td} + d^2\, e^{-\frac{t\, \mathbb{E}[X^2]}{d}}}{d^2 + \mathbb{E}[X^2]}$$
which provides a geometric viewpoint of Bennett's inequality. Note that, under the above assumptions, the bound is achieved with equality when $X$ is a RV that takes the two values $+d$ and $-\gamma d$ with probabilities $\frac{\gamma}{1+\gamma}$ and $\frac{1}{1+\gamma}$, respectively. This bound also holds when $\mathbb{E}[X^2] \leq \sigma^2$ since the right-hand side of the inequality is a monotonic non-decreasing function of $\mathbb{E}[X^2]$ (as was verified in the proof of Lemma 2). Applying Bennett's inequality to the conditional law of $\xi_k$ given $\mathcal{F}_{k-1}$ gives (40) (with $\gamma$ in (34)).
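The geometric viewpoint above can be checked numerically. The following Python sketch (with illustrative $t$, $d$ and $\gamma$; it is not part of the original text) evaluates the right-hand side of Bennett's bound, verifies that the extremal two-point distribution attains it with equality, and shows that another admissible distribution with the same mean, variance and support constraint stays below it.

```python
# A minimal numerical check (illustrative t, d, gamma; not from the original text) of the geometric
# form of Bennett's bound: for any zero-mean X with X <= d a.s. and E[X^2] = gamma*d^2,
# E[exp(tX)] <= (gamma*exp(t*d) + exp(-gamma*t*d)) / (1 + gamma), with equality for the two-point
# law P(X = d) = gamma/(1+gamma), P(X = -gamma*d) = 1/(1+gamma).
import numpy as np

d, gamma, t = 1.0, 0.3, 0.7
bound = (gamma * np.exp(t * d) + np.exp(-gamma * t * d)) / (1.0 + gamma)

# Extremal two-point distribution: attains the bound with equality.
two_point = (gamma / (1.0 + gamma)) * np.exp(t * d) + (1.0 / (1.0 + gamma)) * np.exp(-t * gamma * d)

# Another admissible law: X uniform on {-sqrt(gamma)*d, +sqrt(gamma)*d} also has zero mean,
# variance gamma*d^2 and satisfies X <= d (since gamma <= 1), so it must lie below the bound.
symmetric = 0.5 * (np.exp(t * np.sqrt(gamma) * d) + np.exp(-t * np.sqrt(gamma) * d))

print(f"Bennett bound                  : {bound:.6f}")
print(f"two-point law (equality)       : {two_point:.6f}")
print(f"symmetric +/- sqrt(gamma)d law : {symmetric:.6f}  (below the bound)")
```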

C. Improving the Refined Version of the Azuma-Hoeffding Inequality for Sub-classes of Discrete-Time Martingales

The following subsection derives an exponential deviation inequality that improves the bound in Theorem 5 for conditionally symmetric discrete-time martingales with bounded increments. This subsection further assumes conditional symmetry of these martingales, as defined in the following:

Definition 2: Let $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$, where $\mathbb{N}_0 \triangleq \mathbb{N} \cup \{0\}$, be a discrete-time and real-valued martingale, and let $\xi_k \triangleq X_k - X_{k-1}$ for every $k \in \mathbb{N}$ designate the jumps of the martingale. Then $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$ is called a conditionally symmetric martingale if, conditioned on $\mathcal{F}_{k-1}$, the random variable $\xi_k$ is symmetrically distributed around zero.

Our goal in this subsection is to demonstrate how the assumption of conditional symmetry improves the existing deviation inequality in Section IV-A for discrete-time real-valued martingales with bounded increments. The exponent of the new bound is also compared to the exponent of the bound in Theorem 5 without the conditional symmetry assumption. Earlier results, serving as motivation for the discussion in this subsection, appear in [19, Section 4] and [60, Section 6]. The new exponential bounds can also be extended to conditionally symmetric sub- or super-martingales, where the construction of these objects is exemplified later in this subsection. Additional results addressing weak-type inequalities, maximal inequalities and ratio inequalities for conditionally symmetric martingales were derived in [58], [59] and [85]. Before we present the new deviation inequality for conditionally symmetric martingales, this discussion is motivated by introducing some constructions of such martingales.

1) Construction of Discrete-Time, Real-Valued and Conditionally Symmetric Sub/Supermartingales: Before proving the tightened inequalities for discrete-time conditionally symmetric sub/supermartingales, it is in place to exemplify the construction of these objects.

Example 6: Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\{U_k\}_{k \in \mathbb{N}} \subseteq L^1(\Omega, \mathcal{F}, \mathbb{P})$ be a sequence of independent random variables with zero mean. Let $\{\mathcal{F}_k\}_{k \geq 0}$ be the natural filtration of sub $\sigma$-algebras of $\mathcal{F}$, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_k = \sigma(U_1, \ldots, U_k)$ for $k \geq 1$. Furthermore, for $k \in \mathbb{N}$, let $A_k \in L^\infty(\Omega, \mathcal{F}_{k-1}, \mathbb{P})$ be an $\mathcal{F}_{k-1}$-measurable random variable with a finite essential supremum. Define a new sequence of random variables in $L^1(\Omega, \mathcal{F}, \mathbb{P})$ where
$$X_n = \sum_{k=1}^n A_k U_k, \quad \forall\, n \in \mathbb{N}$$
and $X_0 = 0$. Then $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a martingale. Let us assume that the random variables $\{U_k\}_{k \in \mathbb{N}}$ are symmetrically distributed around zero. Note that $X_n = X_{n-1} + A_n U_n$ where $A_n$ is $\mathcal{F}_{n-1}$-measurable and $U_n$ is independent of the $\sigma$-algebra $\mathcal{F}_{n-1}$ (due to the independence of the random variables $U_1, \ldots, U_n$). It therefore follows that for every $n \in \mathbb{N}$, given $\mathcal{F}_{n-1}$, the random variable $X_n$ is symmetrically distributed around its conditional expectation $X_{n-1}$. Hence, the martingale $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is conditionally symmetric.

Example 7: In continuation to Example 6, let $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ be a martingale, and define $Y_0 = 0$ and
$$Y_n = \sum_{k=1}^n A_k (X_k - X_{k-1}), \quad \forall\, n \in \mathbb{N}.$$
The sequence $\{Y_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a martingale. If $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a conditionally symmetric martingale then the martingale $\{Y_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is also conditionally symmetric (since $Y_n = Y_{n-1} + A_n(X_n - X_{n-1})$, and by assumption $A_n$ is $\mathcal{F}_{n-1}$-measurable).

Example 8: In continuation to Example 6, let $\{U_k\}_{k \in \mathbb{N}}$ be independent random variables with a symmetric distribution around their expected value, and also assume that $\mathbb{E}[U_k] \leq 0$ for every $k \in \mathbb{N}$. Furthermore, let $A_k \in L^\infty(\Omega, \mathcal{F}_{k-1}, \mathbb{P})$, and assume that a.s. $A_k \geq 0$ for every $k \in \mathbb{N}$. Let $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ be defined as in Example 6. Note that $X_n = X_{n-1} + A_n U_n$ where $A_n$ is non-negative and $\mathcal{F}_{n-1}$-measurable, and $U_n$ is independent of $\mathcal{F}_{n-1}$ and symmetrically distributed around its average. This implies that $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a conditionally symmetric supermartingale.

Example 9: In continuation to Examples 7 and 8, let $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ be a conditionally symmetric supermartingale. Define $\{Y_n\}_{n \in \mathbb{N}_0}$ as in Example 7, where $A_k$ is non-negative a.s. and $\mathcal{F}_{k-1}$-measurable for every $k \in \mathbb{N}$. Then $\{Y_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a conditionally symmetric supermartingale.

Example 10: Consider a standard Brownian motion $(W_t)_{t \geq 0}$. Define, for some $T > 0$, the discrete-time process
$$X_n = W_{nT}, \quad \mathcal{F}_n = \sigma\bigl(\{W_t\}_{0 \leq t \leq nT}\bigr), \quad \forall\, n \in \mathbb{N}_0.$$

The increments of $(W_t)_{t \geq 0}$ over time intervals $[t_{k-1}, t_k]$ are statistically independent if these intervals do not overlap (except at their endpoints), and they are Gaussian distributed with zero mean and variance $t_k - t_{k-1}$. The random variable $\xi_n \triangleq X_n - X_{n-1}$ is therefore statistically independent of $\mathcal{F}_{n-1}$, and it is Gaussian distributed with zero mean and variance $T$. The martingale $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is therefore conditionally symmetric.

After motivating this discussion with some explicit constructions of discrete-time conditionally symmetric martingales, we introduce a new deviation inequality for this sub-class of martingales, and then show how its derivation follows from the martingale approach that was used earlier for the derivation of Theorem 5. The new deviation inequality for the considered sub-class of discrete-time martingales with bounded increments gets the following form:

Theorem 6: Let $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$ be a discrete-time real-valued and conditionally symmetric martingale. Assume that, for some fixed numbers $d, \sigma > 0$, the following two requirements are satisfied a.s.:

$$|X_k - X_{k-1}| \leq d, \quad \mathrm{Var}(X_k \,|\, \mathcal{F}_{k-1}) = \mathbb{E}\bigl[(X_k - X_{k-1})^2 \,|\, \mathcal{F}_{k-1}\bigr] \leq \sigma^2 \qquad (50)$$
for every $k \in \mathbb{N}$. Then, for every $\alpha \geq 0$ and $n \in \mathbb{N}$,

$$\mathbb{P}\left(\max_{1 \leq k \leq n} |X_k - X_0| \geq \alpha n\right) \leq 2\exp\bigl(-n\, E(\gamma, \delta)\bigr) \qquad (51)$$
where
$$\gamma \triangleq \frac{\sigma^2}{d^2}, \quad \delta \triangleq \frac{\alpha}{d} \qquad (52)$$
and, for $\gamma \in (0, 1]$ and $\delta \in [0, 1)$,
$$E(\gamma, \delta) \triangleq \delta x - \ln\bigl(1 + \gamma\bigl(\cosh(x) - 1\bigr)\bigr) \qquad (53)$$
$$x \triangleq \ln\left(\frac{\delta(1 - \gamma) + \sqrt{\delta^2(1 - \gamma)^2 + \gamma^2(1 - \delta^2)}}{\gamma(1 - \delta)}\right). \qquad (54)$$
If $\delta > 1$, then the probability on the left-hand side of (51) is zero (so $E(\gamma, \delta) \triangleq +\infty$), and $E(\gamma, 1) = \ln\bigl(\frac{2}{\gamma}\bigr)$. Furthermore, the exponent $E(\gamma, \delta)$ is asymptotically optimal in the sense that there exists a conditionally symmetric martingale, satisfying the conditions in (50) a.s., that attains this exponent in the limit where $n \to \infty$.

Remark 8: From the above conditions, without any loss of generality, $\sigma^2 \leq d^2$ and therefore $\gamma \in (0, 1]$. This implies that Theorem 6 characterizes the exponent $E(\gamma, \delta)$ for all values of $\gamma$ and $\delta$.

Corollary 2: Let $\{U_k\}_{k=1}^\infty \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$ be i.i.d. and bounded random variables with a symmetric distribution around their mean value. Assume that $|U_1 - \mathbb{E}[U_1]| \leq d$ a.s. for some $d > 0$, and $\mathrm{Var}(U_1) \leq \gamma d^2$ for some $\gamma \in [0, 1]$. Let $\{S_n\}$ designate the sequence of partial sums, i.e., $S_n \triangleq \sum_{k=1}^n U_k$ for every $n \in \mathbb{N}$. Then, for every $\alpha \geq 0$,
$$\mathbb{P}\left(\max_{1 \leq k \leq n} \bigl| S_k - k\, \mathbb{E}[U_1] \bigr| \geq \alpha n\right) \leq 2\exp\bigl(-n\, E(\gamma, \delta)\bigr), \quad \forall\, n \in \mathbb{N} \qquad (55)$$
where $\delta \triangleq \frac{\alpha}{d}$, and $E(\gamma, \delta)$ is introduced in (53) and (54).

Remark 9: Theorem 6 should be compared to Theorem 5 (see [52, Theorem 6.1] or [17, Corollary 2.4.7]), which does not require the conditional symmetry property. The two exponents in Theorems 6 and 5 are both discontinuous at $\delta = 1$. This is consistent with the assumption of the bounded jumps, which implies that $\mathbb{P}(|X_n - X_0| \geq n d \delta)$ is equal to zero if $\delta > 1$. If $\delta \to 1^-$ then, from (53) and (54), for every $\gamma \in (0, 1]$,
$$\lim_{\delta \to 1^-} E(\gamma, \delta) = \lim_{x \to \infty} \Bigl[ x - \ln\bigl(1 + \gamma(\cosh(x) - 1)\bigr) \Bigr] = \ln\left(\frac{2}{\gamma}\right). \qquad (56)$$
On the other hand, the right limit at $\delta = 1$ is infinity since $E(\gamma, \delta) = +\infty$ for every $\delta > 1$. The same discontinuity also exists for the exponent in Theorem 5, where the right limit at $\delta = 1$ is infinity, and the left limit is equal to
$$\lim_{\delta \to 1^-} D\left(\frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma}\right) = \ln\left(1 + \frac{1}{\gamma}\right) \qquad (57)$$

where the last equality follows from (35). A comparison of the limits in (56) and (57) is consistent with the improvement that is obtained in Theorem 6 as compared to Theorem 5, due to the additional assumption of conditional symmetry, which is relevant if $\gamma \in (0, 1)$. It can be verified that the two exponents coincide if $\gamma = 1$ (which is equivalent to removing the constraint on the conditional variance), and their common value is equal to

$$f(\delta) = \begin{cases} \ln(2)\left[1 - h_2\left(\frac{1 - \delta}{2}\right)\right] & 0 \leq \delta \leq 1 \\ +\infty & \delta > 1 \end{cases} \qquad (58)$$
where $h_2(x) \triangleq -x \log_2(x) - (1 - x)\log_2(1 - x)$ for $0 \leq x \leq 1$ denotes the binary entropy function to the base 2.

We prove in the following the new deviation inequality in Theorem 6. In order to prove Theorem 6 for a discrete-time, real-valued and conditionally symmetric martingale with bounded jumps, we deviate from the proof of Theorem 5. This is done by replacing Bennett's inequality for the conditional expectation in (40) with a tightened bound under the conditional symmetry assumption. To this end, we need a lemma to proceed.

Lemma 3: Let $X$ be a real-valued RV with a symmetric distribution around zero, a support $[-d, d]$, and assume that $\mathbb{E}[X^2] = \mathrm{Var}(X) \leq \gamma d^2$ for some $d > 0$ and $\gamma \in [0, 1]$. Let $h$ be a real-valued convex function, and assume that $h(d^2) \geq h(0)$. Then
$$\mathbb{E}\bigl[h(X^2)\bigr] \leq (1 - \gamma)h(0) + \gamma h(d^2) \qquad (59)$$
where equality holds for the symmetric distribution
$$\mathbb{P}(X = d) = \mathbb{P}(X = -d) = \frac{\gamma}{2}, \quad \mathbb{P}(X = 0) = 1 - \gamma. \qquad (60)$$

Proof: Since $h$ is convex and $\mathrm{supp}(X) = [-d, d]$, then a.s. $h(X^2) \leq h(0) + \frac{X^2}{d^2}\bigl(h(d^2) - h(0)\bigr)$. Taking expectations on both sides gives (59), which holds with equality for the symmetric distribution in (60).

Corollary 3: If $X$ is a random variable that satisfies the three requirements in Lemma 3 then, for every $\lambda \in \mathbb{R}$,
$$\mathbb{E}\bigl[\exp(\lambda X)\bigr] \leq 1 + \gamma\bigl(\cosh(\lambda d) - 1\bigr) \qquad (61)$$
and (61) holds with equality for the symmetric distribution in Lemma 3, independently of the value of $\lambda$.

Proof: For every $\lambda \in \mathbb{R}$, due to the symmetric distribution of $X$, $\mathbb{E}[\exp(\lambda X)] = \mathbb{E}[\cosh(\lambda X)]$. The claim now follows from Lemma 3 since, for every $x \in \mathbb{R}$, $\cosh(\lambda x) = h(x^2)$ where $h(x) \triangleq \sum_{n=0}^\infty \frac{\lambda^{2n} |x|^n}{(2n)!}$ is a convex function ($h$ is convex since it is a linear combination, with non-negative coefficients, of convex functions), and $h(d^2) = \cosh(\lambda d) \geq 1 = h(0)$.

We continue with the proof of Theorem 6. Under the assumption of this theorem, for every $k \in \mathbb{N}$, the random variable $\xi_k \triangleq X_k - X_{k-1}$ satisfies a.s. $\mathbb{E}[\xi_k \,|\, \mathcal{F}_{k-1}] = 0$ and $\mathbb{E}[\xi_k^2 \,|\, \mathcal{F}_{k-1}] \leq \sigma^2$. Applying Corollary 3 to the conditional law of $\xi_k$ given $\mathcal{F}_{k-1}$, it follows that for every $k \in \mathbb{N}$ and $t \in \mathbb{R}$
$$\mathbb{E}\bigl[\exp(t\xi_k) \,|\, \mathcal{F}_{k-1}\bigr] \leq 1 + \gamma\bigl(\cosh(td) - 1\bigr) \qquad (62)$$
holds a.s., and therefore it follows from (4) and (62) that for every $t \in \mathbb{R}$
$$\mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right] \leq \bigl(1 + \gamma\bigl(\cosh(td) - 1\bigr)\bigr)^n. \qquad (63)$$
By applying the maximal inequality for submartingales, then for every $\alpha \geq 0$ and $n \in \mathbb{N}$

$$\begin{aligned}
\mathbb{P}\left(\max_{1 \leq k \leq n} (X_k - X_0) \geq \alpha n\right) &= \mathbb{P}\left(\max_{1 \leq k \leq n} \exp\bigl(t(X_k - X_0)\bigr) \geq \exp(\alpha n t)\right), \quad \forall\, t \geq 0 \\
&\leq \exp(-\alpha n t)\, \mathbb{E}\bigl[\exp\bigl(t(X_n - X_0)\bigr)\bigr] \\
&= \exp(-\alpha n t)\, \mathbb{E}\left[\exp\left(t \sum_{k=1}^n \xi_k\right)\right]. \qquad (64)
\end{aligned}$$

Therefore, from (64), for every t 0, ≥ n P max (Xk X0) αn exp( αnt) 1+ γ cosh(td) 1 . (65) 1 k n − ≥ ≤ − −  ≤ ≤    From (52) and a replacement of td with x, then for an arbitrary α 0 and n N  ≥ ∈

P max (Xk X0) αn inf exp n δx ln 1+ γ cosh(x) 1 . (66) 1 k n − ≥ ≤ x 0 − − − ≤ ≤ ≥   n  h  io Applying (66) to the martingale Xk, k k N gives the same bound on P(min1 k n(Xk X0) αn) for an {− F } ∈ 0 ≤ ≤ − ≤− arbitrary α 0. The union bound implies that ≥

P max Xk X0 αn P max (Xk X0) αn + P min (Xk X0) αn . (67) 1 k n | − |≥ ≤ 1 k n − ≥ 1 k n − ≤−  ≤ ≤   ≤ ≤   ≤ ≤  This doubles the bound on the right-hand side of (66), thus proving the exponential bound in Theorem 6. Proof for the asymptotic optimality of the exponents in Theorems 6 and 5: In the following, we show that under the conditions of Theorem 6, the exponent E(γ, δ) in (53) and (54) is asymptotically optimal. To show this, let d> 0 and γ (0, 1], and let U , U ,... be i.i.d. random variables whose probability distribution is given by ∈ 1 2 γ P(U = d)= P(U = d)= , P(U = 0) = 1 γ, i N. (68) i i − 2 i − ∀ ∈

Consider the particular case of the conditionally symmetric martingale Xn, n n N0 in Example 6 (see Sec- , n N , { F } ∈ 2 tion IV-C1) where Xn i=1 Ui for n , and X0 0. It follows that Xn Xn 1 d and Var(Xn n 1)= γd ∈ | − − |≤ |F − a.s. for every n N. From Cram´er’s theorem in R, for every α E[U1] = 0, ∈ P ≥ 1 lim ln P(Xn X0 αn) n n − ≥ →∞ n 1 1 = lim ln P Ui α n n n →∞ ≥  Xi=1  = I(α) (69) − where the rate function is given by

I(α) = sup tα ln E[exp(tU1)] (70) t 0 { − } ≥ (see, e.g., [17, Theorem 2.2.3] and [17, Lemma 2.2.5(b)] for the restriction of the supermum to the interval [0, )). ∞ From (68) and (122), for every α 0, ≥ I(α) = sup tα ln 1+ γ[cosh(td) 1] t 0 − − ≥   but it is equivalent to the optimized exponent on the right-hand side of (65), giving the exponent of the bound in Theorem 6. Hence, I(α)= E(γ, δ) in (53) and (54). This proves that the exponent of the bound in Theorem 6 is indeed asymptotically optimal in the sense that there exists a discrete-time, real-valued and conditionally symmetric martingale, satisfying the conditions in (50) a.s., that attains this exponent in the limit where n . The proof for the asymptotic optimality of the exponent in Theorem 5 (see the right-hand side of (33)) is similar→ ∞ to the proof for Theorem 6, except that the i.i.d. random variables U1, U2,... are now distributed as follows: γ 1 P(U = d)= , P(U = γd)= , i N i 1+ γ i − 1+ γ ∀ ∈ n N and, as before, the martingale Xn, n n 0 is defined by Xn = i=1 Ui and n = σ(U1,...,Un) for every { F } ∈ F n N with X0 = 0 and 0 = , Ω (in this case, it is not a conditionally symmetric martingale unless γ = 1). ∈ F {∅ } P Theorem 6 provides an improvement over the bound in Theorem 5 for conditionally symmetric martingales with bounded jumps. The bounds in Theorems 6 and 5 depend on the conditional variance of the martingale, but they do not take into consideration conditional moments of higher orders. The following bound generalizes the bound in Theorem 6, but it does not admit in general a closed-form expression. 20 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

N Theorem 7: Let Xk, k k N0 be a discrete-time and real-valued conditionally symmetric martingale. Let m be an even number,{ andF assume} ∈ that the following conditions hold a.s. for every k N ∈ ∈ l Xk Xk 1 d, E (Xk Xk 1) k 1 µl, l 2, 4,...,m | − − |≤ − − | F − ≤ ∀ ∈ { } for some d> 0 and non-negative numbers µ ,µ ,...,µ . Then, for every α 0 and n N, { 2 4 m} ≥ ∈ m 1 n 2 − 2l δx γ2l x P max Xk X0 αn 2 min e− 1+ + γm cosh(x) 1 (71) 1 k n | − |≥ ≤  x 0  (2l)! −   ≤ ≤  ≥ l=1 !  X   where   α  µ m  δ , , γ , 2l , l 1,..., . (72) d 2l d2l ∀ ∈ 2 n o Proof: The starting point of the proof of Theorem 7 relies on (64) and (4). For every k N and t R, since 2l 1 ∈ ∈ E ξ − k 1 = 0 for every l N (due to the conditionally symmetry property of the martingale), k | F − ∈   E exp(tξk) k 1 |F − m 1 2 − 2l E 2l 2l E 2l  t ξk k 1 ∞ t ξk k 1 =1+ | F − + | F − (2l)! (2l)! l=1   l= m   X X2 m 1 2 2l ξk 2l 2l ξk 2l − (td) E k 1 ∞ (td) E k 1 =1+ d | F − + d | F − (2l)! (2l)! l=1    l= m    X X2 m 1 2 − (td)2l γ ∞ (td)2l γ 1+ 2l + m ≤ (2l)! (2l)! l=1 l= m X X2 m 1 2 2l − (td) γ2l γm =1+ − + γ cosh(td) 1 (73) (2l)! m − l=1  X  ξk where the inequality above holds since d 1 a.s., so that 0 . . . γm . . . γ4 γ2 1, and the last | | ≤x2n ≤ ≤ ≤ ≤ ≤ ≤ equality in (73) holds since cosh(x)= ∞ for every x R. Therefore, from (4), n=0 (2n)! ∈ m 1 n n P 2 2l − (td) γ2l γm E exp t ξ 1+ − + γ cosh(td) 1 (74) k ≤  (2l)! m −    k=1  l=1  X X     for an arbitrary t R. The inequality then follows from (64). This completes the proof of Theorem 7. ∈

D. Concentration Inequalities for Small Deviations In the following, we consider the probability of the events X X α√n for an arbitrary α 0. {| n − 0| ≥ } ≥ These events correspond to small deviations. This is in contrast to events of the form Xn X0 αn , whose probabilities were analyzed earlier in this section, referring to large deviations. {| − |≥ } Proposition 1: Let X , be a discrete-parameter real-valued martingale. Then, Theorem 5 implies that for { k Fk} every α 0 ≥ 2 δ 1 P( X X α√n) 2exp 1+ O n− 2 . (75) | n − 0|≥ ≤ −2γ    Proof: See Appendix A. Remark 10: From Proposition 1, the upper bound on P( Xn X0 α√n) (for an arbitrary α 0) improves 1 | − | ≥ ≥ the exponent of Azuma’s inequality by a factor of γ . RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 21

E. Inequalities for Sub and Super Martingales

Upper bounds on the probability P(Xn X0 r) for r 0, earlier derived in this section for martingales, can be adapted to super-martingales (similarly− to, e.g.,≥ [13, Chapter≥ 2] or [14, Section 2.7]). Alternatively, replacing n n P Xk, k k=0 with Xk, k k=0 provides upper bounds on the probability (Xn X0 r) for sub-martingales. {For example,F } the adaptation{− F of} Theorem 5 to sub and super martingales gives the− following≤− inequality: Corollary 4: Let X , be a discrete-parameter real-valued super-martingale. Assume that, for some { k Fk}k∞=0 constants d,σ > 0, the following two requirements are satisfied a.s.

Xk E[Xk k 1] d, − | F − ≤ 2 2 Var(Xk k 1) , E Xk E[Xk k 1] k 1 σ |F − − | F − | F − ≤ h i for every k 1,...,n . Then, for every α 0,  ∈ { } ≥ δ + γ γ P(X X αn) exp n D (76) n − 0 ≥ ≤ − 1+ γ 1+ γ    where γ and δ are defined as in (52), and the divergence D(p q) is introduced in (35). Alternatively, if X , || { k Fk}k∞=0 is a sub-martingale, the same upper bound in (76) holds for the probability P(Xn X0 αn). If δ > 1, then these two probabilities are equal to zero. − ≤ − Proof: The proof of this corollary is similar to the proof of Theorem 5. The only difference is that for a super-martingale, due to its basic property in Section II-B, n n Xn X0 = (Xk Xk 1) ξk − − − ≤ Xk=1 Xk=1 , E P P n a.s., where ξk Xk [Xk k 1] is k-measurable. Hence ((Xn X0 αn) k=1 ξk αn where − | F − F 2 − ≥ ≤ ≥ a.s. ξk d, E[ξk k 1] = 0, and Var(ξk k 1) σ . The continuation of the proof coincides with the proof of ≤ | F − | F − ≤ P  Theorem 5 (starting from (3)). The other inequality for sub-martingales holds due to the fact that if Xk, k is a sub-martingale then X , is a super-martingale. { F } {− k Fk}

V. FREEDMAN’S INEQUALITY AND A REFINED VERSION We consider in the following a different type of exponential inequalities for discrete-time martingales with bounded jumps, which is a classical inequality that dates back to Freedman [24]. Freedman’s inequality is refined in the following to conditionally symmetric martingales with bounded jumps (see [69]). Furthermore, these two inequalities are specialized to two concentration inequalities for sums of independent and bounded random variables.

Theorem 8: Let Xn, n n N be a discrete-time real-valued and conditionally symmetric martingale. Assume { F } ∈ 0 that there exists a fixed number d> 0 such that ξk , Xk Xk 1 d a.s. for every k N. Let − − ≤ ∈ n , E 2 Qn [ξk k 1] (77) | F − Xk=1 with Q0 , 0, be the predictable quadratic variation of the martingale up to time n. Then, for every z,r > 0, z2 zd P max (Xk X0) z, Qn r for some n N exp C (78) 1 k n − ≥ ≤ ∈ ≤ −2r · r  ≤ ≤     where 1 2 2[u sinh− (u) √1+ u + 1] C(u) , − , u> 0. (79) u2 ∀

Theorem 8 should be compared to Freedman’s inequality in [24, Theorem 1.6] (see also [17, Exercise 2.4.21(b)]) that was stated without the requirement for the conditional symmetry of the martingale. It provides the following result: 22 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

Theorem 9: Let Xn, n n N be a discrete-time real-valued martingale. Assume that there exists a fixed number { F } ∈ 0 d> 0 such that ξk , Xk Xk 1 d a.s. for every k N. Then, for every z,r > 0, − − ≤ ∈ z2 zd P max (Xk X0) z, Qn r for some n N exp B (80) 1 k n − ≥ ≤ ∈ ≤ −2r · r  ≤ ≤     where 2[(1 + u) ln(1 + u) u] B(u) , − , u> 0. (81) u2 ∀

The proof of [24, Theorem 1.6] is modified in the following by using Bennett’s inequality for the derivation of the original bound in Theorem 9 (without the conditional symmetry requirement). Furthermore, this modified proof serves to derive the improved bound in Theorem 8 under the conditional symmetry assumption of the martingale sequence. We provide in the following a combined proof of Theorems 8 and 9. Proof: The proof of Theorem 8 relies on the proof of Freedman’s inequality in Theorem 9, where the latter dates back to Freedman’s paper (see [24, Theorem 1.6], and also [17, Exercise 2.4.21(b)]). The original proof of Theorem 9 (see [24, Section 3]) is modified in a way that facilitates to realize how the bound can be improved for conditionally symmetric martingales with bounded jumps. This improvement is obtained via the refinement in (62) of Bennett’s inequality for conditionally symmetric distributions. Furthermore, the following revisited proof of Theorem 9 simplifies the derivation of the new and improved bound in Theorem 8 for the considered subclass of martingales. Without any loss of generality, lets assume that d = 1 (otherwise, Xk and z are divided by d, and Qk and 2 { } { } r are divided by d ; this normalization extends the bound to the case of an arbitrary d > 0). Let Sn , Xn X0 N − for every n 0, then Sn, n n N0 is a martingale with S0 = 0. The proof starts by introducing two lemmas. Lemma 4:∈Under the{ assumptionsF } ∈ of Theorem 9, let U , exp(λS θQ ), n 0, 1,... (82) n n − n ∀ ∈ { } λ where λ 0 and θ e λ 1 are arbitrary constants. Then, Un, n n N is a supermartingale. ≥ ≥ − − { F } ∈ 0 Proof: Un in (82) is n-measurable (since Qn in (77) is n 1-measurable, where n 1 n, and Sn is F F − n F − ⊆ F n-measurable), Qn and Un are non-negative random variables, and Sn = k=1 ξk n a.s. (since ξk 1 and F λn 1 ≤ ≤ S0 = 0). It therefore follows that 0 Un e a.s. for λ, θ 0, so Un L (Ω, n, P). It is required to show that ≤ ≤ ≥ ∈ P F E[Un n 1] Un 1 holds a.s. for every n N, under the above assumptions on the parameters λ and θ in (82). |F − ≤ − ∈ E[Un n 1] − (a) |F = exp( θQn) exp(λSn 1) E exp(λξn) n 1 − − | F − (b) E 2 E = exp(λSn 1) exp θ(Qn 1+ [ξn n 1])  exp(λξn) n 1 − − − |F − | F − E (c) exp(λξn) n 1    = U | F − (83) n 1 E 2 − exp(θ [ξn n 1])  | F − ! where (a) follows from (82) and because Qn and Sn 1 are n 1-measurable and Sn = Sn 1 + ξn, (b) follows from (77), and (c) follows from (82). − F − − A modification of the original proof of Lemma 4 (see [24, Section 3]) is suggested in the following, which then enables to improve the bound in Theorem 9 for real-valued, discrete-time, conditionally symmetric martingales with bounded jumps. This leads to the improved bound in Theorem 8 for the considered subclass of martingales. Since by assumption ξn 1 and E[ξn n 1] = 0 a.s., then applying Bennett’s inequality in (40) to the conditional λξn ≤ | F − expectation of e given n 1 (recall that λ 0) gives F − ≥ E 2 E 2 exp λ [ξn n 1] + [ξn n 1] exp(λ) E exp λξ − | F − | F − n n 1 E 2 | F − ≤ 1+ ξn n 1    | F − which therefore implies from (83) and the last inequality that   E 2 E 2 E 2 exp (λ + θ) [ξn n 1] [ξn n 1] exp λ θ [ξn n 1] E[U ] U − | F − + | F − − | F − . (84) n n 1 n 1 E 2 E 2 |F − ≤ − 1+ ξn n 1 1+ [ξn n 1] ! | F −  | F −    RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 23

In order to prove that E[Un n 1] Un 1 a.s., it is sufficient to prove that the second term on the right-hand side |F − ≤ − of (84) is a.s. less than or equal to 1. To this end, lets find the condition on λ, θ 0 such that for every α 0 ≥ ≥ 1 α exp α(λ + θ) + exp(λ αθ) 1 (85) 1+ α − 1+ α − ≤      which then assures that the second term on the right-hand side of (84) is less than or equal to 1 a.s. as required. Lemma 5: If λ 0 and θ exp(λ) λ 1 then the condition in (85) is satisfied for every α 0. ≥ ≥ − − ≥ Proof: This claim follows by calculus, showing that the function

g(α)=(1+ α) exp(αθ) α exp(λ) exp( αλ), α 0 − − − ∀ ≥ is non-negative on R if λ 0 and θ exp(λ) λ 1. + ≥ ≥ − − From (84) and Lemma 5, it follows that Un, n n N0 is a supermartingale if λ 0 and θ exp(λ) λ 1. This completes the proof of Lemma 4. { F } ∈ ≥ ≥ − −

At this point, we start to discuss in parallel the derivation of the tightened bound in Theorem 8 for conditionally symmetric martingales. As before, it is assumed without any loss of generality that d = 1. Lemma 6: Under the additional assumption of the conditional symmetry in Theorem 8, then Un, n n N in { F } ∈ 0 (82) is a supermartingale if λ 0 and θ cosh(λ) 1 are arbitrary constants. ≥ ≥ − Proof: By assumption ξn = Sn Sn 1 1 a.s., and ξn is conditionally symmetric around zero, given n 1, − − ≤ F − for every n N. By applying Corollary 3 to the conditional expectation of exp(λξn) given n 1, for every λ 0, ∈ F − ≥ E E 2 exp(λξn) n 1 1+ [ξn n 1] cosh(λ) 1 . (86) | F − ≤ | F − − Hence, combining (83) and (86) gives  

E 2 1+ [ξn n 1] cosh(λ) 1 E[U ] U | F − − . (87) n n 1 n 1 E 2 |F − ≤ − exp θ [ξn n 1] ! |F −  E 2  Let λ 0. Since [ξn n 1] 0 a.s. then in order to ensure that Un, n n N0 forms a supermartingale, it is sufficient≥ (based on (87))| F that− the≥ following condition holds: { F } ∈

1+ α cosh(λ) 1 − 1, α 0. (88) θα exp( )  ≤ ∀ ≥ Calculus shows that, for λ 0, the condition in (88) is satisfied if and only if ≥ θ cosh(λ) 1 , θ (λ). (89) ≥ − min

From (87), Un, n n N is a supermartingale if λ 0 and θ θmin(λ). This proves Lemma 6. { F } ∈ 0 ≥ ≥ Hence, due to the assumption of the conditional symmetry of the martingale in Theorem 8, the set of parameters for which U , is a supermartingale was extended. This follows from a comparison of Lemma 4 and 6 where { n Fn} indeed exp(λ) 1 λ θ (λ) 0 for every λ 0. − − ≥ min ≥ ≥ Let z,r > 0, λ 0 and either θ cosh(λ) 1 or θ exp(λ) λ 1 with or without assuming the conditional symmetry property,≥ respectively (see≥ Lemma 4− and 6).≥ In the following,− − we rely on Doob’s sampling theorem. To this end, let M N, and define two stopping times adapted to . The first stopping time is α = 0, and the ∈ {Fn} second stopping time β is the minimal value of n 0,...,M (if any) such that S z and Q r (note that ∈ { } n ≥ n ≤ Sn is n-measurable and Qn is n 1-measurable, so the event β n is n-measurable); if such a value of n F , F − { ≤ } F does not exist, let β M. Hence α β are two bounded stopping times. From Lemma 4 or 6, Un, n n N0 is a supermartingale for the corresponding≤ set of parameters λ and θ, and from Doob’s sampling theorem{ F } ∈

E[U ] E[U ] = 1 (90) β ≤ 0 24 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

(S0 = Q0 = 0, so from (82), U0 = 1 a.s.). Hence, it implies the following chain of inequalities:

P( n M : Sn z,Qn r) (a) ∃ ≤ ≥ ≤ = P(S z,Q r) β ≥ β ≤ (b) P(λS θQ λz θr) ≤ β − β ≥ − (c) E[exp(λS θQ )] β − β ≤ exp(λz θr) − (d) E[U ] = β exp(λz θr) (e) − exp (λz θr) (91) ≤ − − where equality (a) follows from the definition of the stopping time β 0,...,M , (b) holds since λ, θ 0, (c) follows from Chernoff’s bound, (d) follows from the definition in (82),∈ { and finally} (e) follows from (90).≥ Since (91) holds for every M N, then from the continuity theorem for non-decreasing events and (91) ∈ P( n N : S z,Q r) ∃ ∈ n ≥ n ≤ = lim P( n M : Sn z,Qn r) M ∃ ≤ ≥ ≤ →∞ exp (λz θr) . (92) ≤ − − The choice of the non-negative parameter θ as the minimal value for which (92) is valid provides the tightest bound within this form. Hence, without assuming the conditional symmetry property for the martingale X , , let (see { n Fn} Lemma 4) θ = exp(λ) λ 1. This gives that for every z,r > 0, − − P( n N : S z,Q r) exp λz exp(λ) λ 1 r , λ 0. ∃ ∈ n ≥ n ≤ ≤ − − − − ∀ ≥ z     The minimization w.r.t. λ gives that λ = ln 1+ r , and its substitution in the bound yields that  z2 z P( n N : S z,Q r) exp B (93) ∃ ∈ n ≥ n ≤ ≤ −2r · r    where the function B is introduced in (81). Furthermore, under the assumption that the martingale Xn, n n N is conditionally symmetric, let θ = θmin(λ) { F } ∈ 0 (see Lemma 6) for obtaining the tightest bound in (92) for a fixed λ 0. This gives the inequality ≥ P( n N : S z,Q r) exp λz rθ (λ) , λ 0. ∃ ∈ n ≥ n ≤ ≤ − − min ∀ ≥   1 z   z2 The optimized λ is equal to λ = sinh− . Its substitution in (89) gives that θ (λ)= 1+ 2 1, and r min r −   q z2 z P( n N : S z,Q r) exp C (94) ∃ ∈ n ≥ n ≤ ≤ −2r · r    where the function C is introduced in (79). Finally, the proof of Theorems 8 and 9 is completed by showing that the following equality holds:

A , n N : S z,Q r {∃ ∈ n ≥ n ≤ } = n N : max Sk z,Qn r , B. (95) {∃ ∈ 1 k n ≥ ≤ } ≤ ≤ Clearly A B, so one needs to show that B A. To this end, assume that event B is satisfied. Then, there exists ⊆ ⊆ some n N and k 1,...,n such that S z and Q r. Since the predictable quadratic variation process ∈ ∈ { } k ≥ n ≤ Qn n N0 in (77) is monotonic non-decreasing, then it implies that Sk z and Qk r; therefore, event A is also satisfied{ } ∈ and B A. The combination of (94) and (95) completes the≥ proof of Theo≤ rem 8, and respectively the combination of (93)⊆ and (95) completes the proof of Theorem 9. RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 25

Freedman’s inequality can be easily specialized to a concentration inequality for a sum of centered (zero-mean) independent and bounded random variables (see Example 1). This specialization reduces to a concentration inequality of Bennett (see [6]), which can be loosened to get Bernstein’s inequality (as is explained below). Furthermore, the refined inequality in Theorem 8 for conditionally symmetric martingales with bounded jumps can be specialized (again, via Example 1) to an improved concentration inequality for a sum of i.i.d. and bounded random variables that are symmetrically distributed around zero. This leads to the following result:

Corollary 5: Let U n be i.i.d. and bounded random variables such that E[U ] = 0, E[U 2]= σ2, and U d { i}i=1 1 1 | 1|≤ a.s. for some constant d> 0. Then, the following inequality holds:

n 2 P nσ αd Ui α 2exp 2 φ1 2 , α> 0 (96) ≥ ! ≤ − d · nσ ∀ i=1    X , where φ1(x) (1 + x) ln(1 + x) x for every x > 0. Furthermore, if the i.i.d. and bounded random variables U n have a symmetric distribution− around zero, then the bound in (96) can be improved to { i}i=1 n 2 P nσ αd Ui α 2exp 2 φ2 2 , α> 0 (97) ≥ ! ≤ − d · nσ ∀ i=1    X , 1 2 where φ2(x) x sinh− (x) √1+ x + 1 for every x> 0. Proof: Inequality (96) follows− from Freedman’s inequality in Theorem 9, and inequality (97) follows from the refinement of Freedman’s inequality for conditionally symmetric martingales in Theorem 8. These two theorems are applied here to the martingale sequence X , n where X = n U and = σ(U ,...,U ) for every { k Fk}k=0 k i=1 i Fk 1 k k 1,...,n , and X0 = 0, 0 = , Ω . The corresponding predictable quadratic variation of the martingale up ∈ { } F {∅ } P n E 2 2 to time n for this special case of a sum of i.i.d. random variables is Qn = i=1 [Ui ] = nσ . The result now follows by taking z = nσ2 in inequalities (78) and (80) (with the related functions that are introduced in (81) and (79), respectively). Note that the same bound holds for the two one-sided tailP inequalities, giving the factor 2 on the right-hand sides of (96) and (97).

Remark 11: Bennett’s concentration inequality in (96) can be loosened to obtain Bernstein’s inequality. To this end, the following lower bound on φ1 is used: x2 φ1(x) 2x , x> 0. ≥ 2+ 3 ∀ This gives the inequality

n 2 P α Ui α 2exp 2αd , α> 0. ≥ ! ≤ −2nσ2 + ! ∀ i=1 3 X

VI. RELATIONS OF THE REFINED INEQUALITIES TO SOME CLASSICAL RESULTS IN PROBABILITY THEORY A. Relation between the Martingale Central Limit Theorem (CLT) and Proposition 1 In this subsection, we discuss the relation between the martingale CLT and the concentration inequalities for discrete-parameter martingales in Proposition 1. P Let (Ω, , ) be a probability space. Given a filtration k , then Yk, k k∞=0 is said to be a martingale- differenceF sequence if, for every k, {F } { F } 1) Y is -measurable, k Fk 2) E[ Y ] < , | k| ∞ 3) E Yk k 1 = 0. | F − Let   n S = Y , n N n k ∀ ∈ Xk=1 26 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

and S0 = 0, then Sk, k k∞=0 is a martingale. Assume that the sequence of RVs Yk is bounded, i.e., there exists a constant d such{ that FY } d a.s., and furthermore, assume that the limit { } | k|≤ n 2 , 1 E 2 σ lim Yk k 1 n n | F − →∞ k=1 X   exists in probability and is positive. The martingale CLT asserts that, under the above conditions, Sn converges in √n distribution (i.e., weakly converges) to the Gaussian distribution (0,σ2). It is denoted by Sn (0,σ2). We N √n ⇒ N note that there exist more general versions of this statement (see, e.g., [8, pp. 475–478]).

Let Xk, k k∞=0 be a discrete-parameter real-valued martingale with bounded jumps, and assume that there exists a{ constantF } d so that a.s. for every k N ∈

Xk Xk 1 d, k N. | − − |≤ ∀ ∈ Define, for every k N, ∈ Yk , Xk Xk 1 − − , N and Y0 0, so Yk, k k∞=0 is a martingale-difference sequence, and Yk d a.s. for every k 0 . Furthermore, for every{ Fn } N, | | ≤ ∈ ∪ { } ∈ n S , Y = X X . n k n − 0 Xk=1 Under the assumptions in Theorem 5 and its subsequences, for every k N, one gets a.s. that ∈ E 2 E 2 2 [Yk k 1]= [(Xk Xk 1) k 1] σ . | F − − − | F − ≤ Lets assume that this inequality holds a.s. with equality. It follows from the martingale CLT that X X n − 0 (0,σ2) √n ⇒N and therefore, for every α 0, ≥ α lim P( Xn X0 α√n) = 2 Q n | − |≥ σ →∞   where the Q function is introduced in (206). α δ Based on the notation in (52), the equality σ = √γ holds, and δ lim P( Xn X0 α√n) = 2 Q . (98) n | − |≥ √γ →∞   Since, for every x 0, ≥ 1 x2 Q(x) exp ≤ 2 − 2   then it follows that for every α 0 ≥ δ2 lim P( Xn X0 α√n) exp . n | − |≥ ≤ −2γ →∞   This inequality coincides with the asymptotic result of the inequalities in Proposition 1 (see (75) in the limit where n ), except for the additional factor of 2. Note also that the proof of the concentration inequalities in Proposition→∞ 1 (see Appendix A) provides inequalities that are informative for finite n, and not only in the asymptotic case where n tends to infinity. Furthermore, due to the exponential upper and lower bounds of the Q-function in δ2 (12), then it follows from (98) that the exponent in the concentration inequality (75) (i.e., 2γ ) cannot be improved under the above assumptions (unless some more information is available). RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 27

B. Relation between the Law of the Iterated Logarithm (LIL) and Theorem 5 In this subsection, we discuss the relation between the law of the iterated logarithm (LIL) and Theorem 5.

According to the law of the iterated logarithm (see, e.g., [8, Theorem 9.5]) if Xk k∞=1 are i.i.d. real-valued RVs with zero mean and unit variance, and S , n X for every n N, then { } n i=1 i ∈ P S lim sup n = 1 a.s. (99) n √2n ln ln n →∞ and S lim inf n = 1 a.s. (100) n →∞ √2n ln ln n − Eqs. (99) and (100) assert, respectively, that for every ε> 0, along almost any realization,

S > (1 ε)√2n ln ln n n − and S < (1 ε)√2n ln ln n n − − are satisfied infinitely often (i.o.). On the other hand, Eqs. (99) and (100) imply that along almost any realization, each of the two inequalities

Sn > (1 + ε)√2n ln ln n and S < (1 + ε)√2n ln ln n n − is satisfied for a finite number of values of n. P E Let Xk k∞=1 be i.i.d. real-valued RVs, defined over the probability space (Ω, , ), with [X1] = 0 and E 2 { } F [X1 ] = 1. Let us define the natural filtration where = , Ω , and = σ(X ,...,X ) is the σ-algebra that is F0 {∅ } Fk 1 k generated by the RVs X1,...,Xk for every k N. Let S0 = 0 and Sn be defined as above for every n N. It is straightforward to verify by Definition 1 that ∈S , is a martingale. ∈ { n Fn}n∞=0 In order to apply Theorem 5 to the considered case, let us assume that the RVs X are uniformly bounded, { k}k∞=1 i.e., it is assumed that there exists a constant c such that X c a.s. for every k N. Since E[X2] = 1 then | k| ≤ ∈ 1 c 1. This assumption implies that the martingale S , has bounded jumps, and for every n N ≥ { n Fn}n∞=0 ∈

Sn Sn 1 c a.s. | − − |≤ Moreover, due to the independence of the RVs X , then { k}k∞=1 E 2 E 2 Var(Sn n 1)= (Xn n 1)= (Xn) = 1 a.s.. | F − | F − From Theorem 5, it follows that for every α 0 ≥ δ + γ γ P S α√2n ln ln n exp nD n (101) n ≥ ≤ − 1+ γ 1+ γ      where

, α 2 ln ln n , 1 δn , γ 2 . (102) c r n c 28 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

Straightforward calculation shows that δ + γ γ nD n 1+ γ 1+ γ   nγ δ δ 1 = 1+ n ln 1+ n + (1 δ ) ln(1 δ ) 1+ γ γ γ γ − n − n    2   3  (a) nγ δ 1 1 δ 1 1 = n + + n + . . . 1+ γ 2 γ2 γ 6 γ − γ3       nδ2 nδ3 (1 γ) = n n − + . . . 2γ − 6γ2 2 (b) α(c 1) ln ln n = α2 ln ln n 1 − + . . . (103) " − 6c r n # where equality (a) follows from the power series expansion

∞ ( u)k (1 + u) ln(1 + u)= u + − , 1 < u 1 k(k 1) − ≤ Xk=2 − and equality (b) follows from (102). A substitution of (103) into (101) gives that, for every α 0, ≥ α2 1+O √ ln ln n − n P Sn α√2n ln ln n ln n (104) ≥ ≤ h i    and the same bound also applies to P Sn α√2n ln ln n for α 0. This provides complementary information to the limits in (99) and (100) that are provided≤− by the LIL. From Remark≥ 6, which follows from Doob’s maximal  inequality for sub-martingales, the inequality in (104) can be strengthened to

α2 1+O √ ln ln n − n P max Sk α√2n ln ln n ln n . (105) 1 k n ≥ ≤ h i  ≤ ≤   It is shown in the following that (105) and the first Borel-Cantelli lemma can serve to prove one part of (99). Using this approach, it is shown that if α > 1, then the probability that Sn > α√2n ln ln n i.o. is zero. To this end, let θ> 1 be set arbitrarily, and define

A = S α√2k ln ln k n k ≥ k: θn−1 k θn [≤ ≤ n o for every n N. Hence, the union of these sets is ∈ A , A = S α√2k ln ln k n k ≥ n N k N [∈ [∈ n o The following inequalities hold (since θ > 1):

P P n 1 n 1 (An) max Sk α 2θ − ln ln(θ − ) ≤ θn−1 k θn ≥  ≤ ≤  αp P n n 1 = max Sk 2θ ln ln(θ − ) θn−1 k θn ≥ √θ  ≤ ≤  α p P n n 1 max Sk 2θ ln ln(θ − ) ≤ 1 k θn ≥ √θ  ≤ ≤  α2 p 1+βn) (n ln θ)− θ (106) ≤ where the last inequality follows from (105) with β 0 as n . Since n → →∞ ∞ α2 n− θ < , α> √θ ∞ ∀ n=1 X RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 29 then it follows from the first Borel-Cantelli lemma that P(A i.o.) = 0 for all α > √θ. But the event A does not depend on θ, and θ > 1 can be made arbitrarily close to 1. This asserts that P(A i.o.) = 0 for every α > 1, or equivalently S lim sup n 1 a.s. n √2n ln ln n ≤ →∞ Similarly, by replacing X with X , it follows that { i} {− i} S lim inf n 1 a.s. n √ →∞ 2n ln ln n ≥− Theorem 5 therefore gives inequality (105), and it implies one side in each of the two equalities for the LIL in (99) and (100).

C. Relation of Theorem 5 with the Moderate Deviations Principle R n According to the moderate deviations theorem (see, e.g., [17, Theorem 3.7.1]) in , let Xi i=1 be a sequence λXi { } of real-valued i.i.d. RVs such that ΛX (λ) = E[e ] < in some neighborhood of zero, and also assume that E 2 ∞ [Xi] = 0 and σ = Var(Xi) > 0. Let an n∞=1 be a non-negative sequence such that an 0 and nan as n , and let { } → → ∞ →∞ n a Z , n X , n N. (107) n n i ∀ ∈ r i=1 X Then, for every measurable set Γ R, ⊆ 1 inf x2 −2σ2 x Γ0 ∈ lim inf an ln P(Zn Γ) n ≤ →∞ ∈ lim sup an ln P(Zn Γ) ≤ n ∈ →∞ 1 2 2 inf x (108) ≤−2σ x Γ ∈ where Γ0 and Γ designate, respectively, the interior and closure sets of Γ. Let η ( 1 , 1) be an arbitrary fixed number, and let a be the non-negative sequence ∈ 2 { n}n∞=1 1 2η a = n − , n N n ∀ ∈ so that a 0 and na as n . Let α R+, and Γ , ( , α] [α, ). Note that, from (107), n → n →∞ →∞ ∈ −∞ − ∪ ∞ n η P Xi αn = P(Zn Γ) ≥ ! ∈ i=1 X so from the moderate deviations principle (MDP), for every α 0, ≥ n 2 1 2η P η α lim n − ln Xi αn = 2 . (109) n ≥ ! −2σ →∞ i=1 X It is demonstrated in Appendix B that, in contrast to Azuma’s inequality, Theorem 5 provides an upper bound on the probability n η P Xi αn , n N, α 0 ≥ ! ∀ ∈ ≥ i=1 X which coincides with the asymptotic limit in (109). The analysis in Appendix B provides another interesting link between Theorem 5 and a classical result in probability theory, which also emphasizes the significance of the refinements of Azuma’s inequality. 30 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

D. Relation of the Concentration Inequalities for Martingales to Discrete-Time Markov Chains A striking well-known relation between discrete-time Markov chains and martingales is the following (see, e.g., [29, p. 473]): Let Xn n N (N0 , N 0 ) be a discrete-time Markov chain taking values in a countable state space { } ∈ 0 ∪{ } with transition matrix P, and let the function ψ : be harmonic (i.e., j pi,jψ(j) = ψ(i), i ), S S → S ∈S ∀ ∈ S and assume that E[ ψ(Xn) ] < for every n. Then, Yn, n n N is a martingale where Yn , ψ(Xn) and | | ∞ { F } ∈ 0 P n n N0 is the natural filtration. This relation, which follows directly from the Markov property, enables to apply the{F concentration} ∈ inequalities in Section IV for harmonic functions of Markov chains when the function ψ is bounded (so that the jumps of the martingale sequence are uniformly bounded). Exponential deviation bounds for an important class of Markov chains, called Doeblin chains (they are charac- terized by an exponentially fast convergence to the equilibrium, uniformly in the initial condition) were derived in [36]. These bounds were also shown to be essentially identical to the Hoeffding inequality in the special case of i.i.d. RVs (see [36, Remark 1]).

VII. APPLICATIONS IN INFORMATION THEORY AND RELATED TOPICS A. Binary Hypothesis Testing Binary hypothesis testing for finite alphabet models was analyzed via the method of types, e.g., in [15, Chapter 11] and [16]. It is assumed that the data sequence is of a fixed length (n), and one wishes to make the optimal decision based on the received sequence and the Neyman-Pearson ratio test. Let the RVs X , X .... be i.i.d. Q, and consider two hypotheses: 1 2 ∼ H : Q = P . • 1 1 H : Q = P . • 2 2 For the simplicity of the analysis, let us assume that the RVs are discrete, and take their values on a finite alphabet where P (x), P (x) > 0 for every x . X 1 2 ∈X In the following, let n n , P1 (X1,...,Xn) P1(Xi) L(X1,...,Xn) ln n = ln P2 (X1,...,Xn) P2(Xi) Xi=1 designate the log-likelihood ratio. By the strong (SLLN), if hypothesis H1 is true, then a.s.

L(X1,...,Xn) lim = D(P1 P2) (110) n n →∞ || and otherwise, if hypothesis H2 is true, then a.s.

L(X1,...,Xn) lim = D(P2 P1) (111) n n →∞ − || where the above assumptions on the probability mass functions P1 and P2 imply that the relative entropies, D(P P ) and D(P P ), are both finite. Consider the case where for some fixed constants λ, λ R that satisfy 1|| 2 2|| 1 ∈ D(P P ) < λ λ < D(P P ) − 2|| 1 ≤ 1|| 2 one decides on hypothesis H1 if L(X1,...,Xn) >nλ and on hypothesis H2 if L(X1,...,Xn) < nλ. Note that if λ = λ , λ then a decision on the two hypotheses is based on comparing the normalized log-likelihood ratio (w.r.t. n) to a single threshold (λ), and deciding on hypothesis H1 or H2 if it is, respectively, above or below λ. If λ < λ then one decides on H1 or H2 if the normalized log-likelihood ratio is, respectively, above the upper threshold λ or below the lower threshold λ. Otherwise, if the normalized log-likelihood ratio is between the upper and lower thresholds, then an erasure is declared and no decision is taken in this case. RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 31

Let α(1) , P n L(X ,...,X ) nλ (112) n 1 1 n ≤   α(2) , P n L(X ,...,X ) nλ (113) n 1 1 n ≤ and   β(1) , P n L(X ,...,X ) nλ (114) n 2 1 n ≥   β(2) , P n L(X ,...,X ) nλ (115) n 2 1 n ≥ (1) (1)   then αn and βn are the probabilities of either making an error or declaring an erasure under, respectively, (2) (2) hypotheses H1 and H2; similarly, αn and βn are the probabilities of making an error under hypotheses H1 and H2, respectively. Let π ,π (0, 1) denote the a-priori probabilities of the hypotheses H and H , respectively, so 1 2 ∈ 1 2 (1) (1) (1) Pe,n = π1αn + π2βn (116) is the probability of having either an error or an erasure, and

(2) (2) (2) Pe,n = π1αn + π2βn (117) is the probability of error. (j) (j) 1) Exact Exponents: When we let n tend to infinity, the exact exponents of αn and βn (j = 1, 2) are derived via Cram´er’s theorem. The resulting exponents form a straightforward generalization of, e.g., [17, Theorem 3.4.3] and [31, Theorem 6.4] that addresses the case where the decision is made based on a single threshold of the log-likelihood (1) (2) ratio. In this particular case where λ = λ , λ, the option of erasures does not exist, and Pe,n = Pe,n , Pe,n is the error probability. In the considered general case with erasures, let λ , λ, λ , λ 1 − 2 − (1) (2) (1) (2) then Cram´er’s theorem on R yields that the exact exponents of αn , αn , βn and βn are given by (1) ln αn lim = I(λ1) (118) n n →∞ − (2) ln αn lim = I(λ2) (119) n n →∞ − (1) ln βn lim = I(λ2) λ2 (120) n n →∞ − − (2) ln βn lim = I(λ1) λ1 (121) n n →∞ − − where the rate function I is given by I(r) , sup tr H(t) (122) t R − ∈ and  1 t t H(t)=ln P (x) − P (x) , t R. (123) 1 2 ∀ ∈ x ! X∈X The rate function I is convex, lower semi-continuous (l.s.c.) and non-negative (see, e.g., [17] and [31]). Note that H(t) = (t 1)D (P P ) − t 2|| 1 where Dt(P Q) designates R´eyni’s information divergence of order t [61, Eq. (3.3)], and I in (122) is the Fenchel- Legendre transform|| of H (see, e.g., [17, Definition 2.2.2]). 32 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

(1) (2) From (116)– (121), the exact exponents of Pe,n and Pe,n are equal to

(1) ln Pe,n lim = min I(λ1),I(λ2) λ2 (124) n − n − →∞ n o and (2) ln Pe,n lim = min I(λ2),I(λ1) λ1 . (125) n − n − →∞ n o For the case where the decision is based on a single threshold for the log-likelihood ratio (i.e., λ1 = λ2 , λ), (1) (2) then Pe,n = Pe,n , Pe,n, and its error exponent is equal to ln P lim e,n = min I(λ),I(λ) λ (126) n − n − →∞ n o which coincides with the error exponent in [17, Theorem 3.4.3] (or [31, Theorem 6.4]). The optimal threshold for obtaining the best error exponent of the error probability Pe,n is equal to zero (i.e., λ = 0); in this case, the exact error exponent is equal to

1 t t I(0) = min ln P1(x) − P2(x) − 0 t 1 ≤ ≤ x ! X∈X , C(P1, P2) (127) which is the Chernoff information of the probability measures P1 and P2 (see [15, Eq. (11.239)]), and it is symmetric (i.e., C(P1, P2)= C(P2, P1)). Note that, from (122), I(0) = supt R H(t) = inft R H(t) ; the minimization ∈ − − ∈ in (127) over the interval [0, 1] (instead of taking the infimum of H over R) is due to the fact that H(0) = H(1) = 0   and the function H in (123) is convex, so it is enough to restrict the infimum of H to the closed interval [0, 1] for which it turns to be a minimum. 2) Lower Bound on the Exponents via Theorem 5: In the following, the tightness of Theorem 5 is examined by using it for the derivation of lower bounds on the error exponent and the exponent of the event of having either an error or an erasure. These results will be compared in the next subsection to the exact exponents from the previous subsection. (1) We first derive a lower bound on the exponent of αn . Under hypothesis H1, let us construct the martingale sequence U , n where . . . is the filtration { k Fk}k=0 F0 ⊆ F1 ⊆ Fn = , Ω , = σ(X ,...,X ), k 1,...,n F0 {∅ } Fk 1 k ∀ ∈ { } and Uk = EP n L(X1,...,Xn) k . (128) 1 | F For every k 0,...,n   ∈ { } n P1(Xi) E n Uk = P1 ln k " P2(Xi) F # Xi=1 k n P1(Xi) P1(Xi) E n = ln + P1 ln P2(Xi) " P2(Xi)# Xi=1 i=Xk+1 k P1(Xi) = ln + (n k)D(P1 P2). P2(Xi) − || Xi=1 In particular

U0 = nD(P1 P2), (129) n || P1(Xi) Un = ln = L(X1,...,Xn) (130) P2(Xi) Xi=1 RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 33 and, for every k 1,...,n , ∈ { } P1(Xk) Uk Uk 1 = ln D(P1 P2). (131) − − P2(Xk) − || Let P1(x) d1 , max ln D(P1 P2) (132) x P (x) − || ∈X 2 so d1 < since by assumption the alphabet set is finite, and P1(x), P 2(x) > 0 for every x . From (131) and (132)∞ X ∈ X Uk Uk 1 d1 | − − |≤ holds a.s. for every k 1,...,n , and due to the statistical independence of the RVs in the sequence X ∈ { } { i} 2 EP n (Uk Uk 1) k 1 1 − − | F − 2  P1(Xk)  = E ln D(P P ) P1 P (X ) − 1|| 2 " 2 k  # 2 P1(x) = P1(x) ln D(P1 P2) P2(x) − || x (   ) X∈X , 2 σ1. (133) Let ε = D(P P ) λ, ε = D(P P )+ λ (134) 1,1 1|| 2 − 2,1 2|| 1 ε = D(P P ) λ, ε = D(P P )+ λ (135) 1,2 1|| 2 − 2,2 2|| 1 The probability of making an erroneous decision on hypothesis H2 or declaring an erasure under the hypothesis (1) H1 is equal to αn , and from Theorem 5 α(1) , P n L(X ,...,X ) nλ n 1 1 n ≤ (a) = P n(U U ε n)  (136) 1 n − 0 ≤− 1,1 (b) δ + γ γ exp n D 1,1 1 1 (137) ≤ − 1+ γ1 1+ γ1    where equality (a) follows from (129), (130) and (134), and inequality (b) follows from Theorem 5 with

2 , σ1 , ε1,1 γ1 2 , δ1,1 . (138) d1 d1 (1) Note that if ε1,1 > d1 then it follows from (131) and (132) that αn is zero; in this case δ1,1 > 1, so the divergence in (137) is infinity and the upper bound is also equal to zero. Hence, it is assumed without loss of generality that δ1,1 [0, 1]. ∈ n Similarly to (128), under hypothesis H2, let us define the martingale sequence Uk, k k=0 with the same filtration and { F } Uk = EP n L(X1,...,Xn) k , k 0,...,n . (139) 2 | F ∀ ∈ { } For every k 0,...,n   ∈ { } k P1(Xi) Uk = ln (n k)D(P2 P1) P2(Xi) − − || Xi=1 and in particular U = nD(P P ), U = L(X ,...,X ). (140) 0 − 2|| 1 n 1 n For every k 1,...,n , ∈ { } P1(Xk) Uk Uk 1 = ln + D(P2 P1). (141) − − P2(Xk) || 34 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

Let P2(x) d2 , max ln D(P2 P1) (142) x P (x) − || ∈X 1

then, the jumps of the latter martingale sequence are uniformly bounded by d2 and, similarly to (133), for every k 1,...,n ∈ { } 2 EP n (Uk Uk 1) k 1 2 − − | F − 2  P2(x)  = P2(x) ln D(P2 P1) P1(x) − || x (   ) X∈X , 2 σ2. (143)

Hence, it follows from Theorem 5 that

β(1) , P n L(X ,...,X ) nλ n 2 1 n ≥ = P n(U U ε n) (144) 2 n − 0 ≥ 2,1  δ + γ γ exp n D 2,1 2 2 (145) ≤ − 1+ γ2 1+ γ2    where the equality in (144) holds due to (140) and (134), and (145) follows from Theorem 5 with

2 , σ2 , ε2,1 γ2 2 , δ2,1 (146) d2 d2 and d2, σ2 are introduced, respectively, in (142) and (143). From (116), (137) and (145), the exponent of the probability of either having an error or an erasure is lower bounded by ln P (1) δ + γ γ lim e,n min D i,1 i i . (147) n − n ≥ i=1,2 1+ γi 1+ γi →∞  

Similarly to the above analysis, one gets from (117) and (135) that the error exponent is lower bounded by

ln P (2) δ + γ γ lim e,n min D i,2 i i (148) n − n ≥ i=1,2 1+ γi 1+ γi →∞   where ε1,2 ε2,2 δ1,2 , , δ2,2 , . (149) d1 d2

For the case of a single threshold (i.e., λ = λ , λ) then (147) and (148) coincide, and one obtains that the error exponent satisfies ln P δ + γ γ lim e,n min D i i i (150) n − n ≥ i=1,2 1+ γi 1+ γi →∞   where δi is the common value of δi,1 and δi,2 (for i = 1, 2). In this special case, the zero threshold is optimal (see, e.g., [17, p. 93]), which then yields that (150) is satisfied with

D(P1 P2) D(P2 P1) δ1 = || , δ2 = || (151) d1 d2 with d1 and d2 from (132) and (142), respectively. The right-hand side of (150) forms a lower bound on Chernoff information which is the exact error exponent for this special case. RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 35

3) Comparison of the Lower Bounds on the Exponents with those that Follow from Azuma’s Inequality: The lower bounds on the error exponent and the exponent of the probability of having either errors or erasures, that were derived in the previous subsection via Theorem 5, are compared in the following to the loosened lower bounds on these exponents that follow from Azuma’s inequality. (1) (2) (1) (2) We first obtain upper bounds on αn , αn , βn and βn via Azuma’s inequality, and then use them to derive (1) (2) lower bounds on the exponents of Pe,n and Pe,n . From (131), (132), (136), (138), and Azuma’s inequality δ2 n α(1) exp 1,1 (152) n ≤ − 2   and, similarly, from (141), (142), (144), (146), and Azuma’s inequality δ2 n β(1) exp 2,1 . (153) n ≤ − 2   From (113), (115), (135), (149) and Azuma’s inequality δ2 n α(2) exp 1,2 (154) n ≤ − 2   δ2 n β(2) exp 2,2 . (155) n ≤ − 2   (1) Therefore, it follows from (116), (117) and (152)–(155) that the resulting lower bounds on the exponents of Pe,n (2) and Pe,n are ln P (j) δ2 lim e,n min i,j , j = 1, 2 (156) n n i=1,2 2 →∞ − ≥ as compared to (147) and (148) which give, for j = 1, 2,

ln P (j) δ + γ γ lim e,n min D i,j i i . (157) n − n ≥ i=1,2 1+ γi 1+ γi →∞  

For the specific case of a zero threshold, the lower bound on the error exponent which follows from Azuma’s inequality is given by ln P (j) δ2 lim e,n min i (158) n n i=1,2 →∞ − ≥ 2 with the values of δ1 and δ2 in (151). The lower bounds on the exponents in (156) and (157) are compared in the following. Note that the lower bounds in (156) are loosened as compared to those in (157) since they follow, respectively, from Azuma’s inequality and its improvement in Theorem 5. The divergence in the exponent of (157) is equal to δ + γ γ D i,j i i 1+ γi 1+ γi   δ + γ δ 1 δ = i,j i ln 1+ i,j + − i,j ln(1 δ ) 1+ γ γ 1+ γ − i,j  i   i   i  γ δ δ (1 δ ) ln(1 δ ) = i 1+ i,j ln 1+ i,j + − i,j − i,j . 1+ γ γ γ γ i  i  i i    (159)

Lemma 7: 2 u + u , u [ 1, 0] 2 ∈ − (1 + u) ln(1 + u) 2 3 (160) ≥ ( u + u u , u 0 2 − 6 ≥ 36 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012 where at u = 1, the left-hand side is defined to be zero (it is the limit of this function when u 1 from above). − →− Proof: The proof relies on some elementary calculus. Since δ [0, 1], then (159) and Lemma 7 imply that i,j ∈ 2 3 δi,j + γi γi δi,j δi,j D 2 . (161) 1+ γi 1+ γi ≥ 2γi − 6γi (1 + γi)  

Hence, by comparing (156) with the combination of (157) and (161), then it follows that (up to a second-order approximation) the lower bounds on the exponents that were derived via Theorem 5 are improved by at least a 1 factor of max γi − as compared to those that follow from Azuma’s inequality. Example 11: Consider two probability measures P and P where  1 2 P1(0) = P2(1) = 0.4, P1(1) = P2(0) = 0.6, and the case of a single threshold of the log-likelihood ratio that is set to zero (i.e., λ = 0). The exact error exponent in this case is Chernoff information that is equal to

2 C(P , P ) = 2.04 10− . 1 2 · 2 The improved lower bound on the error exponent in (150) and (151) is equal to 1.77 10− , whereas the loosened 2 2 7 · lower bound in (158) is equal to 1.39 10− . In this case γ1 = 3 and γ2 = 9 , so the improvement in the lower bound on the error exponent is indeed· by a factor of approximately 1 − 9 max γi = . i 7   Note that, from (137), (145) and (152)–(155), these are lower bounds on the error exponents for any finite block length n, and not only asymptotically in the limit where n . The operational meaning of this example is that the improved lower bound on the error exponent assures that→∞ a fixed error probability can be obtained based on a sequence of i.i.d. RVs whose length is reduced by 22.2% as compared to the loosened bound which follows from Azuma’s inequality. 4) Comparison of the Exact and Lower Bounds on the Error Exponents, Followed by a Relation to Fisher Information: In the following, we compare the exact and lower bounds on the error exponents. Consider the case where there is a single threshold on the log-likelihood ratio (i.e., referring to the case where the erasure option is not provided) that is set to zero. The exact error exponent in this case is given by the Chernoff information (see (127)), and it will be compared to the two lower bounds on the error exponents that were derived in the previous two subsections. Let Pθ θ Θ, denote an indexed family of probability mass functions where Θ denotes the parameter set. Assume { } ∈ that Pθ is differentiable in the parameter θ. Then, the Fisher information is defined as ∂ 2 J(θ) , E ln P (x) (162) θ ∂θ θ   where the expectation is w.r.t. the probability mass function Pθ. The divergence and Fisher information are two related information measures, satisfying the equality

D(P P 0 ) J(θ) lim θ|| θ = (163) θ0 θ (θ θ )2 2 → − 0 (note that if it was a relative entropy to base 2 then the right-hand side of (163) would have been divided by ln 2, J(θ) and be equal to ln 4 as in [15, Eq. (12.364)]). Proposition 2: Under the above assumptions, The Chernoff information and Fisher information are related information measures that satisfy the equality • C(P , P 0 ) J(θ) lim θ θ = . (164) θ0 θ (θ θ )2 8 → − 0 RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 37

Let • δi + γi γi EL(Pθ, Pθ0 ) , min D (165) i=1,2 1+ γi 1+ γi   , , 0 be the lower bound on the error exponent in (150) which corresponds to P1 Pθ and P2 Pθ , then also E (P , P 0 ) J(θ) lim L θ θ = . (166) θ0 θ (θ θ )2 8 → − 0 Let • 2 δi EL(Pθ, Pθ0 ) , min (167) i=1,2 2

be the loosened lower bound on the error exponente in (158) which refers to P1 , Pθ and P2 , Pθ0 . Then,

E (P , P 0 ) a(θ) J(θ) lim L θ θ = (168) θ0 θ (θ θ )2 8 → 0 e − for some deterministic function a bounded in [0, 1], and there exists an indexed family of probability mass functions for which a(θ) can be made arbitrarily close to zero for any fixed value of θ Θ. ∈ Proof: See Appendix C. Proposition 2 shows that, in the considered setting, the refined lower bound on the error exponent provides the correct behavior of the error exponent for a binary hypothesis testing when the relative entropy between the pair of probability mass functions that characterize the two hypotheses tends to zero. This stays in contrast to the loosened error exponent, which follows from Azuma’s inequality, whose scaling may differ significantly from the correct exponent (for a concrete example, see the last part of the proof in Appendix C). Example 12: Consider the index family of of probability mass functions defined over the binary alphabet = X 0, 1 : { } P (0) = 1 θ, P (1) = θ, θ (0, 1). θ − θ ∀ ∈ From (162), the Fisher information is equal to 1 1 J(θ)= + θ 1 θ − and, at the point θ = 0.5, J(θ) = 4. Let θ1 = 0.51 and θ2 = 0.49, so from (164) and (166) 2 J(θ)(θ1 θ2) 4 C(P , P ), EL(P , P ) − = 2.00 10− . θ1 θ2 θ1 θ2 ≈ 8 · 4 4 Indeed, the exact values of C(P , P ) and EL(P , P ) are 2.000 10 and 1.997 10 , respectively. θ1 θ2 θ1 θ2 · − · − B. Minimum Distance of Binary Linear Block Codes Consider the ensemble of binary linear block codes of length n and rate R. The average value of the normalized minimum distance is equal to E [dmin( )] 1 C = h− (1 R) n 2 − 1 where h2− designates the inverse of the binary entropy function to the base 2, and the expectation is with respect to the ensemble where the codes are chosen uniformly at random (see [4]). Let H designate an n(1 R) n parity-check matrix of a linear block code from this ensemble. The minimum distance of the code is equal− to× the minimal number of columns in H that areC linearly dependent. Note that the minimum distance is a property of the code, and it does not depend on the choice of the particular parity-check matrix which represents the code. Let us construct a martingale sequence X0,...,Xn where Xi (for i = 0, 1,...,n) is a RV that denotes the minimal number of linearly dependent columns of a parity-check matrix that is chosen uniformly at random from the ensemble, given that we already revealed its first i columns. Based on Remarks 2 and 3, this sequence forms indeed a martingale sequence where the associated filtration of the σ-algebras . . . is defined so F0 ⊆ F1 ⊆ ⊆ Fn that (for i = 0, 1,...,n) is the σ-algebra that is generated by all the sub-sets of n(1 R) n binary parity-check Fi − × 38 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012

matrices whose first i columns are fixed. This martingale sequence satisfies Xi Xi 1 1 for i = 1,...,n (since if we reveal a new column of H, then the minimal number of linearly dependent| − columns− |≤ can change by at most 1). Note that the RV X0 is the expected minimum Hamming distance of the ensemble, and Xn is the minimum distance of a particular code from the ensemble (since once we revealed all the n columns of H, then the code is known exactly). Hence, by Azuma’s inequality α2 P( d ( ) E[d ( )] α√n) 2exp , α> 0. | min C − min C |≥ ≤ − 2 ∀   This leads to the following theorem: Theorem 10: [The minimum distance of binary linear block codes] Let be chosen uniformly at random C from the ensemble of binary linear block codes of length n and rate R. Then for every α> 0, with probability at 2 least 1 2exp α , the minimum distance of is in the interval − − 2 C   1 1 [n h− (1 R) α√n, n h− (1 R)+ α√n] 2 − − 2 − and it therefore concentrates around its expected value. Note, however, that some well-known capacity-approaching families of binary linear block codes possess a minimum Hamming distance which grows sub-linearly with the block length n. For example, the class of parallel concatenated convolutional (turbo) codes was proved to have a minimum distance which grows at most like the logarithm of the interleaver length [9].

C. Concentration of the Cardinality of the Fundamental System of Cycles for LDPC Code Ensembles Low-density parity-check (LDPC) codes are linear block codes that are represented by sparse parity-check matrices [26]. A sparse parity-check matrix enables to represent the corresponding linear block code by a sparse bipartite graph, and to use this graphical representation for implementing low-complexity iterative message-passing decoding. The low-complexity decoding algorithms used for LDPC codes and some of their variants are remarkable in that they achieve rates close to the Shannon capacity limit for properly designed code ensembles (see, e.g., [63]). As a result of their remarkable performance under practical decoding algorithms, these coding techniques have revolutionized the field of channel coding and they have been incorporated in various digital communication standards during the last decade. In the following, we consider ensembles of binary LDPC codes. The codes are represented by bipartite graphs where the variable nodes are located on the left side of the graph, and the parity-check nodes are on the right. The parity-check equations that define the linear code are represented by edges connecting each check node with the variable nodes that are involved in the corresponding parity-check equation. The bipartite graphs representing these codes are sparse in the sense that the number of edges in the graph scales linearly with the block length n of the code. Following standard notation, let λi and ρi denote the fraction of edges attached, respectively, to variable and parity-check nodes of degree i. The LDPC code ensemble is denoted by LDPC(n, λ, ρ) where n is the block , i 1 , i 1 length of the codes, and the pair λ(x) i λix − and ρ(x) i ρix − represents, respectively, the left and right degree distributions of the ensemble from the edge perspective. For a short summary of preliminary material on binary LDPC code ensembles see, e.g.,P [66, Section II-A]. P It is well known that linear block codes which can be represented by cycle-free bipartite (Tanner) graphs have poor performance even under ML decoding [21]. The bipartite graphs of capacity-approaching LDPC codes should therefore have cycles. For analyzing this issue, we focused on the notion of ”the cardinality of the fundamental system of cycles of bipartite graphs”. For the required preliminary material, the reader is referred to [66, Section II- E]. In [66], we address the following question: Question: Consider an LDPC ensemble whose transmission takes place over a memoryless binary-input output symmetric channel, and refer to the bipartite graphs which represent codes from this ensemble where every code is chosen uniformly at random from the ensemble. How does the average cardinality of the fundamental system of cycles of these bipartite graphs scale as a function of the achievable gap to capacity ? In light of this question, an information-theoretic lower bound on the average cardinality of the fundamental system of cycles was derived in [66, Corollary 1]. This bound was expressed in terms of the achievable gap to RAGINSKY AND SASON: THE MARTINGALE APPROACH FOR CONCENTRATION WITH INFORMATION-THEORETIC APPLICATIONS 39 capacity (even under ML decoding) when the communication takes place over a memoryless binary-input output- symmetric channel. More explicitly, it was shown that if ε designates the gap in rate to capacity, then the number 1 of fundamental cycles should grow at least like log ε . 
Hence, this lower bound remains unbounded as the gap to capacity tends to zero. Consistently with the study in [21] on cycle-free codes, the lower bound on the cardinality of the fundamental system of cycles in [66, Corollary 1] shows quantitatively the necessity of cycles in bipartite graphs which represent good LDPC code ensembles. As a continuation to this work, we present in the following a large-deviations analysis with respect to the cardinality of the fundamental system of cycles for LDPC code ensembles. Let the triple (n, λ, ρ) represent an LDPC code ensemble, and let be a bipartite graph that corresponds to a G code from this ensemble. Then, the cardinality of the fundamental system of cycles of , denoted by β( ), is equal to G G β( )= E( ) V ( ) + c( ) G | G | − | G | G where E( ), V ( ) and c( ) denote the edges, vertices and components of , respectively, and A denotes the number ofG elementsG of a (finite)G set A. Note that for such a bipartite graphG , there are n variable| | nodes and G m = n(1 Rd) parity-check nodes, so there are in total V ( ) = n(2 Rd) nodes. Let aR designate the average right degree− (i.e., the average degree of the parity-check| nGodes),| then− the number of edges in is given by G E( ) = maR. Therefore, for a code from the (n, λ, ρ) LDPC code ensemble, the cardinality of the fundamental system| G | of cycles satisfies the equality

β( )= n (1 Rd)aR (2 Rd) + c( ) (169) G − − − G where  1  0 ρ(x)dx 1 Rd = 1 , aR = − 1 1 R0 λ(x)dx 0 ρ(x)dx denote, respectively, the design rate and averageR right degree of theR ensemble. Let E , E( ) = n(1 Rd)aR (170) | G | − denote the number of edges of an arbitrary bipartite graph from the ensemble (where we refer interchangeably to codes and to the bipartite graphs that represent these codGes from the considered ensemble). Let us arbitrarily assign numbers 1,...,E to the E edges of . Based on Remarks 2 and 3, lets construct a martingale sequence G X0,...,XE where Xi (for i = 0, 1,...,E) is a RV that denotes the conditional expected number of components of a bipartite graph , chosen uniformly at random from the ensemble, given that the first i edges of the graph G G are revealed. Note that the corresponding filtration 0 1 . . . E in this case is defined so that i is the σ-algebra that is generated by all the sets of bipartiteF graph⊆ Fs from⊆ the⊆ consideredF ensemble whose first i edgesF are fixed. For this martingale sequence X = E [β( )], X = β( ) 0 LDPC(n,λ,ρ) G E G and (a.s.) Xk Xk 1 1 for k = 1,...,E (since by revealing a new edge of , the number of components in | − − |≤ G this graph can change by at most 1). By Corollary 1, it follows that for every α 0 ≥ f(α)E P c( ) E [c( )] αE 2e− | G − LDPC(n,λ,ρ) G |≥ ≤ f(α)E P β( ) E [β( )] αE 2e− (171) ⇒ | G − LDPC(n,λ,ρ) G |≥  ≤ where the last transition follows from (169), and the function f was defined in (58). Hence, for α > 1, this probability is zero (since f(α)=+ for α> 1). Note that, from (169), ELDPC(n,λ,ρ)[β( )] scales linearly with n. The combination of Eqs. (58), (170),∞ (171) gives the following statement: G Theorem 11: [Concentration inequality for the cardinality of the fundamental system of cycles] Let LDPC(n, λ, ρ) be the LDPC code ensemble that is characterized by a block length n, and a pair of degree distributions (from the edge perspective) of λ and ρ. Let be a bipartite graph chosen uniformly at random from this ensemble. Then, G for every α 0, the cardinality of the fundamental system of cycles of , denoted by β( ), satisfies the following inequality: ≥ G G 1−η [1 h2( )]n P β( ) E [β( )] αn 2 2− − 2 | G − LDPC(n,λ,ρ) G |≥ ≤ ·  40 RAGINSKYANDSASON:DRAFTOFTHEFIRSTCHAPTEROFATUTORIAL, LAST UPDATED: DECEMBER 10, 2012 where h designates the binary entropy function to the base 2, η , α , and R and a designate, respectively, 2 (1 Rd) aR d R the design rate and average right degree of the ensemble. Consequently,− if η > 1, this probability is zero. Remark 12: The loosened version of Theorem 11, which follows from Azuma’s inequality, gets the form

$$\mathbb{P}\bigl(|\beta(\mathcal{G}) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[\beta(\mathcal{G})]| \ge \alpha n\bigr) \le 2e^{-\frac{\eta^2 n}{2}}$$
for every $\alpha \ge 0$, and $\eta$ as defined in Theorem 11. Note, however, that the exponential decay of the two bounds is similar for values of $\alpha$ close to zero (see the exponents in Azuma's inequality and Corollary 1 in Figure 1).

Remark 13: For various capacity-achieving sequences of LDPC code ensembles on the binary erasure channel, the average right degree scales like $\log\frac{1}{\varepsilon}$, where $\varepsilon$ denotes the fractional gap to capacity under belief-propagation decoding (i.e., $R_{\mathrm{d}} = (1-\varepsilon)C$) [43]. Therefore, for small values of $\alpha$, the exponential decay rate in the inequality of Theorem 11 scales like $\bigl(\log\frac{1}{\varepsilon}\bigr)^{-2}$. This large-deviations result complements the result in [66, Corollary 1], which provides a lower bound on the average cardinality of the fundamental system of cycles that scales like $\log\frac{1}{\varepsilon}$.

Remark 14: Consider small deviations from the expected value that scale like $\sqrt{n}$. Note that Corollary 1 is a special case of Theorem 5 when $\gamma = 1$ (i.e., when only an upper bound on the jumps of the martingale sequence is available, but there is no non-trivial upper bound on the conditional variance). Hence, it follows from Proposition 1 that Corollary 1 does not provide in this case any improvement in the exponent of the concentration inequality (as compared to Azuma's inequality) when small deviations are considered.
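As a quick numerical illustration of Remark 12, the following minimal Python sketch (not part of the original analysis) evaluates the exponent $[1 - h_2(\frac{1-\eta}{2})]\ln 2$ that appears in the statement of Theorem 11 against the loosened Azuma exponent $\eta^2/2$; for small $\eta$ the two are nearly indistinguishable, in agreement with the remark, while for larger $\eta$ the former is visibly larger.

```python
import numpy as np

def h2(x):
    """Binary entropy function to the base 2."""
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

# Exponent (per node, in nats) of Theorem 11 versus the loosened Azuma exponent of Remark 12.
for eta in [0.01, 0.1, 0.3, 0.6, 1.0]:
    exp_thm11 = (1.0 - h2((1.0 - eta) / 2.0)) * np.log(2.0)
    exp_azuma = eta ** 2 / 2.0
    print(f"eta={eta:4.2f}:  Theorem 11: {exp_thm11:.4f}   Azuma: {exp_azuma:.4f}")
```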

D. Performance of LDPC Codes under Iterative Message-Passing Decoding for Memoryless Symmetric Channels

In the following, ensembles of binary LDPC codes are considered with the same notation as in the previous subsection. The following theorem was proved by Richardson and Urbanke in [63, pp. 487-490], based on the Azuma-Hoeffding inequality, and it is a central result in the theory of codes defined on graphs and iterative decoding algorithms.

Theorem 12: [Concentration of the bit error probability around the ensemble average] Let $\mathcal{C}$, a code chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$, be used for transmission over a memoryless binary-input output-symmetric (MBIOS) channel characterized by its $L$-density $a_{\mathrm{MBIOS}}$. Assume that the decoder performs $l$ iterations of message-passing decoding, and let $P_{\mathrm{b}}(\mathcal{C}, a_{\mathrm{MBIOS}}, l)$ denote the resulting bit error probability. Then, for every $\delta > 0$, there exists an $\alpha > 0$, where $\alpha = \alpha(\lambda, \rho, \delta, l)$ is independent of the block length $n$, such that

$$\mathbb{P}\bigl(|P_{\mathrm{b}}(\mathcal{C}, a_{\mathrm{MBIOS}}, l) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[P_{\mathrm{b}}(\mathcal{C}, a_{\mathrm{MBIOS}}, l)]| \ge \delta\bigr) \le \exp(-\alpha n).$$
This theorem asserts that all except an exponentially (in the block length) small fraction of codes behave within an arbitrarily small $\delta$ from the ensemble average (where $\delta$ is a positive number that can be chosen arbitrarily small). Therefore, assuming a sufficiently large block length, the ensemble average is a good indicator for the performance of individual codes, and it is therefore reasonable to focus on the design and analysis of capacity-approaching ensembles (via the density evolution technique).

E. On the Concentration of the Conditional Entropy for LDPC Code Ensembles

A large-deviations analysis of the conditional entropy for random ensembles of LDPC codes was introduced in [55, Theorem 4] and [57, Theorem 1]. The following theorem is proved in [55, Appendix I], based on the Azuma-Hoeffding inequality, and it is rephrased in the following to consider small deviations of order $\sqrt{n}$ (instead of large deviations of order $n$):

Theorem 13: [Concentration of the conditional entropy] Let $\mathcal{C}$ be chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$. Assume that the transmission of the code $\mathcal{C}$ takes place over a memoryless binary-input output-symmetric (MBIOS) channel. Let $H(\mathbf{X}|\mathbf{Y})$ designate the conditional entropy of the transmitted codeword $\mathbf{X}$ given the received sequence $\mathbf{Y}$ from the channel. Then, for any $\xi > 0$,
$$\mathbb{P}\bigl(|H(\mathbf{X}|\mathbf{Y}) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[H(\mathbf{X}|\mathbf{Y})]| \ge \xi\sqrt{n}\bigr) \le 2\exp(-B\xi^2)$$
where $B \triangleq \frac{1}{2(d_{\mathrm{c}}^{\max}+1)^2(1-R_{\mathrm{d}})}$, $d_{\mathrm{c}}^{\max}$ is the maximal check-node degree, and $R_{\mathrm{d}}$ is the design rate of the ensemble. Note that the conditional entropy scales linearly with $n$, whereas the deviations considered in this inequality are of the smaller order $\sqrt{n}$.

In the following, we revisit the proof of Theorem 13 in [55, Appendix I] in order to derive a tightened version of this bound. Based on this proof, let $\mathcal{G}$ be a bipartite graph which represents a code chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$. Define the RV
$$Z = H_{\mathcal{G}}(\mathbf{X}|\mathbf{Y})$$
which forms the conditional entropy when the transmission takes place over an MBIOS channel whose transition probability is given by $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)$, where $p_{Y|X}(y|1) = p_{Y|X}(-y|0)$. Fix an arbitrary order for the $m = n(1-R_{\mathrm{d}})$ parity-check nodes, where $R_{\mathrm{d}}$ forms the design rate of the LDPC code ensemble. Let $\{\mathcal{F}_t\}_{t \in \{0,1,\ldots,m\}}$ form a filtration of $\sigma$-algebras $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_m$, where $\mathcal{F}_t$ (for $t = 0, 1, \ldots, m$) is the $\sigma$-algebra that is generated by all the subsets of $m \times n$ parity-check matrices that are characterized by the pair of degree distributions $(\lambda, \rho)$ and whose first $t$ parity-check equations are fixed (for $t = 0$ nothing is fixed, and therefore $\mathcal{F}_0 = \{\emptyset, \Omega\}$, where $\emptyset$ denotes the empty set, and $\Omega$ is the whole sample space of $m \times n$ binary parity-check matrices that are characterized by the pair of degree distributions $(\lambda, \rho)$). Accordingly, based on Remarks 2 and 3, let us define the following martingale sequence:
$$Z_t = \mathbb{E}[Z \,|\, \mathcal{F}_t], \qquad t \in \{0, 1, \ldots, m\}.$$
By construction, $Z_0 = \mathbb{E}[H_{\mathcal{G}}(\mathbf{X}|\mathbf{Y})]$ is the expected value of the conditional entropy for the LDPC code ensemble, and $Z_m$ is the RV that is equal (a.s.) to the conditional entropy of the particular code from the ensemble (see Remark 3). Similarly to [55, Appendix I], we obtain upper bounds on the differences $|Z_{t+1} - Z_t|$ and then rely on Azuma's inequality in Theorem 1.

Without loss of generality, the parity-checks are ordered in [55, Appendix I] by increasing degree. Let $\mathbf{r} = (r_1, r_2, \ldots)$ be the set of parity-check degrees in ascending order, and let $\Gamma_i$ be the fraction of parity-check nodes of degree $i$. Hence, the first $m_1 = n(1-R_{\mathrm{d}})\Gamma_{r_1}$ parity-check nodes are of degree $r_1$, the successive $m_2 = n(1-R_{\mathrm{d}})\Gamma_{r_2}$ parity-check nodes are of degree $r_2$, and so on. The $(t+1)$-th parity-check will therefore have a well-defined degree, to be denoted by $r$. From the proof in [55, Appendix I],

$$|Z_{t+1} - Z_t| \le (r+1)\, H_{\mathcal{G}}(\tilde{X}|\mathbf{Y}) \tag{172}$$
where $H_{\mathcal{G}}(\tilde{X}|\mathbf{Y})$ is a RV which designates the conditional entropy of a parity-bit $\tilde{X} = X_{i_1} \oplus \ldots \oplus X_{i_r}$ (i.e., $\tilde{X}$ is equal to the modulo-2 sum of some $r$ bits in the codeword $\mathbf{X}$) given the received sequence $\mathbf{Y}$ at the channel output. The proof in [55, Appendix I] was then completed by upper bounding the parity-check degree $r$ by the maximal parity-check degree $d_{\mathrm{c}}^{\max}$, and also by upper bounding the conditional entropy of the parity-bit $\tilde{X}$ by 1. This gives
$$|Z_{t+1} - Z_t| \le d_{\mathrm{c}}^{\max} + 1, \qquad t = 0, 1, \ldots, m-1, \tag{173}$$
which then proves Theorem 13 from Azuma's inequality. Note that the $d_i$'s in Theorem 1 are equal to $d_{\mathrm{c}}^{\max}+1$, and $n$ in Theorem 1 is replaced with the length $m = n(1-R_{\mathrm{d}})$ of the martingale sequence $\{Z_t\}$ (that is equal to the number of the parity-check nodes in the graph).

In the continuation, we deviate from the proof in [55, Appendix I] in two respects:
• The first difference is related to the upper bound on the conditional entropy $H_{\mathcal{G}}(\tilde{X}|\mathbf{Y})$ in (172), where $\tilde{X}$ is the modulo-2 sum of some $r$ bits of the transmitted codeword $\mathbf{X}$ given the channel output $\mathbf{Y}$. Instead of taking the most trivial upper bound that is equal to 1, as was done in [55, Appendix I], a simple upper bound on the conditional entropy is derived; this bound depends on the parity-check degree $r$ and the channel capacity $C$ (see Proposition 3).
• The second difference is minor, but it proves to be helpful for tightening the concentration inequality for LDPC code ensembles that are not right-regular (i.e., the case where the degrees of the parity-check nodes are not fixed to a certain value). Instead of upper bounding the term $r+1$ on the right-hand side of (172) with $d_{\mathrm{c}}^{\max}+1$, it is suggested to leave it as is, since Azuma's inequality applies to the case where the bounded differences of the martingale sequence are not fixed (see Theorem 1), and since the number of the parity-check nodes of degree $r$ is equal to $n(1-R_{\mathrm{d}})\Gamma_r$. The effect of this simple modification will be shown in Example 14.

The following upper bound is related to the first item above:

Proposition 3: Let $\mathcal{G}$ be a bipartite graph which corresponds to a binary linear block code whose transmission takes place over an MBIOS channel. Let $\mathbf{X}$ and $\mathbf{Y}$ designate the transmitted codeword and received sequence at the

channel output. Let $\tilde{X} = X_{i_1} \oplus \ldots \oplus X_{i_r}$ be a parity-bit of some $r$ code bits of $\mathbf{X}$. Then, the conditional entropy of $\tilde{X}$ given $\mathbf{Y}$ satisfies
$$H_{\mathcal{G}}(\tilde{X}|\mathbf{Y}) \le h_2\left(\frac{1 - C^{\frac{r}{2}}}{2}\right). \tag{174}$$
Further, for a binary symmetric channel (BSC) or a binary erasure channel (BEC), this bound can be improved to
$$h_2\left(\frac{1 - \bigl[1 - 2h_2^{-1}(1-C)\bigr]^r}{2}\right) \tag{175}$$
and
$$1 - C^r \tag{176}$$
respectively, where $h_2^{-1}$ in (175) designates the inverse of the binary entropy function to base 2. Note that if the MBIOS channel is perfect (i.e., its capacity is $C = 1$ bit per channel use) then (174) holds with equality (where both sides of (174) are zero), whereas the trivial upper bound is 1.

Proof: Let us upper bound the conditional entropy $H(\tilde{X}|\mathbf{Y})$ with $H(\tilde{X}|Y_{i_1}, \ldots, Y_{i_r})$, where the latter conditioning refers to the intrinsic information for the bits $X_{i_1}, \ldots, X_{i_r}$ which are used to calculate the parity-bit $\tilde{X}$. Then, from [66, Eq. (17) and Appendix I], the conditional entropy of the bit $\tilde{X}$ given the $n$-length received sequence $\mathbf{Y}$ satisfies the inequality
$$H(\tilde{X}|\mathbf{Y}) \le 1 - \frac{1}{2\ln 2}\sum_{p=1}^{\infty}\frac{(g_p)^r}{p(2p-1)} \tag{177}$$
where (see [66, Eq. (19)])
$$g_p \triangleq \int_0^{\infty} a(l)(1 + e^{-l})\tanh^{2p}\left(\frac{l}{2}\right)dl, \qquad \forall\,p \in \mathbb{N} \tag{178}$$
and $a(\cdot)$ denotes the symmetric pdf of the log-likelihood ratio at the output of the MBIOS channel, given that the channel input is equal to zero. From [66, Lemmas 4 and 5], it follows that
$$g_p \ge C^p, \qquad \forall\,p \in \mathbb{N}.$$
Substituting this inequality in (177) gives that

$$H(\tilde{X}|\mathbf{Y}) \le 1 - \frac{1}{2\ln 2}\sum_{p=1}^{\infty}\frac{C^{pr}}{p(2p-1)} = h_2\left(\frac{1 - C^{\frac{r}{2}}}{2}\right) \tag{179}$$
where the last equality follows from the power series expansion of the binary entropy function:

$$h_2(x) = 1 - \frac{1}{2\ln 2}\sum_{p=1}^{\infty}\frac{(1-2x)^{2p}}{p(2p-1)}, \qquad 0 \le x \le 1. \tag{180}$$
The tightened bound on the conditional entropy for the BSC is obtained from (177) and the equality
$$g_p = \bigl(1 - 2h_2^{-1}(1-C)\bigr)^{2p}, \qquad \forall\,p \in \mathbb{N},$$
which holds for the BSC (see [66, Eq. (97)]). This replaces $C$ on the right-hand side of (179) with $\bigl(1 - 2h_2^{-1}(1-C)\bigr)^2$, thus leading to the tightened bound in (175).
The tightened result for the BEC holds since, from (178),
$$g_p = C, \qquad \forall\,p \in \mathbb{N}$$
(see [66, Appendix II]), and a substitution of this equality in (177) gives (176) (note that $\sum_{p=1}^{\infty}\frac{1}{p(2p-1)} = 2\ln 2$). This completes the proof of Proposition 3.

From Proposition 3 and (172),
$$|Z_{t+1} - Z_t| \le (r+1)\, h_2\left(\frac{1 - C^{\frac{r}{2}}}{2}\right) \tag{181}$$
with the corresponding two improvements for the BSC and BEC (where the second term on the right-hand side of (181) is replaced by (175) and (176), respectively). This improves the loosened bound of $(d_{\mathrm{c}}^{\max}+1)$ in [55, Appendix I]. From (181) and Theorem 1, we obtain the following tightened version of the concentration inequality in Theorem 13.

Theorem 14: [A tightened concentration inequality for the conditional entropy] Let $\mathcal{C}$ be chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$. Assume that the transmission of the code $\mathcal{C}$ takes place over a memoryless binary-input output-symmetric (MBIOS) channel. Let $H(\mathbf{X}|\mathbf{Y})$ designate the conditional entropy of the transmitted codeword $\mathbf{X}$ given the received sequence $\mathbf{Y}$ at the channel output. Then, for every $\xi > 0$,

$$\mathbb{P}\bigl(|H(\mathbf{X}|\mathbf{Y}) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[H(\mathbf{X}|\mathbf{Y})]| \ge \xi\sqrt{n}\bigr) \le 2\exp(-B\xi^2) \tag{182}$$
where
$$B \triangleq \frac{1}{2(1-R_{\mathrm{d}})\sum_{i=1}^{d_{\mathrm{c}}^{\max}}\left\{(i+1)^2\,\Gamma_i\left[h_2\left(\frac{1-C^{\frac{i}{2}}}{2}\right)\right]^2\right\}} \tag{183}$$
and $d_{\mathrm{c}}^{\max}$ is the maximal check-node degree, $R_{\mathrm{d}}$ is the design rate of the ensemble, and $C$ is the channel capacity (in bits per channel use). Furthermore, for a binary symmetric channel (BSC) or a binary erasure channel (BEC), the parameter $B$ on the right-hand side of (182) can be improved (i.e., increased), respectively, to
$$B \triangleq \frac{1}{2(1-R_{\mathrm{d}})\sum_{i=1}^{d_{\mathrm{c}}^{\max}}\left\{(i+1)^2\,\Gamma_i\left[h_2\left(\frac{1-\bigl[1-2h_2^{-1}(1-C)\bigr]^i}{2}\right)\right]^2\right\}}$$
and
$$B \triangleq \frac{1}{2(1-R_{\mathrm{d}})\sum_{i=1}^{d_{\mathrm{c}}^{\max}}\left\{(i+1)^2\,\Gamma_i\,(1-C^i)^2\right\}}. \tag{184}$$

Remark 15: From (183), Theorem 14 indeed yields a stronger concentration inequality than Theorem 13.

Remark 16: In the limit where $C \to 1$ bit per channel use, it follows from (183) that if $d_{\mathrm{c}}^{\max} < \infty$ then $B \to \infty$. This is in contrast to the value of $B$ in Theorem 13, which does not depend on the channel capacity and is finite. Note that $B$ should indeed be infinity for a perfect channel, and therefore Theorem 14 is tight in this case. In the case where $d_{\mathrm{c}}^{\max}$ is not finite, we prove the following:

Lemma 8: If $d_{\mathrm{c}}^{\max} = \infty$ and $\rho'(1) < \infty$, then $B \to \infty$ in the limit where $C \to 1$.

Proof: See Appendix D.

This is in contrast to the value of $B$ in Theorem 13, which vanishes when $d_{\mathrm{c}}^{\max} = \infty$, and therefore Theorem 13 is not informative in this case (see Example 14).

Example 13: [Comparison of Theorems 13 and 14 for right-regular LDPC code ensembles] In the following, we exemplify the improvement in the tightness of Theorem 14 for right-regular LDPC code ensembles. Consider the case where the communication takes place over a binary-input additive white Gaussian noise channel (BIAWGNC) or a BEC. Let us consider the (2, 20) regular LDPC code ensemble whose design rate is equal to 0.900 bits per channel use. For a BEC, the threshold of the channel bit erasure probability under belief-propagation (BP) decoding is given by
$$p^{\mathrm{BP}} = \inf_{x \in (0,1]}\frac{x}{1-(1-x)^{19}} = 0.0531$$
which corresponds to a channel capacity of $C = 0.9469$ bits per channel use. For the BIAWGNC, the threshold under BP decoding is equal to $\sigma^{\mathrm{BP}} = 0.4156590$. From [63, Example 4.38], which expresses the capacity of the BIAWGNC in terms of the standard deviation $\sigma$ of the Gaussian noise, the minimum capacity of a BIAWGNC over which it is possible to communicate with vanishing bit error probability under BP decoding is $C = 0.9685$ bits per channel use. Accordingly, let us assume that for reliable communication on both channels, the capacity of the BEC and BIAWGNC is set to 0.98 bits per channel use.

Since the considered code ensemble is right-regular (i.e., the parity-check degree is fixed to $d_{\mathrm{c}} = 20$), the parameter $B$ in Theorem 14 is improved over Theorem 13 by a factor of
$$\frac{1}{\left[h_2\left(\frac{1-C^{\frac{d_{\mathrm{c}}}{2}}}{2}\right)\right]^2} = 5.134.$$
This implies that the inequality in Theorem 14 is satisfied with a block length that is 5.134 times shorter than the block length which corresponds to Theorem 13. For the BEC, the result is improved by a factor of
$$\frac{1}{\bigl(1-C^{d_{\mathrm{c}}}\bigr)^2} = 9.051$$
due to the tightened value of $B$ in (184) as compared to Theorem 13.

Example 14: [Comparison of Theorems 13 and 14 for a heavy-tail Poisson distribution (Tornado codes)] In the following, we compare Theorems 13 and 14 for Tornado LDPC code ensembles. This capacity-achieving sequence for the BEC refers to the heavy-tail Poisson distribution, and it was introduced in [43, Section IV], [72] (see also [63, Problem 3.20]). We rely in the following on the analysis in [66, Appendix VI]. Suppose that we wish to design Tornado code ensembles that achieve a fraction $1-\varepsilon$ of the capacity of a BEC under iterative message-passing decoding (where $\varepsilon$ can be set arbitrarily small). Let $p$ designate the bit erasure probability of the channel. The parity-check degree is Poisson distributed, and therefore the maximal degree of the parity-check nodes is infinity. Hence, $B = 0$ according to Theorem 13, and this theorem is therefore useless for the considered code ensemble. On the other hand, from Theorem 14,

$$\begin{aligned}
\sum_i (i+1)^2\,\Gamma_i\left[h_2\left(\frac{1-C^{\frac{i}{2}}}{2}\right)\right]^2
&\overset{\text{(a)}}{\le} \sum_i (i+1)^2\,\Gamma_i \\
&\overset{\text{(b)}}{=} \frac{\sum_i \rho_i(i+2)}{\int_0^1 \rho(x)\,dx} + 1 \\
&\overset{\text{(c)}}{=} \bigl(\rho'(1) + 3\bigr)d_{\mathrm{c}}^{\mathrm{avg}} + 1 \\
&\overset{\text{(d)}}{=} \left(\frac{\lambda'(0)\rho'(1)}{\lambda_2} + 3\right)d_{\mathrm{c}}^{\mathrm{avg}} + 1 \\
&\overset{\text{(e)}}{\le} \left(\frac{1}{p\lambda_2} + 3\right)d_{\mathrm{c}}^{\mathrm{avg}} + 1 \\
&\overset{\text{(f)}}{=} O\left(\log^2\frac{1}{\varepsilon}\right)
\end{aligned}$$
where inequality (a) holds since the binary entropy function to base 2 is bounded between zero and one, equality (b) holds since
$$\Gamma_i = \frac{\rho_i/i}{\int_0^1 \rho(x)\,dx}$$
where $\Gamma_i$ and $\rho_i$ denote the fraction of parity-check nodes and the fraction of edges that are connected to parity-check nodes of degree $i$, respectively (and also since $\sum_i \Gamma_i = 1$), equality (c) holds since

$$d_{\mathrm{c}}^{\mathrm{avg}} = \frac{1}{\int_0^1 \rho(x)\,dx}$$
where $d_{\mathrm{c}}^{\mathrm{avg}}$ denotes the average parity-check node degree, equality (d) holds since $\lambda'(0) = \lambda_2$, inequality (e) is due to the stability condition for the BEC (where $p\lambda'(0)\rho'(1) < 1$ is a necessary condition for reliable communication on the BEC under BP decoding), and finally equality (f) follows from the analysis in [66, Appendix VI] (an upper

bound on $\lambda_2$ is derived in [66, Eq. (120)], and the average parity-check node degree scales like $\log\frac{1}{\varepsilon}$). Hence, from the above chain of inequalities and (183), it follows that for a small gap to capacity, the parameter $B$ in Theorem 14 scales (at least) like
$$O\left(\frac{1}{\log^2\frac{1}{\varepsilon}}\right).$$
Theorem 14 is therefore useful for the analysis of this LDPC code ensemble. As is shown above, the parameter $B$ in (183) tends to zero rather slowly as we let the fractional gap $\varepsilon$ tend to zero (which therefore demonstrates a rather fast concentration in Theorem 14).

Example 15: This example forms a direct continuation of Example 13 for the $(n, d_{\mathrm{v}}, d_{\mathrm{c}})$ regular LDPC code ensembles where $d_{\mathrm{v}} = 2$ and $d_{\mathrm{c}} = 20$. With the settings in this example, Theorem 13 gives that
$$\mathbb{P}\bigl(|H(\mathbf{X}|\mathbf{Y}) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[H(\mathbf{X}|\mathbf{Y})]| \ge \xi\sqrt{n}\bigr) \le 2\exp(-0.0113\,\xi^2), \qquad \forall\,\xi > 0. \tag{185}$$
As was mentioned already in Example 13, the exponential inequalities in Theorem 14 achieve an improvement in the exponent of Theorem 13 by factors of 5.134 and 9.051 for the BIAWGNC and BEC, respectively. One therefore obtains from the concentration inequalities in Theorem 14 that, for every $\xi > 0$,
$$\mathbb{P}\bigl(|H(\mathbf{X}|\mathbf{Y}) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[H(\mathbf{X}|\mathbf{Y})]| \ge \xi\sqrt{n}\bigr) \le \begin{cases} 2\exp(-0.0580\,\xi^2), & \text{(BIAWGNC)} \\ 2\exp(-0.1023\,\xi^2). & \text{(BEC)} \end{cases} \tag{186}$$
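The numerical constants quoted in Examples 13 and 15 can be reproduced directly from (183)-(184). The following minimal Python sketch (not part of the original text; it simply re-evaluates the stated formulas for a right-regular ensemble with $d_{\mathrm{c}} = 20$, $R_{\mathrm{d}} = 0.9$ and an assumed capacity $C = 0.98$ bits per channel use) recovers the constants 0.0113, 0.0580 and 0.1023, as well as the improvement factors 5.134 and 9.051.

```python
import numpy as np

def h2(x):
    """Binary entropy function to the base 2."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

dc, Rd, C = 20, 0.9, 0.98          # right degree, design rate, assumed capacity

# Theorem 13: B = 1 / (2 (dc+1)^2 (1 - Rd))
B13 = 1.0 / (2 * (dc + 1) ** 2 * (1 - Rd))

# Theorem 14 for a right-regular ensemble (single term i = dc in (183)-(184))
B14_mbios = 1.0 / (2 * (1 - Rd) * (dc + 1) ** 2 * h2((1 - C ** (dc / 2)) / 2) ** 2)
B14_bec   = 1.0 / (2 * (1 - Rd) * (dc + 1) ** 2 * (1 - C ** dc) ** 2)

print(f"B (Thm 13)             = {B13:.4f}")        # ~0.0113, cf. (185)
print(f"B (Thm 14, BIAWGNC)    = {B14_mbios:.4f}")  # ~0.0580, cf. (186)
print(f"B (Thm 14, BEC)        = {B14_bec:.4f}")    # ~0.1023, cf. (186)
print(f"improvement factors: {B14_mbios / B13:.3f}, {B14_bec / B13:.3f}")  # ~5.134, ~9.051
```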

F. Expansion of Random Regular Bipartite Graphs

Azuma's inequality is useful for analyzing the expansion of random bipartite graphs. The following theorem was introduced in [73, Theorem 25]. It is stated and proved here slightly more precisely, in the sense of characterizing the relation between the deviation from the expected value and the exponential convergence rate of the resulting probability.

Theorem 15: [Expansion of random regular bipartite graphs] Let $\mathcal{G}$ be chosen uniformly at random from the regular ensemble $\mathrm{LDPC}(n, x^{l-1}, x^{r-1})$. Let $\alpha \in (0,1)$ and $\delta > 0$ be fixed. Then, with probability at least $1 - \exp(-\delta n)$, all sets of $\alpha n$ variable nodes in $\mathcal{G}$ have a number of neighbors that is at least
$$n\left[\frac{l}{r}\Bigl(1 - (1-\alpha)^r\Bigr) - \sqrt{2l\alpha\bigl(h(\alpha)+\delta\bigr)}\right] \tag{187}$$
where $h$ is the binary entropy function to the natural base (i.e., $h(x) = -x\ln(x) - (1-x)\ln(1-x)$ for $x \in [0,1]$).

Proof: The proof starts by looking at the expected number of neighbors, and then exposing one neighbor at a time to bound the probability that the number of neighbors deviates significantly from this mean. Note that the expected number of neighbors of $\alpha n$ variable nodes is equal to
$$\frac{nl}{r}\Bigl(1 - (1-\alpha)^r\Bigr)$$
since, for each of the $\frac{nl}{r}$ check nodes, the probability that it has at least one edge in the subset of $n\alpha$ chosen variable nodes is $1 - (1-\alpha)^r$. Let us form a martingale sequence to estimate, via Azuma's inequality, the probability that the actual number of neighbors deviates by a certain amount from this expected value.

Let $\mathcal{V}$ denote the chosen set of $n\alpha$ variable nodes. This set has $n\alpha l$ outgoing edges. Let us reveal the destination of each of these edges one at a time. More precisely, let $S_i$ be the RV denoting the check-node socket which the $i$-th edge is connected to, where $i \in \{1, \ldots, n\alpha l\}$. Let $X(\mathcal{G})$ be a RV which denotes the number of neighbors of the chosen set $\mathcal{V}$ in a bipartite graph $\mathcal{G}$ from the ensemble, and define for $i \in \{0, \ldots, n\alpha l\}$
$$X_i = \mathbb{E}\bigl[X(\mathcal{G}) \,\big|\, S_1, \ldots, S_i\bigr].$$
Note that this is a martingale sequence where $X_0 = \mathbb{E}[X(\mathcal{G})]$ and $X_{n\alpha l} = X(\mathcal{G})$. Also, for every $i \in \{1, \ldots, n\alpha l\}$, we have $|X_i - X_{i-1}| \le 1$, since every time only one check-node socket is revealed, so the number of neighbors of the chosen set of variable nodes cannot change by more than 1 at every single time. Thus, by the one-sided Azuma's inequality in Section III-A,
$$\mathbb{P}\Bigl(\mathbb{E}[X(\mathcal{G})] - X(\mathcal{G}) \ge \lambda\sqrt{l\alpha n}\Bigr) \le \exp\left(-\frac{\lambda^2}{2}\right), \qquad \forall\,\lambda > 0.$$
Since there are $\binom{n}{n\alpha}$ choices for the set $\mathcal{V}$ then, from the union bound, the event that there exists a set of size $n\alpha$ whose number of neighbors is less than $\mathbb{E}[X(\mathcal{G})] - \lambda\sqrt{l\alpha n}$ occurs with probability that is at most $\binom{n}{n\alpha}\exp\bigl(-\frac{\lambda^2}{2}\bigr)$. Since $\binom{n}{n\alpha} \le e^{nh(\alpha)}$, we get the loosened bound $\exp\bigl(nh(\alpha) - \frac{\lambda^2}{2}\bigr)$. Finally, choosing $\lambda = \sqrt{2n\bigl(h(\alpha)+\delta\bigr)}$ gives the required result.
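For concreteness, the bound (187) is easy to evaluate numerically. The following minimal sketch (illustrative, hypothetical parameter values that are not taken from the original text) does so; note that for many parameter choices the square-root deviation term dominates the expected-neighbors term, in which case the bound is negative and hence vacuous.

```python
import numpy as np

def neighbors_lower_bound(n, l, r, alpha, delta):
    """Evaluate the lower bound (187); it holds with probability >= 1 - exp(-delta*n)."""
    h = -alpha * np.log(alpha) - (1 - alpha) * np.log(1 - alpha)   # natural-base entropy
    return n * ((l / r) * (1 - (1 - alpha) ** r)
                - np.sqrt(2 * l * alpha * (h + delta)))

# The bound is informative only when the deviation term is smaller than the mean term.
print(neighbors_lower_bound(n=100_000, l=30, r=60, alpha=1e-3, delta=5e-4))  # positive
print(neighbors_lower_bound(n=100_000, l=3,  r=6,  alpha=1e-3, delta=5e-4))  # negative => vacuous
```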

G. Concentration of the Crest-Factor for OFDM Signals

Orthogonal frequency-division multiplexing (OFDM) is a modulation that converts a high-rate data stream into a number of low-rate streams that are transmitted over parallel narrow-band channels. OFDM is widely used in several international standards for digital audio and video broadcasting, and for wireless local area networks. For a textbook providing a survey on OFDM, see, e.g., [56, Chapter 19].

One of the problems of OFDM signals is that the peak amplitude of the signal can be significantly higher than the average amplitude. This issue makes the transmission of OFDM signals sensitive to non-linear devices in the communication path, such as digital-to-analog converters, mixers and high-power amplifiers. As a result of this drawback, the symbol error rate increases and the power efficiency of OFDM signals is reduced as compared to single-carrier systems. Commonly, the impact of nonlinearities is described by the distribution of the crest-factor (CF) of the transmitted signal [42], but its calculation involves time-consuming simulations even for a small number of sub-carriers. The expected value of the CF for OFDM signals is known to scale like the logarithm of the number of sub-carriers of the OFDM signal (see [42], [64, Section 4] and [81]).

Given an $n$-length codeword $\{X_i\}_{i=0}^{n-1}$, a single OFDM baseband symbol is described by
$$s(t) = \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1} X_i \exp\left(\frac{j\,2\pi i t}{T}\right), \qquad 0 \le t \le T. \tag{188}$$
Let us assume that $X_0, \ldots, X_{n-1}$ are complex RVs, and that a.s. $|X_i| = 1$ (these RVs do not need to be independent). Since the sub-carriers are orthonormal over $[0, T]$, the signal power over the interval $[0, T]$ is 1 a.s., i.e.,
$$\frac{1}{T}\int_0^T |s(t)|^2\,dt = 1. \tag{189}$$
The CF of the signal $s$, composed of $n$ sub-carriers, is defined as

$$\mathrm{CF}_n(s) \triangleq \max_{0 \le t \le T} |s(t)|. \tag{190}$$
From [64, Section 4] and [81], it follows that the CF scales with high probability like $\sqrt{\ln n}$ for large $n$. In [42, Theorem 3 and Corollary 5], a concentration inequality was derived for the CF of OFDM signals. It states that for an arbitrary $c \ge 2.5$,
$$\mathbb{P}\left(\Bigl|\mathrm{CF}_n(s) - \sqrt{\ln n}\Bigr| < \frac{c\,\ln\ln n}{\sqrt{\ln n}}\right) = 1 - O\left(\frac{1}{(\ln n)^4}\right).$$

Remark 17: The analysis used to derive this rather strong concentration inequality (see [42, Appendix C]) requires some assumptions on the distribution of the $X_i$'s (see the two conditions in [42, Theorem 3] followed by [42, Corollary 5]). These requirements are not needed in the following analysis, and the derivations of the concentration inequalities that are introduced in this subsection are much simpler and provide some insight into the problem, though they lead to weaker concentration results than in [42, Theorem 3].

In the following, Azuma's inequality and a refined version of this inequality are considered under the assumption that $\{X_j\}_{j=0}^{n-1}$ are independent complex-valued random variables with magnitude 1, attaining the $M$ points of an $M$-ary PSK constellation with equal probability.

1) Establishing Concentration of the Crest-Factor via Azuma’s Inequality: In the following, Azuma’s inequality is used to derive a concentration result. Let us define

$$Y_i = \mathbb{E}\bigl[\mathrm{CF}_n(s) \,\big|\, X_0, \ldots, X_{i-1}\bigr], \qquad i = 0, \ldots, n. \tag{191}$$
Based on a standard construction of martingales, $\{Y_i, \mathcal{F}_i\}_{i=0}^n$ is a martingale, where $\mathcal{F}_i$ is the $\sigma$-algebra that is generated by the first $i$ symbols $(X_0, \ldots, X_{i-1})$ in (188). Hence, $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_n$ is a filtration. This martingale also has bounded jumps, and
$$|Y_i - Y_{i-1}| \le \frac{2}{\sqrt{n}}$$
for $i \in \{1, \ldots, n\}$, since revealing the $i$-th symbol $X_{i-1}$ affects the CF, as defined in (190), by at most $\frac{2}{\sqrt{n}}$ (see the first part of Appendix E). It therefore follows from Azuma's inequality that, for every $\alpha > 0$,
$$\mathbb{P}\bigl(|\mathrm{CF}_n(s) - \mathbb{E}[\mathrm{CF}_n(s)]| \ge \alpha\bigr) \le 2\exp\left(-\frac{\alpha^2}{8}\right) \tag{192}$$
which demonstrates concentration around the expected value.

2) Establishing Concentration of the Crest-Factor via the Refined Version of Azuma's Inequality in Proposition 1: In the following, we rely on Proposition 1 to derive an improved concentration result. For the martingale sequence $\{Y_i\}_{i=0}^n$ in (191), Appendix E gives that a.s.
$$|Y_i - Y_{i-1}| \le \frac{2}{\sqrt{n}}, \qquad \mathbb{E}\bigl[(Y_i - Y_{i-1})^2 \,\big|\, \mathcal{F}_{i-1}\bigr] \le \frac{2}{n} \tag{193}$$
for every $i \in \{1, \ldots, n\}$. Note that the conditioning on the $\sigma$-algebra $\mathcal{F}_{i-1}$ is equivalent to the conditioning on the symbols $X_0, \ldots, X_{i-2}$, and there is no conditioning for $i = 1$. Further, let $Z_i = \sqrt{n}\,Y_i$ for $0 \le i \le n$. Proposition 1 therefore implies that for an arbitrary $\alpha > 0$

$$\begin{aligned}
\mathbb{P}\bigl(|\mathrm{CF}_n(s) - \mathbb{E}[\mathrm{CF}_n(s)]| \ge \alpha\bigr)
&= \mathbb{P}\bigl(|Y_n - Y_0| \ge \alpha\bigr) \\
&= \mathbb{P}\bigl(|Z_n - Z_0| \ge \alpha\sqrt{n}\bigr) \\
&\le 2\exp\left(-\frac{\alpha^2}{4}\left(1 + O\left(\frac{1}{\sqrt{n}}\right)\right)\right) \tag{194}
\end{aligned}$$
(since $\delta = \frac{\alpha}{2}$ and $\gamma = \frac{1}{2}$ in the setting of Proposition 1). Note that the exponent in the last inequality is doubled as compared to the bound that was obtained in (192) via Azuma's inequality, and the term which scales like $O\bigl(\frac{1}{\sqrt{n}}\bigr)$ on the right-hand side of (194) is expressed explicitly for finite $n$ (see Appendix A).

3) A Concentration Inequality via Talagrand's Method: In his seminal paper [76], Talagrand introduced an approach for proving concentration inequalities in product spaces. It forms a powerful probabilistic tool for establishing concentration results for coordinate-wise Lipschitz functions of independent random variables (see, e.g., [17, Section 2.4.2], [53, Section 4] and [76]). This approach is used in the following to derive a concentration result for the crest factor around its median, and it also enables one to derive an upper bound on the distance between the median and the expected value. We provide in the following the definitions that will be required for introducing a special form of Talagrand's inequalities. Afterwards, this inequality will be applied to obtain a concentration result for the crest factor of OFDM signals.

Definition 3 (Hamming distance): Let $\mathbf{x}, \mathbf{y}$ be two $n$-length vectors. The Hamming distance between $\mathbf{x}$ and $\mathbf{y}$ is the number of coordinates where $\mathbf{x}$ and $\mathbf{y}$ disagree, i.e.,
$$d_{\mathrm{H}}(\mathbf{x}, \mathbf{y}) \triangleq \sum_{i=1}^n I_{\{x_i \ne y_i\}}$$
where $I$ stands for the indicator function. The following suggests a generalization and normalization of the previous distance metric.

Definition 4: Let $\mathbf{a} = (a_1, \ldots, a_n) \in \mathbb{R}_+^n$ (i.e., $\mathbf{a}$ is a non-negative vector) satisfy $\|\mathbf{a}\|^2 = \sum_{i=1}^n (a_i)^2 = 1$. Then, define
$$d_{\mathbf{a}}(\mathbf{x}, \mathbf{y}) \triangleq \sum_{i=1}^n a_i\, I_{\{x_i \ne y_i\}}.$$
Hence, $d_{\mathrm{H}}(\mathbf{x}, \mathbf{y}) = \sqrt{n}\,d_{\mathbf{a}}(\mathbf{x}, \mathbf{y})$ for $\mathbf{a} = \bigl(\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\bigr)$.

The following is a special form of Talagrand's inequalities ([53, Chapter 4], [76], [77]).

Theorem 16 (Talagrand's inequality): Let the random vector $\mathbf{X} = (X_1, \ldots, X_n)$ be a vector of independent random variables with $X_k$ taking values in a set $A_k$, and let $A \triangleq \prod_{k=1}^n A_k$. Let $f: A \to \mathbb{R}$ satisfy the condition that, for every $\mathbf{x} \in A$, there exists a non-negative, normalized $n$-length vector $\mathbf{a} = \mathbf{a}(\mathbf{x})$ such that
$$f(\mathbf{x}) \le f(\mathbf{y}) + \sigma d_{\mathbf{a}}(\mathbf{x}, \mathbf{y}), \qquad \forall\,\mathbf{y} \in A \tag{195}$$
for some fixed value $\sigma > 0$. Then, for every $\alpha \ge 0$,
$$\mathbb{P}\bigl(|f(\mathbf{X}) - m| \ge \alpha\bigr) \le 4\exp\left(-\frac{\alpha^2}{4\sigma^2}\right) \tag{196}$$
where $m$ is the median of $f(\mathbf{X})$ (i.e., $\mathbb{P}(f(\mathbf{X}) \le m) \ge \frac{1}{2}$ and $\mathbb{P}(f(\mathbf{X}) \ge m) \ge \frac{1}{2}$). The same conclusion in (196) holds if the condition in (195) is replaced by
$$f(\mathbf{y}) \le f(\mathbf{x}) + \sigma d_{\mathbf{a}}(\mathbf{x}, \mathbf{y}), \qquad \forall\,\mathbf{y} \in A. \tag{197}$$

At this stage, we are ready to apply Talagrand's inequality to prove a concentration inequality for the crest factor of OFDM signals. As before, let us assume that $X_0, Y_0, \ldots, X_{n-1}, Y_{n-1}$ are i.i.d. bounded complex RVs, and also assume for simplicity that $|X_i| = |Y_i| = 1$. In order to apply Talagrand's inequality to prove concentration, note that

$$\begin{aligned}
\max_{0 \le t \le T} |s(t; X_0, \ldots, X_{n-1})| - \max_{0 \le t \le T} |s(t; Y_0, \ldots, Y_{n-1})|
&\le \max_{0 \le t \le T} \bigl|s(t; X_0, \ldots, X_{n-1}) - s(t; Y_0, \ldots, Y_{n-1})\bigr| \\
&\le \max_{0 \le t \le T}\frac{1}{\sqrt{n}}\left|\sum_{i=0}^{n-1}(X_i - Y_i)\exp\left(\frac{j\,2\pi i t}{T}\right)\right| \\
&\le \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|X_i - Y_i| \\
&\le \frac{2}{\sqrt{n}}\sum_{i=0}^{n-1} I_{\{x_i \ne y_i\}} \\
&= 2\,d_{\mathbf{a}}(\mathbf{X}, \mathbf{Y})
\end{aligned}$$
where
$$\mathbf{a} \triangleq \left(\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\right) \tag{198}$$
is a non-negative unit-vector of length $n$ (note that $\mathbf{a}$ in this case is independent of $\mathbf{x}$). Hence, Talagrand's inequality in Theorem 16 implies that, for every $\alpha \ge 0$,
$$\mathbb{P}\bigl(|\mathrm{CF}_n(s) - m_n| \ge \alpha\bigr) \le 4\exp\left(-\frac{\alpha^2}{16}\right) \tag{199}$$
where $m_n$ is the median of the crest factor for OFDM signals that are composed of $n$ sub-carriers. This inequality demonstrates the concentration of this measure around its median. As a simple consequence of (199), one obtains the following result.

Corollary 6: The median and expected value of the crest factor differ by at most a constant, independently of the number of sub-carriers $n$.

Proof: By the concentration inequality in (199)

$$\begin{aligned}
\mathbb{E}[\mathrm{CF}_n(s)] - m_n
&\le \mathbb{E}\bigl|\mathrm{CF}_n(s) - m_n\bigr| \\
&= \int_0^{\infty}\mathbb{P}\bigl(|\mathrm{CF}_n(s) - m_n| \ge \alpha\bigr)\,d\alpha \\
&\le \int_0^{\infty} 4\exp\left(-\frac{\alpha^2}{16}\right)d\alpha \\
&= 8\sqrt{\pi}.
\end{aligned}$$

Remark 18: This result applies in general to an arbitrary function f satisfying the condition in (195), where Talagrand’s inequality in (196) implies that (see, e.g., [53, Lemma 4.6])

$$\mathbb{E}[f(\mathbf{X})] - m \le 4\sigma\sqrt{\pi}.$$

4) Establishing Concentration via McDiarmid’s Inequality : McDiarmid’s inequality (see Theorem 2) is applied in the following to prove a concentration inequality for the crest factor of OFDM signals. To this end, let us define

$$U \triangleq \max_{0 \le t \le T}\bigl|s(t; X_0, \ldots, X_{i-1}, X_i, \ldots, X_{n-1})\bigr|, \qquad V \triangleq \max_{0 \le t \le T}\bigl|s(t; X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1})\bigr|$$
where the two vectors $(X_0, \ldots, X_{i-1}, X_i, \ldots, X_{n-1})$ and $(X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1})$ may only differ in their $i$-th coordinate. This then implies that

$$\begin{aligned}
|U - V| &\le \max_{0 \le t \le T}\bigl|s(t; X_0, \ldots, X_{i-1}, X_i, \ldots, X_{n-1}) - s(t; X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1})\bigr| \\
&= \max_{0 \le t \le T}\frac{1}{\sqrt{n}}\left|\bigl(X_{i-1} - X'_{i-1}\bigr)\exp\left(\frac{j\,2\pi (i-1) t}{T}\right)\right| \\
&= \frac{|X_{i-1} - X'_{i-1}|}{\sqrt{n}} \le \frac{2}{\sqrt{n}}
\end{aligned}$$
where the last inequality holds since $|X_{i-1}| = |X'_{i-1}| = 1$. Hence, McDiarmid's inequality in Theorem 2 implies that, for every $\alpha \ge 0$,
$$\mathbb{P}\bigl(|\mathrm{CF}_n(s) - \mathbb{E}[\mathrm{CF}_n(s)]| \ge \alpha\bigr) \le 2\exp\left(-\frac{\alpha^2}{2}\right) \tag{200}$$
which demonstrates concentration of this measure around its expected value. By comparing (199) with (200), it follows that McDiarmid's inequality provides an improvement in the exponent. The improvement of McDiarmid's inequality is by a factor of 4 in the exponent as compared to Azuma's inequality, and by a factor of 2 as compared to the refined version of Azuma's inequality in Proposition 1.

To conclude, this subsection derives four concentration inequalities for the crest-factor (CF) of OFDM signals under the assumption that the symbols are independent. The first two concentration inequalities rely on Azuma's inequality and a refined version of it, and the last two concentration inequalities are based on Talagrand's and McDiarmid's inequalities. Although these concentration results are weaker than some existing results from the literature (see [42] and [81]), they establish concentration in a rather simple way and provide some insight into the problem. McDiarmid's inequality improves the exponent of Azuma's inequality by a factor of 4, and the exponent of the refined version of Azuma's inequality from Proposition 1 by a factor of 2. Note, however, that Proposition 1 may in general be tighter than McDiarmid's inequality (if $\gamma < \frac{1}{4}$ in the setting of Proposition 1). It also follows from Talagrand's method that the median and expected value of the CF differ by at most a constant, independently of the number of sub-carriers.
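The concentration established in this subsection is easy to observe empirically. The following minimal Monte-Carlo sketch (illustrative parameters; the time axis is discretized, so the computed maximum only approximates the continuous-time CF in (190), and the $O(1/\sqrt{n})$ term of (194) is ignored) simulates OFDM symbols with independent $M$-ary PSK inputs and compares the empirical deviation probability with the bounds (192), (194) and (200).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, M, oversampling = 64, 2000, 4, 8        # illustrative sizes

# Dense time grid for t/T in [0,1); per (188), s(t) = (1/sqrt(n)) * sum_i X_i exp(j*2*pi*i*t/T).
t = np.arange(oversampling * n) / (oversampling * n)
F = np.exp(2j * np.pi * np.outer(t, np.arange(n))) / np.sqrt(n)

# Independent M-ary PSK symbols with |X_i| = 1.
X = np.exp(2j * np.pi * rng.integers(0, M, size=(trials, n)) / M)
cf = np.abs(F @ X.T).max(axis=0)                    # approximate CF_n(s) per trial

print(f"mean CF = {cf.mean():.3f}   sqrt(ln n) = {np.sqrt(np.log(n)):.3f}")
alpha = 2.0
print(f"empirical P(|CF - mean| >= {alpha}) = {np.mean(np.abs(cf - cf.mean()) >= alpha):.4f}")
print(f"Azuma (192): {2*np.exp(-alpha**2/8):.3f}, refined (194): {2*np.exp(-alpha**2/4):.3f}, "
      f"McDiarmid (200): {2*np.exp(-alpha**2/2):.3f}")
```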

H. Random Coding Theorems via Martingale Inequalities

The following subsection establishes new error exponents and achievable rates of random coding, for channels with and without memory, under maximum-likelihood (ML) decoding. The analysis relies on some exponential inequalities for martingales with bounded jumps. The characteristics of these coding theorems are exemplified in special cases of interest that include non-linear channels. The material in this subsection is based on [82], [83] and [84] (and mainly on the latest improvements of these achievable rates in [84]).

Random coding theorems address the average error probability of an ensemble of codebooks as a function of the code rate $R$, the block length $N$, and the channel statistics. It is assumed that the codewords are chosen randomly, subject to some possible constraints, and the codebook is known to the encoder and decoder.

Nonlinear effects are typically encountered in wireless communication systems and optical fibers, and they degrade the quality of the information transmission. In satellite communication systems, the amplifiers located on board satellites typically operate at or near the saturation region in order to conserve energy. Saturation nonlinearities of amplifiers introduce nonlinear distortion in the transmitted signals. Similarly, power amplifiers in mobile terminals are designed to operate in a nonlinear region in order to obtain high power efficiency in mobile cellular communications. Gigabit optical fiber communication channels typically exhibit linear and nonlinear distortion as a result of non-ideal transmitter, fiber, receiver and optical amplifier components. Nonlinear communication channels can be represented by Volterra models [5, Chapter 14]. Significant degradation in performance may result in the mismatched regime. However, in the following, it is assumed that both the transmitter and the receiver know the exact probability law of the channel.

We start the presentation by writing explicitly the martingale inequalities that we rely on, which were derived earlier along the derivation of the concentration inequalities in this chapter.

1) Martingale inequalities:
• The first martingale inequality is a known result (see [17, Corollary 2.4.7] and [52]) that will be useful later in this paper.
Theorem 17: Let $\{X_k, \mathcal{F}_k\}_{k=0}^n$, for some $n \in \mathbb{N}$, be a discrete-parameter, real-valued martingale with bounded jumps. Let
$$\xi_k \triangleq X_k - X_{k-1}, \qquad \forall\,k \in \{1, \ldots, n\}$$
designate the jumps of the martingale. Assume that, for some constants $d, \sigma > 0$, the following two requirements

$$\xi_k \le d, \qquad \mathrm{Var}(\xi_k \,|\, \mathcal{F}_{k-1}) \le \sigma^2$$
hold almost surely (a.s.) for every $k \in \{1, \ldots, n\}$. Let $\gamma \triangleq \frac{\sigma^2}{d^2}$. Then, for every $t \ge 0$,
$$\mathbb{E}\left[\exp\left(t\sum_{k=1}^n \xi_k\right)\right] \le \left(\frac{e^{-\gamma t d} + \gamma e^{t d}}{1+\gamma}\right)^n. \tag{201}$$
The proof of this theorem relies on Lemma 2.

• Second inequality: The following theorem presents a new martingale inequality that will be useful later in this subsection (the methodology of its proof is similar to the analysis used to prove Theorem 6).
Theorem 18: Let $\{X_k, \mathcal{F}_k\}_{k=0}^n$, for some $n \in \mathbb{N}$, be a discrete-time, real-valued martingale with bounded jumps. Let
$$\xi_k \triangleq X_k - X_{k-1}, \qquad \forall\,k \in \{1, \ldots, n\},$$
and let $m \in \mathbb{N}$ be an even number, $d > 0$ be a positive number, and $\{\mu_l\}_{l=2}^m$ be a sequence of numbers such that
$$\xi_k \le d, \tag{202}$$
$$\mathbb{E}\bigl[(\xi_k)^l \,|\, \mathcal{F}_{k-1}\bigr] \le \mu_l, \qquad \forall\,l \in \{2, \ldots, m\} \tag{203}$$
hold a.s. for every $k \in \{1, \ldots, n\}$. Furthermore, let
$$\gamma_l \triangleq \frac{\mu_l}{d^l}, \qquad \forall\,l \in \{2, \ldots, m\}. \tag{204}$$

Then, for every $t \ge 0$,
$$\mathbb{E}\left[\exp\left(t\sum_{k=1}^n \xi_k\right)\right] \le \left(1 + \sum_{l=2}^{m-1}\frac{(\gamma_l - \gamma_m)(td)^l}{l!} + \gamma_m\bigl(e^{td} - 1 - td\bigr)\right)^n. \tag{205}$$

2) Achievable Rates under ML Decoding: The goal of this subsection is to derive achievable rates in the random coding setting under ML decoding. We first review briefly the analysis in [83] for the derivation of the upper bound on the ML decoding error probability. This review is necessary in order to make the beginning of the derivation of this bound more accurate, and to correct along the way some inaccuracies that appear in [83, Section II]. After the first stage of this analysis, we proceed by improving the resulting error exponents and their corresponding achievable rates via the application of the martingale inequalities in the previous subsection.

Consider an ensemble of block codes $\mathcal{C}$ of length $N$ and rate $R$. Let $C \in \mathcal{C}$ be a codebook in the ensemble. The number of codewords in $C$ is $M = \lceil \exp(NR) \rceil$. The codewords of a codebook $C$ are assumed to be independent, and the symbols in each codeword are assumed to be i.i.d. with an arbitrary probability distribution $P$. An ML decoding error occurs if, given the transmitted message $m$ and the received vector $\mathbf{y}$, there exists another message $m' \ne m$ such that
$$\|\mathbf{y} - D\mathbf{u}_{m'}\|_2 \le \|\mathbf{y} - D\mathbf{u}_m\|_2.$$
The union bound for an AWGN channel implies that

$$P_{e|m}(C) \le \sum_{m' \ne m} Q\left(\frac{\|D\mathbf{u}_m - D\mathbf{u}_{m'}\|_2}{2\sigma_{\nu}}\right)$$
where
$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}}\int_x^{\infty}\exp\left(-\frac{t^2}{2}\right)dt, \qquad \forall\,x \in \mathbb{R} \tag{206}$$
is the complementary Gaussian cumulative distribution function. By using the inequality $Q(x) \le \frac{1}{2}\exp\bigl(-\frac{x^2}{2}\bigr)$ for $x \ge 0$, this gives the loosened bound (by also ignoring the factor of one-half in the bound on $Q$)
$$P_{e|m}(C) \le \sum_{m' \ne m}\exp\left(-\frac{\|D\mathbf{u}_m - D\mathbf{u}_{m'}\|_2^2}{8\sigma_{\nu}^2}\right).$$
At this stage, let us introduce a new parameter $\rho \in [0,1]$, and write
$$P_{e|m}(C) \le \sum_{m' \ne m}\exp\left(-\frac{\rho\,\|D\mathbf{u}_m - D\mathbf{u}_{m'}\|_2^2}{8\sigma_{\nu}^2}\right).$$
Note that at this stage, the introduction of the additional parameter $\rho$ is useless, as its optimal value is $\rho_{\mathrm{opt}} = 1$. The average ML decoding error probability over the code ensemble therefore satisfies

$$\bar{P}_{e|m} \le \sum_{m' \ne m}\mathbb{E}\left[\exp\left(-\frac{\rho\,\|D\mathbf{u}_m - D\mathbf{u}_{m'}\|_2^2}{8\sigma_{\nu}^2}\right)\right]$$
and the average ML decoding error probability over the code ensemble and the transmitted message satisfies

$$\bar{P}_e \le (M-1)\,\mathbb{E}\left[\exp\left(-\frac{\rho\,\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2}{8\sigma_{\nu}^2}\right)\right] \tag{207}$$
where the expectation is taken over two randomly chosen codewords $\mathbf{u}$ and $\tilde{\mathbf{u}}$, where these codewords are independent and their symbols are i.i.d. with a probability distribution $P$.

Consider a filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_N$ where the sub $\sigma$-algebra $\mathcal{F}_i$ is given by
$$\mathcal{F}_i \triangleq \sigma(U_1, \tilde{U}_1, \ldots, U_i, \tilde{U}_i), \qquad \forall\,i \in \{1, \ldots, N\} \tag{208}$$
for two randomly selected codewords $\mathbf{u} = (u_1, \ldots, u_N)$ and $\tilde{\mathbf{u}} = (\tilde{u}_1, \ldots, \tilde{u}_N)$ from the codebook; $\mathcal{F}_i$ is the minimal $\sigma$-algebra that is generated by the first $i$ coordinates of these two codewords. In particular, let $\mathcal{F}_0 \triangleq \{\emptyset, \Omega\}$ be the trivial $\sigma$-algebra. Furthermore, define the discrete-time martingale $\{X_k, \mathcal{F}_k\}_{k=0}^N$ by
$$X_k = \mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2 \,\big|\, \mathcal{F}_k\bigr] \tag{209}$$
which designates the conditional expectation of the squared Euclidean distance between the distorted codewords $D\mathbf{u}$ and $D\tilde{\mathbf{u}}$, given the first $k$ coordinates of the two codewords $\mathbf{u}$ and $\tilde{\mathbf{u}}$. The first and last entries of this martingale sequence are, respectively, equal to
$$X_0 = \mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2\bigr], \qquad X_N = \|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2. \tag{210}$$
Furthermore, following earlier notation, let $\xi_k = X_k - X_{k-1}$ be the jumps of the martingale; then
$$\sum_{k=1}^N \xi_k = X_N - X_0 = \|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2 - \mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2\bigr]$$
and the substitution of the last equality into (207) gives that
$$\bar{P}_e \le \exp(NR)\,\exp\left(-\frac{\rho\,\mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2\bigr]}{8\sigma_{\nu}^2}\right)\mathbb{E}\left[\exp\left(-\frac{\rho}{8\sigma_{\nu}^2}\sum_{k=1}^N \xi_k\right)\right]. \tag{211}$$
Since the codewords are independent and their symbols are i.i.d., it follows that
$$\begin{aligned}
\mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2\bigr]
&= \sum_{k=1}^N \mathbb{E}\Bigl[\bigl([D\mathbf{u}]_k - [D\tilde{\mathbf{u}}]_k\bigr)^2\Bigr]
= \sum_{k=1}^N \mathrm{Var}\bigl([D\mathbf{u}]_k - [D\tilde{\mathbf{u}}]_k\bigr)
= 2\sum_{k=1}^N \mathrm{Var}\bigl([D\mathbf{u}]_k\bigr) \\
&= 2\left(\sum_{k=1}^{q-1}\mathrm{Var}\bigl([D\mathbf{u}]_k\bigr) + \sum_{k=q}^{N}\mathrm{Var}\bigl([D\mathbf{u}]_k\bigr)\right).
\end{aligned}$$
Due to the channel model (see Eq. (228)) and the assumption that the symbols $\{u_i\}$ are i.i.d., it follows that $\mathrm{Var}\bigl([D\mathbf{u}]_k\bigr)$ is fixed for $k = q, \ldots, N$. Let $D_v(P)$ designate this common value of the variance (i.e., $D_v(P) = \mathrm{Var}([D\mathbf{u}]_k)$ for $k \ge q$); then
$$\mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2\bigr] = 2\left(\sum_{k=1}^{q-1}\mathrm{Var}\bigl([D\mathbf{u}]_k\bigr) + (N-q+1)D_v(P)\right).$$
Let
$$C_{\rho}(P) \triangleq \exp\left\{-\frac{\rho}{4\sigma_{\nu}^2}\left(\sum_{k=1}^{q-1}\mathrm{Var}\bigl([D\mathbf{u}]_k\bigr) - (q-1)D_v(P)\right)\right\}$$
which is a bounded constant under the assumption that $\|\mathbf{u}\|_{\infty} \le K < +\infty$ holds a.s. for some $K > 0$, and it is independent of the block length $N$. This therefore implies that the ML decoding error probability satisfies
$$\bar{P}_e \le C_{\rho}(P)\,\exp\left(-N\left[\frac{\rho\,D_v(P)}{4\sigma_{\nu}^2} - R\right]\right)\mathbb{E}\left[\exp\left(\frac{\rho}{8\sigma_{\nu}^2}\sum_{k=1}^N Z_k\right)\right], \qquad \forall\,\rho \in [0,1] \tag{212}$$
where $Z_k \triangleq -\xi_k$, so $\{Z_k, \mathcal{F}_k\}$ is a martingale-difference sequence that corresponds to the jumps of the martingale $\{-X_k, \mathcal{F}_k\}$. From (209), it follows that the martingale-difference sequence $\{Z_k, \mathcal{F}_k\}$ is given by
$$Z_k = X_{k-1} - X_k = \mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2 \,\big|\, \mathcal{F}_{k-1}\bigr] - \mathbb{E}\bigl[\|D\mathbf{u} - D\tilde{\mathbf{u}}\|_2^2 \,\big|\, \mathcal{F}_k\bigr]. \tag{213}$$

For the derivation of improved achievable rates and error exponents (as compared to [83]), the two martingale inequalities presented earlier in this subsection are applied to obtain two possible exponential upper bounds (in terms of $N$) on the last term on the right-hand side of (212). Let us assume that the essential supremum of the channel input is finite a.s. (i.e., $\|\mathbf{u}\|_{\infty}$ is bounded a.s.). Based on the upper bound on the ML decoding error probability in (212), combined with the exponential martingale inequalities that are introduced in Theorems 17 and 18, one obtains the following bounds:

1) First Bounding Technique: From Theorem 17, if
$$Z_k \le d, \qquad \mathrm{Var}(Z_k \,|\, \mathcal{F}_{k-1}) \le \sigma^2$$
hold a.s. for every $k \ge 1$, and $\gamma_2 \triangleq \frac{\sigma^2}{d^2}$, then it follows from (212) that for every $\rho \in [0,1]$,
$$\bar{P}_e \le C_{\rho}(P)\,\exp\left(-N\left[\frac{\rho\,D_v(P)}{4\sigma_{\nu}^2} - R - \ln\left(\frac{\exp\bigl(-\frac{\gamma_2\rho d}{8\sigma_{\nu}^2}\bigr) + \gamma_2\exp\bigl(\frac{\rho d}{8\sigma_{\nu}^2}\bigr)}{1+\gamma_2}\right)\right]\right).$$
Therefore, the maximal achievable rate that follows from this bound is given by

$$R_1(\sigma_{\nu}^2) \triangleq \max_P \max_{\rho \in [0,1]}\left\{\frac{\rho\,D_v(P)}{4\sigma_{\nu}^2} - \ln\left(\frac{\exp\bigl(-\frac{\gamma_2\rho d}{8\sigma_{\nu}^2}\bigr) + \gamma_2\exp\bigl(\frac{\rho d}{8\sigma_{\nu}^2}\bigr)}{1+\gamma_2}\right)\right\} \tag{214}$$
where the double maximization is performed over the input distribution $P$ and the parameter $\rho \in [0,1]$. The inner maximization in (214) can be expressed in closed form, leading to the following simplified expression:

$$R_1(\sigma_{\nu}^2) = \max_P
\begin{cases}
D\left(\dfrac{\gamma_2}{1+\gamma_2} + \dfrac{2D_v(P)}{d(1+\gamma_2)} \,\middle\|\, \dfrac{\gamma_2}{1+\gamma_2}\right), & \text{if } D_v(P) < \dfrac{\gamma_2 d\left[\exp\left(\frac{d(1+\gamma_2)}{8\sigma_{\nu}^2}\right)-1\right]}{2\left[1 + \gamma_2\exp\left(\frac{d(1+\gamma_2)}{8\sigma_{\nu}^2}\right)\right]} \\[2ex]
\dfrac{D_v(P)}{4\sigma_{\nu}^2} - \ln\left(\dfrac{\exp\bigl(-\frac{\gamma_2 d}{8\sigma_{\nu}^2}\bigr) + \gamma_2\exp\bigl(\frac{d}{8\sigma_{\nu}^2}\bigr)}{1+\gamma_2}\right), & \text{otherwise}
\end{cases} \tag{215}$$
where
$$D(p\,\|\,q) \triangleq p\ln\left(\frac{p}{q}\right) + (1-p)\ln\left(\frac{1-p}{1-q}\right), \qquad \forall\,p, q \in (0,1) \tag{216}$$
denotes the Kullback-Leibler distance (a.k.a. divergence or relative entropy) between the two probability distributions $(p, 1-p)$ and $(q, 1-q)$.

2) Second Bounding Technique: Based on the combination of Theorem 18 and Eq. (212), we derive in the following a second achievable rate for random coding under ML decoding. Referring to the martingale-difference sequence $\{Z_k, \mathcal{F}_k\}_{k=1}^N$ in Eqs. (208) and (213), one obtains from Eq. (212) that if, for some even number $m \in \mathbb{N}$,
$$Z_k \le d, \qquad \mathbb{E}\bigl[(Z_k)^l \,|\, \mathcal{F}_{k-1}\bigr] \le \mu_l, \qquad \forall\,l \in \{2, \ldots, m\}$$
hold a.s. for some positive constant $d > 0$ and a sequence $\{\mu_l\}_{l=2}^m$, and
$$\gamma_l \triangleq \frac{\mu_l}{d^l}, \qquad \forall\,l \in \{2, \ldots, m\},$$
then the average error probability satisfies, for every $\rho \in [0,1]$,
$$\bar{P}_e \le C_{\rho}(P)\,\exp\left(-N\left[\frac{\rho\,D_v(P)}{4\sigma_{\nu}^2} - R - \ln\left(1 + \sum_{l=2}^{m-1}\frac{\gamma_l - \gamma_m}{l!}\left(\frac{\rho d}{8\sigma_{\nu}^2}\right)^l + \gamma_m\left(\exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2}\right)\right)\right]\right).$$
This gives the following achievable rate, for an arbitrary even number $m \in \mathbb{N}$:
$$R_2(\sigma_{\nu}^2) \triangleq \max_P \max_{\rho \in [0,1]}\left\{\frac{\rho\,D_v(P)}{4\sigma_{\nu}^2} - \ln\left(1 + \sum_{l=2}^{m-1}\frac{\gamma_l - \gamma_m}{l!}\left(\frac{\rho d}{8\sigma_{\nu}^2}\right)^l + \gamma_m\left(\exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2}\right)\right)\right\} \tag{217}$$
where, similarly to (214), the double maximization in (217) is performed over the input distribution $P$ and the parameter $\rho \in [0,1]$.
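Since the closed form in (215) is obtained from the inner maximization over $\rho$ in (214), it can be sanity-checked numerically. The following minimal Python sketch (illustrative parameter values that are not tied to any particular channel, and a brute-force grid maximization instead of an exact one) compares a direct maximization of (214) over $\rho$ with the two cases of (215).

```python
import numpy as np

def rate_214(Dv, d, gamma2, sigma_nu2, num=10_001):
    """Brute-force inner maximization over rho in (214)."""
    rho = np.linspace(0.0, 1.0, num)
    x = rho * d / (8 * sigma_nu2)
    val = (rho * Dv / (4 * sigma_nu2)
           - np.log((np.exp(-gamma2 * x) + gamma2 * np.exp(x)) / (1 + gamma2)))
    return val.max()

def rate_215(Dv, d, gamma2, sigma_nu2):
    """Closed-form inner maximization, cf. (215)-(216)."""
    K = np.exp(d * (1 + gamma2) / (8 * sigma_nu2))
    threshold = gamma2 * d * (K - 1) / (2 * (1 + gamma2 * K))
    q = gamma2 / (1 + gamma2)
    if Dv < threshold:
        p = q + 2 * Dv / (d * (1 + gamma2))
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))   # D(p||q)
    x = d / (8 * sigma_nu2)                                              # rho = 1
    return Dv / (4 * sigma_nu2) - np.log((np.exp(-gamma2 * x) + gamma2 * np.exp(x)) / (1 + gamma2))

for Dv, d, gamma2, sigma_nu2 in [(0.2, 2.0, 1.0, 1.0), (1.0, 2.0, 1.0, 1.0), (0.05, 1.0, 0.3, 0.5)]:
    print(f"{rate_214(Dv, d, gamma2, sigma_nu2):.6f}  vs  {rate_215(Dv, d, gamma2, sigma_nu2):.6f}")
```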

3) Achievable Rates for Random Coding: In the following, the achievable rates for random coding over various linear and non-linear channels (with and without memory) are exemplified. In order to assess the tightness of the bounds, we start with a simple example where the mutual information for the given input distribution is known, so that the gap to it can be estimated (since we use here the union bound, it would have been in place also to compare the achievable rate with the cutoff rate).

1) Binary-Input AWGN Channel: Consider the case of a binary-input AWGN channel where

Yk = Uk + νk

where $U_k = \pm A$, for some constant $A > 0$, is a binary input, and $\nu_k \sim \mathcal{N}(0, \sigma_{\nu}^2)$ is an additive Gaussian noise with zero mean and variance $\sigma_{\nu}^2$. Since the codewords $\mathbf{U} = (U_1, \ldots, U_N)$ and $\tilde{\mathbf{U}} = (\tilde{U}_1, \ldots, \tilde{U}_N)$ are independent and their symbols are i.i.d., let
$$\mathbb{P}(U_k = A) = \mathbb{P}(\tilde{U}_k = A) = \alpha, \qquad \mathbb{P}(U_k = -A) = \mathbb{P}(\tilde{U}_k = -A) = 1-\alpha$$
for some $\alpha \in [0,1]$. Since the channel is memoryless and all the symbols are i.i.d., one gets from (208) and (213) that
$$\begin{aligned}
Z_k &= \mathbb{E}\bigl[\|\mathbf{U} - \tilde{\mathbf{U}}\|_2^2 \,\big|\, \mathcal{F}_{k-1}\bigr] - \mathbb{E}\bigl[\|\mathbf{U} - \tilde{\mathbf{U}}\|_2^2 \,\big|\, \mathcal{F}_k\bigr] \\
&= \left[\sum_{j=1}^{k-1}(U_j - \tilde{U}_j)^2 + \sum_{j=k}^{N}\mathbb{E}\bigl[(U_j - \tilde{U}_j)^2\bigr]\right] - \left[\sum_{j=1}^{k}(U_j - \tilde{U}_j)^2 + \sum_{j=k+1}^{N}\mathbb{E}\bigl[(U_j - \tilde{U}_j)^2\bigr]\right] \\
&= \mathbb{E}\bigl[(U_k - \tilde{U}_k)^2\bigr] - (U_k - \tilde{U}_k)^2 \\
&= \alpha(1-\alpha)(-2A)^2 + \alpha(1-\alpha)(2A)^2 - (U_k - \tilde{U}_k)^2 \\
&= 8\alpha(1-\alpha)A^2 - (U_k - \tilde{U}_k)^2.
\end{aligned}$$
Hence, for every $k$,
$$Z_k \le 8\alpha(1-\alpha)A^2 \triangleq d. \tag{218}$$
Furthermore, for every $k, l \in \mathbb{N}$, due to the above properties,
$$\mathbb{E}\bigl[(Z_k)^l \,\big|\, \mathcal{F}_{k-1}\bigr] = \mathbb{E}\bigl[(Z_k)^l\bigr]
= \bigl[1 - 2\alpha(1-\alpha)\bigr]\bigl(8\alpha(1-\alpha)A^2\bigr)^l + 2\alpha(1-\alpha)\bigl(8\alpha(1-\alpha)A^2 - 4A^2\bigr)^l \triangleq \mu_l \tag{219}$$
and therefore, from (218) and (219), for every $l \in \mathbb{N}$,
$$\gamma_l \triangleq \frac{\mu_l}{d^l} = \bigl[1 - 2\alpha(1-\alpha)\bigr]\left[1 + (-1)^l\left(\frac{1 - 2\alpha(1-\alpha)}{2\alpha(1-\alpha)}\right)^{l-1}\right]. \tag{220}$$
Let us now rely on the two achievable rates for random coding in Eqs. (215) and (217), and apply them to the binary-input AWGN channel. Due to the channel symmetry, the considered input distribution is symmetric (i.e., $\alpha = \frac{1}{2}$ and $P = \bigl(\frac{1}{2}, \frac{1}{2}\bigr)$). In this case, we obtain from (218) and (220) that
$$D_v(P) = \mathrm{Var}(U_k) = A^2, \qquad d = 2A^2, \qquad \gamma_l = \frac{1 + (-1)^l}{2}, \quad \forall\,l \in \mathbb{N}. \tag{221}$$
Based on the first bounding technique that leads to the achievable rate in Eq. (215), since the first condition in this equation cannot hold for the set of parameters in (221), the achievable rate in this equation is equal to
$$R_1(\sigma_{\nu}^2) = \frac{A^2}{4\sigma_{\nu}^2} - \ln\cosh\left(\frac{A^2}{4\sigma_{\nu}^2}\right)$$

in units of nats per channel use. Let $\mathrm{SNR} \triangleq \frac{A^2}{\sigma_{\nu}^2}$ designate the signal-to-noise ratio; then the first achievable rate gets the form
$$R_1'(\mathrm{SNR}) = \frac{\mathrm{SNR}}{4} - \ln\cosh\left(\frac{\mathrm{SNR}}{4}\right). \tag{222}$$
It is observed here that the optimal value of $\rho$ in (215) is equal to 1 (i.e., $\rho^{\star} = 1$).

Let us compare it in the following with the achievable rate that follows from (217). Let $m \in \mathbb{N}$ be an even number. Since, from (221), $\gamma_l = 1$ for all even values of $l \in \mathbb{N}$ and $\gamma_l = 0$ for all odd values of $l \in \mathbb{N}$, then
$$1 + \sum_{l=2}^{m-1}\frac{\gamma_l - \gamma_m}{l!}\left(\frac{\rho d}{8\sigma_{\nu}^2}\right)^l + \gamma_m\left(\exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2}\right)
= 1 - \sum_{l=1}^{\frac{m}{2}-1}\frac{1}{(2l+1)!}\left(\frac{\rho d}{8\sigma_{\nu}^2}\right)^{2l+1} + \exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2}. \tag{223}$$
Since the sum $\sum_{l=1}^{\frac{m}{2}-1}\frac{1}{(2l+1)!}\bigl(\frac{\rho d}{8\sigma_{\nu}^2}\bigr)^{2l+1}$ is monotonically increasing with $m$ (where $m$ is even and $\rho \in [0,1]$), then from (217), the best achievable rate within this form is obtained in the limit where $m$ is even and $m \to \infty$. In this asymptotic case one gets
$$\begin{aligned}
\lim_{m \to \infty}&\left[1 + \sum_{l=2}^{m-1}\frac{\gamma_l - \gamma_m}{l!}\left(\frac{\rho d}{8\sigma_{\nu}^2}\right)^l + \gamma_m\left(\exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2}\right)\right] \\
&\overset{\text{(a)}}{=} 1 - \sum_{l=1}^{\infty}\frac{1}{(2l+1)!}\left(\frac{\rho d}{8\sigma_{\nu}^2}\right)^{2l+1} + \exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2} \\
&\overset{\text{(b)}}{=} 1 - \left[\sinh\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - \frac{\rho d}{8\sigma_{\nu}^2}\right] + \exp\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) - 1 - \frac{\rho d}{8\sigma_{\nu}^2} \\
&\overset{\text{(c)}}{=} \cosh\left(\frac{\rho d}{8\sigma_{\nu}^2}\right) \tag{224}
\end{aligned}$$
where equality (a) follows from (223), equality (b) holds since $\sinh(x) = \sum_{l=0}^{\infty}\frac{x^{2l+1}}{(2l+1)!}$ for $x \in \mathbb{R}$, and equality (c) holds since $\sinh(x) + \cosh(x) = \exp(x)$. Therefore, the achievable rate in (217) gives (from (221), $\frac{d}{8\sigma_{\nu}^2} = \frac{A^2}{4\sigma_{\nu}^2}$)
$$R_2(\sigma_{\nu}^2) = \max_{\rho \in [0,1]}\left\{\frac{\rho A^2}{4\sigma_{\nu}^2} - \ln\cosh\left(\frac{\rho A^2}{4\sigma_{\nu}^2}\right)\right\}.$$
Since the function $f(x) \triangleq x - \ln\cosh(x)$ for $x \in \mathbb{R}$ is monotonically increasing (note that $f'(x) = 1 - \tanh(x) \ge 0$), the optimal value of $\rho \in [0,1]$ is equal to 1, and therefore the best achievable rate that follows from the second bounding technique in Eq. (217) is equal to

$$R_2(\sigma_{\nu}^2) = \frac{A^2}{4\sigma_{\nu}^2} - \ln\cosh\left(\frac{A^2}{4\sigma_{\nu}^2}\right)$$
in units of nats per channel use, and it is obtained in the asymptotic case where we let the even number $m$ tend to infinity. Finally, setting $\mathrm{SNR} = \frac{A^2}{\sigma_{\nu}^2}$ gives the achievable rate in (222), so the first and second achievable rates for the binary-input AWGN channel coincide, i.e.,
$$R_1'(\mathrm{SNR}) = R_2'(\mathrm{SNR}) = \frac{\mathrm{SNR}}{4} - \ln\cosh\left(\frac{\mathrm{SNR}}{4}\right). \tag{225}$$
Note that this common rate tends to zero as we let the signal-to-noise ratio tend to zero, and it tends to $\ln 2$ nats per channel use (i.e., 1 bit per channel use) as we let the signal-to-noise ratio tend to infinity. In the considered setting of random coding, in order to exemplify the tightness of the achievable rate in (225), it is compared in the following with the symmetric i.i.d. mutual information of the binary-input AWGN

channel. The mutual information for this channel (in units of nats per channel use) is given by (see, e.g., [63, Example 4.38 on p. 194])
$$C(\mathrm{SNR}) = \ln 2 + (2\,\mathrm{SNR} - 1)\,Q\bigl(\sqrt{\mathrm{SNR}}\bigr) - \sqrt{\frac{2\,\mathrm{SNR}}{\pi}}\exp\left(-\frac{\mathrm{SNR}}{2}\right) + \sum_{i=1}^{\infty}\frac{(-1)^i}{i(i+1)}\exp\bigl(2i(i+1)\,\mathrm{SNR}\bigr)\,Q\bigl((1+2i)\sqrt{\mathrm{SNR}}\bigr) \tag{226}$$
where the $Q$-function that appears in the infinite series on the right-hand side of (226) is the complementary Gaussian cumulative distribution function in (206). Furthermore, this infinite series converges fast: the absolute value of its $n$-th remainder is bounded by the $(n+1)$-th term of the series, which scales like $\frac{1}{n^3}$ (due to a basic theorem on infinite series of the form $\sum_{n \in \mathbb{N}}(-1)^n a_n$ where $\{a_n\}$ is a positive and monotonically decreasing sequence; the theorem states that the $n$-th remainder of the series is upper bounded in absolute value by $a_{n+1}$). The comparison between the mutual information of the binary-input AWGN channel with a symmetric i.i.d. input distribution and the common achievable rate in (225) that follows from the martingale approach is shown in Figure 2.

Fig. 2. A comparison between the symmetric i.i.d. mutual information of the binary-input AWGN channel (solid line) and the common achievable rate in (225) (dashed line) that follows from the martingale approach in this subsection; the achievable rates (in nats per channel use) are plotted versus $\mathrm{SNR} = \frac{A^2}{\sigma_{\nu}^2}$.
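The comparison in Figure 2 can be reproduced numerically. The following minimal Python sketch (not part of the original text) evaluates the common achievable rate (225) and the mutual information series (226); the series is summed in the log domain to avoid overflow of $\exp(2i(i+1)\,\mathrm{SNR})$ before it is multiplied by the tiny $Q$-function value.

```python
import numpy as np
from scipy.stats import norm

def achievable_rate(snr):
    """Common achievable rate (225): SNR/4 - ln cosh(SNR/4), in nats per channel use."""
    return snr / 4.0 - np.log(np.cosh(snr / 4.0))

def mutual_information(snr, terms=50):
    """Symmetric i.i.d. mutual information of the binary-input AWGN channel, series (226)."""
    c = (np.log(2.0) + (2 * snr - 1) * norm.sf(np.sqrt(snr))
         - np.sqrt(2 * snr / np.pi) * np.exp(-snr / 2))
    for i in range(1, terms + 1):
        # exp(.) * Q(.) evaluated via log Q(.) = norm.logsf(.) to avoid overflow
        log_term = 2 * i * (i + 1) * snr + norm.logsf((1 + 2 * i) * np.sqrt(snr))
        c += (-1) ** i / (i * (i + 1)) * np.exp(log_term)
    return c

for snr in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(f"SNR={snr:4.1f}:  R={achievable_rate(snr):.4f}  C={mutual_information(snr):.4f}  nats")
```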

From the discussion in this subsection, the first and second bounding techniques in Section VII-H2 lead to the same achievable rate (see (225)) in the setup of random coding and ML decoding, where we assume a symmetric input distribution (i.e., $P(\pm A) = \frac{1}{2}$). But this is due to the fact that, from (221), the sequence $\{\gamma_l\}_{l \ge 2}$ is equal to zero for odd indices of $l$ and is equal to 1 for even values of $l$ (see the derivation of (223) and (224)). Note, however, that the second bounding technique may provide tighter bounds than the first one (which follows from Bennett's inequality) due to the knowledge of $\{\gamma_l\}$ for $l > 2$.

2) Nonlinear Channels with Memory - Third-Order Volterra Channels: The channel model is first presented in the following (see Figure 3). We refer in the following to a discrete-time channel model of nonlinear Volterra channels where the input-output channel model is given by

$$y_i = [D\mathbf{u}]_i + \nu_i \tag{227}$$
where $i$ is the time index. Volterra's operator $D$ of order $L$ and memory $q$ is given by
$$[D\mathbf{u}]_i = h_0 + \sum_{j=1}^{L}\sum_{i_1=0}^{q}\cdots\sum_{i_j=0}^{q} h_j(i_1, \ldots, i_j)\, u_{i-i_1}\cdots u_{i-i_j} \tag{228}$$
and $\boldsymbol{\nu}$ is an additive Gaussian noise vector with i.i.d. entries $\nu_i \sim \mathcal{N}(0, \sigma_{\nu}^2)$.
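The operator (228) is straightforward to evaluate numerically. The following minimal Python sketch (not part of the original text) implements a third-order ($L = 3$) Volterra operator with memory $q$; the kernel arrays and the toy kernel values used in the check are illustrative placeholders, not the values of Table I, and past input samples $u_{i-i_1}$ with $i - i_1 < 0$ are taken to be zero.

```python
import numpy as np

def volterra_output(u, h1, h2, h3, h0=0.0):
    """Distorted signal [Du]_i per (228) for a 3rd-order Volterra operator with memory q.
    h1[i1], h2[i1, i2], h3[i1, i2, i3] are kernel arrays with q+1 entries per dimension."""
    q = len(h1) - 1
    n = len(u)
    up = np.concatenate([np.zeros(q), u])      # zero-pad so that u_{i-i1} = 0 for i - i1 < 0
    y = np.full(n, h0, dtype=float)
    for i in range(n):
        w = up[i:i + q + 1][::-1]              # w[i1] = u_{i - i1}, i1 = 0..q
        y[i] += h1 @ w                         # 1st-order (linear) term
        y[i] += w @ h2 @ w                     # 2nd-order term
        y[i] += np.einsum('a,b,c,abc->', w, w, w, h3)   # 3rd-order term
    return y

# Toy check with illustrative kernel values: with zero 2nd- and 3rd-order kernels,
# D reduces to a linear FIR filter acting on a binary +/-1 input.
q = 2
h1 = np.array([1.0, 0.4, -0.2])
u = np.sign(np.random.default_rng(1).standard_normal(16))
print(volterra_output(u, h1, np.zeros((q + 1, q + 1)), np.zeros((q + 1, q + 1, q + 1)))[:5])
```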


Fig. 3. The discrete-time Volterra non-linear channel model in Eqs. (227) and (228), where the channel input and output are $\{U_i\}$ and $\{Y_i\}$, respectively, and the additive noise samples $\{\nu_i\}$, which are added to the distorted input, are i.i.d. with zero mean and variance $\sigma_{\nu}^2$.

TABLE I
KERNELS OF THE 3RD-ORDER VOLTERRA SYSTEM $D_1$ WITH MEMORY 2

kernel:  $h_1(0)$   $h_1(1)$   $h_1(2)$   $h_2(0,0)$   $h_2(1,1)$   $h_2(0,1)$
value:   1.0        0.5        -0.8       1.0          -0.3         0.6

kernel:  $h_3(0,0,0)$   $h_3(1,1,1)$   $h_3(0,0,1)$   $h_3(0,1,1)$   $h_3(0,1,2)$
value:   1.0            -0.5           1.2            0.8            0.6

Under the same setup of the previous subsection regarding the channel input characteristics, we consider next the transmission of information over the Volterra system $D_1$ of order $L = 3$ and memory $q = 2$, whose kernels are depicted in Table I. Such system models are used in the baseband representation of nonlinear narrow-band communication channels.

Due to the complexity of the channel model, the calculation of the achievable rates provided earlier in this subsection requires the numerical calculation of the parameters $d$ and $\sigma^2$, and thus of $\gamma_2$, for the martingale $\{Z_i, \mathcal{F}_i\}_{i=0}^N$. In order to achieve this goal, we have to calculate $|Z_i - Z_{i-1}|$ and $\mathrm{Var}(Z_i \,|\, \mathcal{F}_{i-1})$ for all possible combinations of the input samples which contribute to the aforementioned expressions. Thus, the complexity of the analytic calculation of $d$ and $\gamma_l$ increases as the system's memory $q$ increases. Numerical results are provided in Figure 4 for the case where $\sigma_{\nu}^2 = 1$. The new achievable rates $R_1(D_1, A, \sigma_{\nu}^2)$ and $R_2^{(2)}(D_1, A, \sigma_{\nu}^2)$, which depend on the channel input parameter $A$, are compared to the achievable rate provided in [83, Fig. 2] and are shown to be larger than the latter.

Fig. 4. Comparison of the achievable rates in this subsection, $R_1(D_1, A, \sigma_{\nu}^2)$ and $R_2^{(2)}(D_1, A, \sigma_{\nu}^2)$ (where $m = 2$), with the bound $R_p(D_1, A, \sigma_{\nu}^2)$ of [83, Fig. 2] for the nonlinear channel with kernels depicted in Table I and noise variance $\sigma_{\nu}^2 = 1$; the rates (in nats per channel use) are plotted versus the input parameter $A$.

To conclude, improvements of the achievable rates in the low SNR regime are expected to be obtained via existing improvements to Bennett's inequality (see [22] and [23]), combined with a possible tightening of the union bound under ML decoding (see, e.g., [65]).

VIII. SUMMARY

This chapter derives some classical concentration inequalities for discrete-parameter martingales with uniformly bounded jumps, and it considers some of their applications in information theory and related topics. The first part is focused on the derivation of these refined inequalities, followed by a discussion on their relations to some classical results in probability theory. Along this discussion, these inequalities are linked to the method of types, the martingale central limit theorem, the law of the iterated logarithm, the moderate deviations principle, and some reported concentration inequalities from the literature. The second part of this work exemplifies these martingale inequalities in the context of hypothesis testing and information theory, communication, and coding theory. The interconnections between the concentration inequalities that are analyzed in the first part of this work (including a geometric interpretation of some of these inequalities) are studied, and the conclusions of this study inform the discussion of the information-theoretic aspects related to these concentration inequalities in the second part of this chapter.

APPENDIX A
PROOF OF PROPOSITION 1

We prove in the following that Theorem 5 implies (75). Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a discrete-parameter martingale that satisfies the conditions in Theorem 5. From (33),
$$\mathbb{P}\bigl(|X_n - X_0| \ge \alpha\sqrt{n}\bigr) \le 2\exp\left(-n\,D\left(\frac{\delta' + \gamma}{1+\gamma}\,\Big\|\,\frac{\gamma}{1+\gamma}\right)\right) \tag{229}$$
where, from (52),
$$\delta' \triangleq \frac{\alpha/\sqrt{n}}{d} = \frac{\delta}{\sqrt{n}}. \tag{230}$$
From the right-hand side of (229),
$$D\left(\frac{\delta' + \gamma}{1+\gamma}\,\Big\|\,\frac{\gamma}{1+\gamma}\right) = \frac{\gamma}{1+\gamma}\left[\left(1 + \frac{\delta}{\gamma\sqrt{n}}\right)\ln\left(1 + \frac{\delta}{\gamma\sqrt{n}}\right) + \frac{1}{\gamma}\left(1 - \frac{\delta}{\sqrt{n}}\right)\ln\left(1 - \frac{\delta}{\sqrt{n}}\right)\right]. \tag{231}$$
From the equality
$$(1+u)\ln(1+u) = u + \sum_{k=2}^{\infty}\frac{(-u)^k}{k(k-1)}, \qquad -1 < u \le 1,$$
it follows from (231) that, for every $n > \frac{\delta^2}{\gamma^2}$,
$$n\,D\left(\frac{\delta' + \gamma}{1+\gamma}\,\Big\|\,\frac{\gamma}{1+\gamma}\right) = \frac{\delta^2}{2\gamma} - \frac{\delta^3(1-\gamma)}{6\gamma^2}\cdot\frac{1}{\sqrt{n}} + \ldots = \frac{\delta^2}{2\gamma} + O\left(\frac{1}{\sqrt{n}}\right).$$
Substituting this into the exponent on the right-hand side of (229) gives (75).
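The convergence asserted in the last display is easy to verify numerically. The following minimal Python sketch (illustrative values of $\delta$ and $\gamma$, not taken from the original text) evaluates $n\,D\bigl(\frac{\delta/\sqrt{n}+\gamma}{1+\gamma}\,\|\,\frac{\gamma}{1+\gamma}\bigr)$ for increasing $n$ and shows that it approaches $\frac{\delta^2}{2\gamma}$.

```python
import numpy as np

def D(p, q):
    """Binary KL divergence (in nats) between (p, 1-p) and (q, 1-q), cf. (216)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

delta, gamma = 0.5, 0.25          # illustrative values (requires n > delta^2 / gamma^2)
for n in [10, 100, 1000, 10_000, 100_000]:
    dp = delta / np.sqrt(n)       # delta' of (230)
    exponent = n * D((dp + gamma) / (1 + gamma), gamma / (1 + gamma))
    print(f"n={n:6d}:  n*D = {exponent:.5f}")
print(f"limit delta^2/(2*gamma) = {delta**2 / (2*gamma):.5f}")
```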

APPENDIX B
ANALYSIS RELATED TO THE MODERATE DEVIATIONS PRINCIPLE IN SECTION VI-C

It is demonstrated in the following that, in contrast to Azuma's inequality, Theorem 5 provides an upper bound on
$$\mathbb{P}\left(\sum_{i=1}^n X_i \ge \alpha n^{\eta}\right), \qquad \forall\,\alpha \ge 0,$$
which coincides with the exact asymptotic limit in (109). It is proved under the further assumption that there exists some constant $d > 0$ such that $|X_k| \le d$ a.s. for every $k \in \mathbb{N}$. Let us define the martingale sequence $\{S_k, \mathcal{F}_k\}_{k=0}^n$ where
$$S_k \triangleq \sum_{i=1}^k X_i, \qquad \mathcal{F}_k \triangleq \sigma(X_1, \ldots, X_k)$$
for every $k \in \{1, \ldots, n\}$, with $S_0 = 0$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

1) Analysis related to Azuma's inequality: The martingale sequence $\{S_k, \mathcal{F}_k\}_{k=0}^{n}$ has uniformly bounded jumps, $|S_k - S_{k-1}| = |X_k| \le d$ a.s. for every $k \in \{1, \ldots, n\}$. Hence it follows from Azuma's inequality that, for every $\alpha \ge 0$,
$$ \mathbb{P}\bigl( |S_n| \ge \alpha n^{\eta} \bigr) \le 2 \exp\left( - \frac{\alpha^2 n^{2\eta - 1}}{2 d^2} \right) $$
and therefore
$$ \lim_{n \to \infty} n^{1 - 2\eta} \ln \mathbb{P}\bigl( |S_n| \ge \alpha n^{\eta} \bigr) \le - \frac{\alpha^2}{2 d^2}. \qquad (232) $$
This differs from the limit in (109), where $\sigma^2$ is replaced by $d^2$, so Azuma's inequality does not provide the asymptotic limit in (109) (unless $\sigma^2 = d^2$, i.e., $|X_k| = d$ a.s. for every $k$).

2) Analysis related to Theorem 5: The analysis here is a slight modification of the analysis in Appendix A, with the required adaptation of the calculations for $\eta \in (\frac{1}{2}, 1)$. It follows from Theorem 5 that, for every $\alpha \ge 0$,
$$ \mathbb{P}\bigl( |S_n| \ge \alpha n^{\eta} \bigr) \le 2 \exp\left( -n \, D\!\left( \frac{\delta_0 + \gamma}{1+\gamma} \,\Big\|\, \frac{\gamma}{1+\gamma} \right) \right) $$
where $\gamma$ is introduced in (52), and $\delta_0$ in (230) is replaced with
$$ \delta_0 \triangleq \frac{\alpha}{d \, n^{1-\eta}} = \delta \, n^{-(1-\eta)} \qquad (233) $$
due to the definition of $\delta$ in (52). Following the same analysis as in Appendix A, it follows that for every $n \in \mathbb{N}$
$$ \mathbb{P}\bigl( |S_n| \ge \alpha n^{\eta} \bigr) \le 2 \exp\left( - \frac{\delta^2 n^{2\eta - 1}}{2\gamma} \left[ 1 - \frac{\alpha (1-\gamma)}{3 \gamma d} \cdot n^{-(1-\eta)} + \ldots \right] \right) $$
and therefore (since, from (52), $\frac{\delta^2}{\gamma} = \frac{\alpha^2}{\sigma^2}$)
$$ \lim_{n \to \infty} n^{1 - 2\eta} \ln \mathbb{P}\bigl( |S_n| \ge \alpha n^{\eta} \bigr) \le - \frac{\alpha^2}{2 \sigma^2}. \qquad (234) $$
Hence, this upper bound coincides with the exact asymptotic result in (109).
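To make the gap between the exponents in (232) and (234) concrete, the following sketch (with a hypothetical bounded increment distribution, chosen only for illustration) evaluates both exponents in a case where the variance is strictly smaller than the squared a.s. bound:

```python
import numpy as np

# Hypothetical increments: X = +1 or -1 with probability 0.1 each, and 0 otherwise.
p = 0.1
d = 1.0                      # a.s. bound on |X_k|
sigma2 = 2 * p * d**2        # Var(X_k) = 0.2 < d^2

alpha = 1.0
azuma_exponent = alpha**2 / (2 * d**2)        # exponent in (232), from Azuma's inequality
theorem5_exponent = alpha**2 / (2 * sigma2)   # exponent in (234), matching the limit in (109)
print(azuma_exponent, theorem5_exponent)      # 0.5 versus 2.5
```

The variance-based exponent is larger by the factor $d^2/\sigma^2$, which is precisely the gap discussed above.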

APPENDIX C
PROOF OF PROPOSITION 2

The proof of (164) is based on calculus, and it is similar to the proof of the limit in (163) that relates the divergence and Fisher information. For the proof of (166), note that
$$ C(P_\theta, P_{\theta'}) \ge E_L(P_\theta, P_{\theta'}) \ge \min_{i=1,2} \left\{ \frac{\delta_i^2}{2\gamma_i} - \frac{\delta_i^3}{6 \gamma_i^2 (1+\gamma_i)} \right\}. \qquad (235) $$
The left-hand side of (235) holds since $E_L$ is a lower bound on the error exponent, and the exact value of this error exponent is the Chernoff information. The right-hand side of (235) follows from Lemma 7 (see (161)) and the definition of $E_L$ in (165). By definition, $\gamma_i \triangleq \frac{\sigma_i^2}{d_i^2}$ and $\delta_i \triangleq \frac{\varepsilon_i}{d_i}$ where, based on (151),

$$ \varepsilon_1 \triangleq D(P_\theta \,\|\, P_{\theta'}), \qquad \varepsilon_2 \triangleq D(P_{\theta'} \,\|\, P_\theta). \qquad (236) $$
The term on the right-hand side of (235) therefore satisfies
$$ \frac{\delta_i^2}{2\gamma_i} - \frac{\delta_i^3}{6 \gamma_i^2 (1+\gamma_i)} = \frac{\varepsilon_i^2}{2 \sigma_i^2} - \frac{\varepsilon_i^3 d_i^3}{6 \sigma_i^2 (\sigma_i^2 + d_i^2)} \ge \frac{\varepsilon_i^2}{2 \sigma_i^2} \left( 1 - \frac{\varepsilon_i d_i}{3} \right) $$
so it follows from (235) and the last inequality that
$$ C(P_\theta, P_{\theta'}) \ge E_L(P_\theta, P_{\theta'}) \ge \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 \sigma_i^2} \left( 1 - \frac{\varepsilon_i d_i}{3} \right) \right\}. \qquad (237) $$

Based on the continuity assumption for the indexed family $\{P_\theta\}_{\theta \in \Theta}$, it follows from (236) that
$$ \lim_{\theta' \to \theta} \varepsilon_i = 0, \qquad \forall \, i \in \{1, 2\} $$
and also, from (132) and (142) with $P_1$ and $P_2$ replaced by $P_\theta$ and $P_{\theta'}$ respectively,
$$ \lim_{\theta' \to \theta} d_i = 0, \qquad \forall \, i \in \{1, 2\}. $$
It therefore follows from (164) and (237) that
$$ \frac{J(\theta)}{8} \ge \lim_{\theta' \to \theta} \frac{E_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} \ge \lim_{\theta' \to \theta} \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 \sigma_i^2 (\theta - \theta')^2} \right\}. \qquad (238) $$
The idea is to show that the limit on the right-hand side of this inequality is $\frac{J(\theta)}{8}$ (the same as the left-hand side), and hence the limit of the middle term is also $\frac{J(\theta)}{8}$. Indeed,
$$
\begin{aligned}
\lim_{\theta' \to \theta} \frac{\varepsilon_1^2}{2 \sigma_1^2 (\theta - \theta')^2}
&\overset{\text{(a)}}{=} \lim_{\theta' \to \theta} \frac{D(P_\theta \,\|\, P_{\theta'})^2}{2 \sigma_1^2 (\theta - \theta')^2} \\
&\overset{\text{(b)}}{=} \frac{J(\theta)}{4} \lim_{\theta' \to \theta} \frac{D(P_\theta \,\|\, P_{\theta'})}{\sigma_1^2} \\
&\overset{\text{(c)}}{=} \frac{J(\theta)}{4} \lim_{\theta' \to \theta} \frac{D(P_\theta \,\|\, P_{\theta'})}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \,\|\, P_{\theta'}) \right)^2} \\
&\overset{\text{(d)}}{=} \frac{J(\theta)}{4} \lim_{\theta' \to \theta} \frac{D(P_\theta \,\|\, P_{\theta'})}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} \right)^2 - D(P_\theta \,\|\, P_{\theta'})^2} \\
&\overset{\text{(e)}}{=} \frac{J(\theta)^2}{8} \lim_{\theta' \to \theta} \frac{(\theta - \theta')^2}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} \right)^2 - D(P_\theta \,\|\, P_{\theta'})^2} \\
&\overset{\text{(f)}}{=} \frac{J(\theta)^2}{8} \lim_{\theta' \to \theta} \frac{(\theta - \theta')^2}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} \right)^2} \\
&\overset{\text{(g)}}{=} \frac{J(\theta)}{8}
\end{aligned}
\qquad (239)
$$
where equality (a) follows from (236), equalities (b), (e) and (f) follow from (163), equality (c) follows from (133) with $P_1 = P_\theta$ and $P_2 = P_{\theta'}$, equality (d) follows from the definition of the divergence, and equality (g) follows by calculus (the required limit is calculated by using L'Hôpital's rule twice) and from the definition of the Fisher information in (162). Similarly,
$$ \lim_{\theta' \to \theta} \frac{\varepsilon_2^2}{2 \sigma_2^2 (\theta - \theta')^2} = \frac{J(\theta)}{8} $$
so
$$ \lim_{\theta' \to \theta} \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 \sigma_i^2 (\theta - \theta')^2} \right\} = \frac{J(\theta)}{8}. $$
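The value of the limit in (239) can also be checked numerically. The sketch below assumes a Bernoulli($\theta$) family (a hypothetical choice, not the ternary family considered later in this appendix), for which $J(\theta) = \frac{1}{\theta(1-\theta)}$, and evaluates $\frac{\varepsilon_1^2}{2\sigma_1^2(\theta-\theta')^2}$ as $\theta' \to \theta$:

```python
import numpy as np

def kl(p, q):
    """Binary KL divergence D(p||q) in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

theta = 0.3
J = 1.0 / (theta * (1 - theta))      # Fisher information of the Bernoulli family
for h in [1e-2, 1e-3, 1e-4]:
    tp = theta + h                   # theta'
    eps1 = kl(theta, tp)             # epsilon_1 = D(P_theta || P_theta')
    llr = np.array([np.log(theta / tp), np.log((1 - theta) / (1 - tp))])
    pmf = np.array([theta, 1 - theta])
    sigma1_sq = np.sum(pmf * (llr - eps1)**2)    # variance of the log-likelihood ratio, as in (133)
    print(h, eps1**2 / (2 * sigma1_sq * h**2), J / 8)
```

The printed ratio approaches $J(\theta)/8$, in agreement with (239).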

Hence, it follows from (238) that $\lim_{\theta' \to \theta} \frac{E_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} = \frac{J(\theta)}{8}$. This completes the proof of (166).

We now prove equation (168). From (132), (142), (151) and (167),
$$ \widetilde{E}_L(P_\theta, P_{\theta'}) = \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 d_i^2} \right\} $$

with $\varepsilon_1$ and $\varepsilon_2$ in (236). Hence,
$$ \lim_{\theta' \to \theta} \frac{\widetilde{E}_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} \le \lim_{\theta' \to \theta} \frac{\varepsilon_1^2}{2 d_1^2 (\theta - \theta')^2} $$
and from (239) and the last inequality, it follows that

$$
\begin{aligned}
\lim_{\theta' \to \theta} \frac{\widetilde{E}_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2}
&\le \frac{J(\theta)}{8} \lim_{\theta' \to \theta} \frac{\sigma_1^2}{d_1^2} \\
&= \frac{J(\theta)}{8} \lim_{\theta' \to \theta} \frac{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \,\|\, P_{\theta'}) \right)^2}{\left( \max_{x \in \mathcal{X}} \left| \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \,\|\, P_{\theta'}) \right| \right)^2}. \qquad (240)
\end{aligned}
$$

It is clear that the second term on the right-hand side of (240) is bounded between zero and one (if the limit exists). This limit can be made arbitrarily small, i.e., there exists an indexed family of probability mass functions $\{P_\theta\}_{\theta \in \Theta}$ for which the second term on the right-hand side of (240) can be made arbitrarily close to zero. For a concrete example, let $\alpha \in (0,1)$ be fixed, and let $\theta \in \mathbb{R}^+$ be a parameter that defines the following indexed family of probability mass functions over the ternary alphabet $\mathcal{X} = \{0, 1, 2\}$:
$$ P_\theta(0) = \frac{\theta (1-\alpha)}{1+\theta}, \qquad P_\theta(1) = \alpha, \qquad P_\theta(2) = \frac{1-\alpha}{1+\theta}. $$
Then, it follows by calculus that for this indexed family

$$ \lim_{\theta' \to \theta} \frac{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \,\|\, P_{\theta'}) \right)^2}{\left( \max_{x \in \mathcal{X}} \left| \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \,\|\, P_{\theta'}) \right| \right)^2} = (1-\alpha)\, \theta $$
so, for any $\theta \in \mathbb{R}^+$, the above limit can be made arbitrarily close to zero by choosing $\alpha$ close enough to 1. This completes the proof of (168), and also the proof of Proposition 2.
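As a quick numerical check of this limit, the following sketch (with the hypothetical choices $\alpha = 0.9$ and $\theta = 0.5$) evaluates the ratio above for the ternary family as $\theta' \to \theta$ and compares it with $(1-\alpha)\theta$:

```python
import numpy as np

alpha, theta = 0.9, 0.5     # hypothetical parameter choices

def pmf(t):
    """Ternary pmf P_t over {0, 1, 2} from the example above."""
    return np.array([t * (1 - alpha) / (1 + t), alpha, (1 - alpha) / (1 + t)])

p = pmf(theta)
for h in [1e-2, 1e-3, 1e-4]:
    q = pmf(theta + h)                   # P_{theta'}
    llr = np.log(p / q)                  # ln(P_theta(x) / P_theta'(x))
    D = np.sum(p * llr)                  # D(P_theta || P_theta')
    num = np.sum(p * (llr - D)**2)       # numerator of the ratio
    den = np.max(np.abs(llr - D))**2     # denominator of the ratio
    print(h, num / den, (1 - alpha) * theta)
```

For these values the ratio tends to $(1-\alpha)\theta = 0.05$, and it can be pushed arbitrarily close to zero by letting $\alpha \to 1$.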

APPENDIX D
PROOF OF LEMMA 8

In order to prove Lemma 8, one needs to show that if $\rho'(1) < \infty$ then
$$ \lim_{C \to 1} \; \sum_{i=1}^{\infty} (i+1)^2 \, \Gamma_i \left[ h_2\!\left( \frac{1 - C^{i/2}}{2} \right) \right]^2 = 0 \qquad (241) $$
which then yields from (183) that $B \to \infty$ in the limit where $C \to 1$.

By the assumption in Lemma 8 that $\rho'(1) < \infty$, we have $\sum_{i=1}^{\infty} i \rho_i < \infty$, and therefore it follows from the Cauchy-Schwarz inequality that
$$ \sum_{i=1}^{\infty} \frac{\rho_i}{i} \ge \frac{1}{\sum_{i=1}^{\infty} i \rho_i} > 0. $$
Hence, the average degree of the parity-check nodes is finite:

$$ d_c^{\mathrm{avg}} = \frac{1}{\sum_{i=1}^{\infty} \frac{\rho_i}{i}} < \infty. $$

The infinite sum $\sum_{i=1}^{\infty} (i+1)^2 \Gamma_i$ converges under the above assumption since
$$ \sum_{i=1}^{\infty} (i+1)^2 \Gamma_i = \sum_{i=1}^{\infty} i^2 \Gamma_i + 2 \sum_{i=1}^{\infty} i \Gamma_i + \sum_{i=1}^{\infty} \Gamma_i = d_c^{\mathrm{avg}} \left( \sum_{i=1}^{\infty} i \rho_i + 2 \right) + 1 < \infty $$
where the last equality holds since
$$ \Gamma_i = \frac{\rho_i}{i \int_0^1 \rho(x) \, \mathrm{d}x} = \frac{d_c^{\mathrm{avg}} \, \rho_i}{i}, \qquad \forall \, i \in \mathbb{N}. $$
The infinite series in (241) therefore converges uniformly for $C \in [0, 1]$, hence the order of the limit and the infinite sum can be exchanged. Every term of the infinite series in (241) converges to zero in the limit where $C \to 1$, hence the limit in (241) is zero. This completes the proof of Lemma 8.
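The degree-distribution identity $\sum_{i}(i+1)^2 \Gamma_i = d_c^{\mathrm{avg}} \bigl( \sum_i i \rho_i + 2 \bigr) + 1$ used above can be checked numerically for any finitely supported $\{\rho_i\}$; the sketch below does so for one arbitrary (hypothetical) choice:

```python
import numpy as np

# Hypothetical edge-perspective check-degree distribution rho_i over degrees 3..6.
degrees = np.array([3, 4, 5, 6])
rho = np.array([0.2, 0.3, 0.3, 0.2])            # sums to 1
d_avg = 1.0 / np.sum(rho / degrees)             # average parity-check degree d_c^avg
Gamma = d_avg * rho / degrees                   # Gamma_i = d_c^avg * rho_i / i

lhs = np.sum((degrees + 1)**2 * Gamma)
rhs = d_avg * (np.sum(degrees * rho) + 2) + 1
print(lhs, rhs)                                 # the two values coincide
```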

APPENDIX E
PROOF OF THE PROPERTIES IN (193) FOR OFDM SIGNALS

Consider an OFDM signal from Section VII-G. The sequence in (191) is a martingale due to basic properties of martingales. From (190), for every $i \in \{0, \ldots, n\}$,

$$ Y_i = \mathbb{E}\left[ \max_{0 \le t \le T} \bigl| s(t; X_0, \ldots, X_{n-1}) \bigr| \; \Big| \; X_0, \ldots, X_{i-1} \right]. $$

The conditional expectation for the RV $Y_{i-1}$ refers to the case where only $X_0, \ldots, X_{i-2}$ are revealed. Let $X'_{i-1}$ and $X_{i-1}$ be independent copies, which are also independent of $X_0, \ldots, X_{i-2}, X_i, \ldots, X_{n-1}$. Then, for every $1 \le i \le n$,
$$
\begin{aligned}
Y_{i-1} &= \mathbb{E}\left[ \max_{0 \le t \le T} \bigl| s(t; X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1}) \bigr| \; \Big| \; X_0, \ldots, X_{i-2} \right] \\
&= \mathbb{E}\left[ \max_{0 \le t \le T} \bigl| s(t; X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1}) \bigr| \; \Big| \; X_0, \ldots, X_{i-2}, X_{i-1} \right].
\end{aligned}
$$

Since $|\mathbb{E}(Z)| \le \mathbb{E}(|Z|)$, it follows that for $i \in \{1, \ldots, n\}$
$$ |Y_i - Y_{i-1}| \le \mathbb{E}_{X'_{i-1}, X_i, \ldots, X_{n-1}} \Bigl[ \, |U - V| \; \Big| \; X_0, \ldots, X_{i-1} \Bigr] \qquad (242) $$
where

$$ U \triangleq \max_{0 \le t \le T} \bigl| s(t; X_0, \ldots, X_{i-1}, X_i, \ldots, X_{n-1}) \bigr|, \qquad V \triangleq \max_{0 \le t \le T} \bigl| s(t; X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1}) \bigr|. $$

From (188)

$$
\begin{aligned}
|U - V| &\le \max_{0 \le t \le T} \bigl| s(t; X_0, \ldots, X_{i-1}, X_i, \ldots, X_{n-1}) - s(t; X_0, \ldots, X'_{i-1}, X_i, \ldots, X_{n-1}) \bigr| \\
&= \max_{0 \le t \le T} \frac{1}{\sqrt{n}} \left| (X_{i-1} - X'_{i-1}) \exp\!\left( \frac{j 2 \pi i t}{T} \right) \right| \\
&= \frac{|X_{i-1} - X'_{i-1}|}{\sqrt{n}}. \qquad (243)
\end{aligned}
$$
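The bound (243) is easy to check numerically. The sketch below assumes the base-band OFDM form $s(t; X_0, \ldots, X_{n-1}) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} X_k \, e^{j 2 \pi k t / T}$ (stated here as an assumption, since (188) is not restated in this appendix) with random $M$-ary PSK symbols, and verifies that replacing a single symbol changes the maximum of $|s(t; \cdot)|$ by at most $|X_{i-1} - X'_{i-1}| / \sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, M = 16, 1.0, 8                        # hypothetical: n subcarriers, duration T, M-PSK
t = np.linspace(0.0, T, 2000)               # dense grid approximating the max over [0, T]

def envelope_max(X):
    """max_t |s(t; X_0, ..., X_{n-1})| for the assumed OFDM signal."""
    s = sum(Xk * np.exp(1j * 2 * np.pi * k * t / T) for k, Xk in enumerate(X))
    return np.max(np.abs(s)) / np.sqrt(n)   # the 1/sqrt(n) factor of the signal

X = np.exp(1j * 2 * np.pi * rng.integers(0, M, n) / M)       # random M-PSK symbols, |X_k| = 1
Xp = X.copy()
i = 5
Xp[i - 1] = np.exp(1j * 2 * np.pi * rng.integers(0, M) / M)  # independent copy of X_{i-1}

U, V = envelope_max(X), envelope_max(Xp)
print(abs(U - V), abs(X[i - 1] - Xp[i - 1]) / np.sqrt(n))    # |U - V| <= |X_{i-1} - X'_{i-1}|/sqrt(n)
```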

By assumption, $|X_{i-1}| = |X'_{i-1}| = 1$, and therefore a.s.
$$ |X_{i-1} - X'_{i-1}| \le 2 \;\; \Longrightarrow \;\; |Y_i - Y_{i-1}| \le \frac{2}{\sqrt{n}}. $$
In the following, an upper bound on the conditional variance $\mathrm{Var}(Y_i \,|\, \mathcal{F}_{i-1}) = \mathbb{E}\bigl[ (Y_i - Y_{i-1})^2 \,|\, \mathcal{F}_{i-1} \bigr]$ is obtained. Since $\bigl( \mathbb{E}(Z) \bigr)^2 \le \mathbb{E}(Z^2)$ for a real-valued RV $Z$, it follows from (242) and (243) that
$$ \mathbb{E}\bigl[ (Y_i - Y_{i-1})^2 \,\big|\, \mathcal{F}_{i-1} \bigr] \le \frac{1}{n} \cdot \mathbb{E}_{X'_{i-1}} \Bigl[ \, |X_{i-1} - X'_{i-1}|^2 \; \Big| \; \mathcal{F}_i \Bigr] $$
where $\mathcal{F}_i$ is the $\sigma$-algebra that is generated by $X_0, \ldots, X_{i-1}$. Due to the symmetry of the PSK constellation,
$$
\begin{aligned}
\mathbb{E}\bigl[ (Y_i - Y_{i-1})^2 \,\big|\, \mathcal{F}_{i-1} \bigr]
&\le \frac{1}{n} \, \mathbb{E}_{X'_{i-1}} \Bigl[ \, |X_{i-1} - X'_{i-1}|^2 \; \Big| \; \mathcal{F}_i \Bigr] \\
&= \frac{1}{n} \, \mathbb{E}\Bigl[ \, |X_{i-1} - X'_{i-1}|^2 \; \Big| \; X_0, \ldots, X_{i-1} \Bigr] \\
&= \frac{1}{n} \, \mathbb{E}\Bigl[ \, |X_{i-1} - X'_{i-1}|^2 \; \Big| \; X_{i-1} \Bigr] \\
&= \frac{1}{n} \, \mathbb{E}\Bigl[ \, |X_{i-1} - X'_{i-1}|^2 \; \Big| \; X_{i-1} = e^{\frac{j\pi}{M}} \Bigr] \\
&= \frac{1}{nM} \sum_{l=0}^{M-1} \Bigl| e^{\frac{j\pi}{M}} - e^{\frac{j(2l+1)\pi}{M}} \Bigr|^2 \\
&= \frac{4}{nM} \sum_{l=1}^{M-1} \sin^2\!\left( \frac{\pi l}{M} \right) = \frac{2}{n}
\end{aligned}
$$
where the last equality holds since
$$
\begin{aligned}
\sum_{l=1}^{M-1} \sin^2\!\left( \frac{\pi l}{M} \right)
&= \frac{1}{2} \sum_{l=0}^{M-1} \left( 1 - \cos\!\left( \frac{2\pi l}{M} \right) \right) \\
&= \frac{M}{2} - \frac{1}{2} \, \mathrm{Re}\left( \sum_{l=0}^{M-1} e^{\frac{j 2 l \pi}{M}} \right) \\
&= \frac{M}{2} - \frac{1}{2} \, \mathrm{Re}\left( \frac{1 - e^{2j\pi}}{1 - e^{\frac{j 2\pi}{M}}} \right) = \frac{M}{2}.
\end{aligned}
$$

REFERENCES

[1] N. Alon and J. H. Spencer, The Probabilistic Method, Wiley Series in Discrete Mathematics and Optimization, Third Edition, 2008.
[2] G. W. Anderson, A. Guionnet and O. Zeitouni, An Introduction to Random Matrices, Cambridge Studies in Advanced Mathematics, 2010.
[3] K. Azuma, "Weighted sums of certain dependent random variables," Tohoku Mathematical Journal, vol. 19, pp. 357–367, 1967.
[4] A. Barg and G. D. Forney, "Random codes: minimum distances and error exponents," IEEE Trans. on Information Theory, vol. 48, no. 9, pp. 2568–2573, September 2002.
[5] S. Benedetto and E. Biglieri, Principles of Digital Transmission with Wireless Applications, Kluwer Academic/Plenum Publishers, 1999.
[6] G. Bennett, "Probability inequalities for the sum of independent random variables," Journal of the American Statistical Association, vol. 57, no. 297, pp. 33–45, March 1962.
[7] D. Berend and A. Kontorovich, "On the concentration of the missing mass," preprint, October 2012. Available at http://arxiv.org/abs/1210.3248.
[8] P. Billingsley, Probability and Measure, Wiley Series in Probability and Mathematical Statistics, Third Edition, 1995.
[9] M. Breiling, "A logarithmic upper bound on the minimum distance of turbo codes," IEEE Trans. on Information Theory, vol. 50, pp. 1692–1710, August 2004.
[10] S. Chatterjee, Concentration Inequalities with Exchangeable Pairs, Ph.D. dissertation, Stanford University, California, USA, February 2008. Available at http://arxiv.org/abs/math/0507526v1.
[11] S. Chatterjee, "Stein's method for concentration inequalities," Probability Theory and Related Fields, vol. 138, pp. 305–321, 2007.
[12] S. Chatterjee and P. S. Dey, "Applications of Stein's method for concentration inequalities," Annals of Probability, vol. 38, no. 6, pp. 2443–2485, June 2010.

[13] F. Chung and L. Lu, Complex Graphs and Networks, Regional Conference Series in Mathematics, vol. 107, 2006.
[14] F. Chung and L. Lu, "Concentration inequalities and martingale inequalities: a survey," Internet Mathematics, vol. 3, no. 1, pp. 79–127, March 2006. Available at http://www.ucsd.edu/~fan/wp/concen.pdf.
[15] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, second edition, 2006.
[16] I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial, Foundations and Trends in Communications and Information Theory, vol. 1, no. 4, pp. 417–528, 2004.
[17] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, Springer, second edition, 1997.
[18] D. P. Dubhashi and A. Panconesi, Concentration of Measure for the Analysis of Randomized Algorithms, Cambridge University Press, 2009.
[19] K. Dzhaparidze and J. H. van Zanten, "On Bernstein-type inequalities for martingales," Stochastic Processes and their Applications, vol. 93, no. 1, pp. 109–117, May 2001.
[20] R. Eshel, Aspects of Convex Optimization and Concentration in Coding, MSc thesis, Department of Electrical Engineering, Technion - Israel Institute of Technology, February 2012.
[21] T. Etzion, A. Trachtenberg and A. Vardy, "Which codes have cycle-free Tanner graphs?," IEEE Trans. on Information Theory, vol. 45, no. 6, pp. 2173–2181, September 1999.
[22] X. Fan, I. Grama and Q. Liu, "Hoeffding's inequality for supermartingales," November 2011. Available at http://arxiv.org/abs/1109.4359.
[23] X. Fan, I. Grama and Q. Liu, "The missing factor in Bennett's inequality," June 2012. Available at http://arxiv.org/abs/1206.2592.
[24] D. Freedman, "On tail probabilities for martingales," Annals of Probability, vol. 3, no. 1, pp. 100–118, January 1975.
[25] S. G. From and A. W. Swift, "A refinement of Hoeffding's inequality," Journal of Statistical Computation and Simulation, Taylor & Francis, pp. 1–7, December 2011.
[26] R. G. Gallager, Low-Density Parity-Check Codes, Cambridge, MA: MIT Press, 1963.
[27] N. Gozlan and C. Léonard, "Transport inequalities: A survey," Markov Processes and Related Fields, vol. 16, no. 4, pp. 635–736, 2010. Available at http://arxiv.org/PS_cache/arxiv/pdf/1003/1003.3852v1.pdf.
[28] R. M. Gray, "Toeplitz and Circulant Matrices," Foundations and Trends in Communication and Information Theory, vol. 2, no. 3, pp. 155–239, 2006.
[29] G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford University Press, third edition, 2001.
[30] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association, vol. 58, no. 301, pp. 13–30, March 1963.
[31] F. den Hollander, Large Deviations, Fields Institute Monographs, American Mathematical Society, 2000.
[32] A. Kavcic, X. Ma and M. Mitzenmacher, "Binary intersymbol interference channels: Gallager codes, density evolution and code performance," IEEE Trans. on Information Theory, vol. 49, no. 7, pp. 1636–1652, July 2003.
[33] M. J. Kearns and L. K. Saul, "Large deviation methods for approximate probabilistic inference," Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 311–319, San Francisco, CA, USA, 1998.
[34] C. P. Kitsos and N. K. Tavoularis, "Logarithmic Sobolev inequalities for information measures," IEEE Trans. on Information Theory, vol. 55, no. 6, pp. 2554–2561, June 2009.
[35] I. Kontoyiannis and M. Madiman, "Entropy, compound Poisson approximation, log-Sobolev inequalities and measure concentration," Proceedings of the 2004 IEEE Information Theory Workshop, pp. 71–75, San Antonio, Texas, USA, October 2004.
[36] I. Kontoyiannis, L. A. Lastras-Montaño and S. P. Meyn, "Relative entropy and exponential deviation bounds for general Markov chains," Proceedings of the 2005 IEEE International Symposium on Information Theory, pp. 1563–1567, Adelaide, Australia, September 2005.
[37] I. Kontoyiannis and M. Madiman, "Measure concentration for compound Poisson distributions," Electronic Communications in Probability, vol. 11, pp. 45–57, May 2006.
[38] I. Kontoyiannis, P. Harremoës and O. Johnson, "Entropy and the law of small numbers," IEEE Trans. on Information Theory, vol. 51, no. 2, pp. 466–472, February 2005.
[39] S. B. Korada and N. Macris, "On the concentration of the capacity for a code division multiple access system," Proceedings of the 2007 IEEE International Symposium on Information Theory, pp. 2801–2805, Nice, France, June 2007.
[40] S. B. Korada and N. Macris, "Tight bounds on the capacity of binary input random CDMA systems," IEEE Trans. on Information Theory, vol. 56, no. 11, pp. 5590–5613, November 2010.
[41] M. Ledoux, The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, 2001.
[42] S. Litsyn and G. Wunder, "Generalized bounds on the crest-factor distribution of OFDM signals with applications to code design," IEEE Trans. on Information Theory, vol. 52, pp. 992–1006, March 2006.
[43] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi and D. A. Spielman, "Efficient erasure-correcting codes," IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 569–584, February 2001.
[44] G. Lugosi, Concentration of Measure Inequalities, lecture notes, June 2009. Available at http://www.econ.upf.edu/~lugosi/anu.pdf.
[45] P. Massart, "Some applications of concentration inequalities to statistics," Probability Theory - Annales de la Faculté des Sciences de Toulouse (special volume dedicated to M. Talagrand), no. 2, pp. 245–303, 2000.
[46] K. Marton, "A simple proof of the blowing-up lemma," IEEE Trans. on Information Theory, vol. 32, no. 3, pp. 445–446, May 1986.
[47] K. Marton, "Bounding d̄-distance by informational divergence: a method to prove measure concentration," Annals of Probability, vol. 24, no. 2, pp. 857–866, February 1996.
[48] K. Marton, "A measure concentration inequality for contracting Markov chains," Geometric and Functional Analysis, vol. 6, pp. 556–571, 1996. Erratum in vol. 7, pp. 609–613, 1997.
[49] P. Massart, Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, vol. 1896, Springer, 2007.
[50] A. Maurer, "Concentration inequalities for functions of independent variables," Random Structures and Algorithms, vol. 29, no. 2, pp. 121–138, September 2006.
[51] A. Maurer, "Dominated Concentration," Statistics and Probability Letters, vol. 80, no. 7–8, pp. 683–689, April 2010.

[52] C. McDiarmid, "On the method of bounded differences," Surveys in Combinatorics, vol. 141, pp. 148–188, Cambridge University Press, Cambridge, 1989.
[53] C. McDiarmid, "Concentration," Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 195–248, Springer, 1998.
[54] C. McDiarmid, "Centering sequences with bounded differences," Combinatorics, Probability and Computing, vol. 6, no. 1, pp. 79–86, March 1997.
[55] C. Méasson, A. Montanari and R. Urbanke, "Maxwell construction: The hidden bridge between iterative and maximum a posteriori decoding," IEEE Trans. on Information Theory, vol. 54, pp. 5277–5307, December 2008.
[56] A. F. Molisch, Wireless Communications, John Wiley and Sons, 2005.
[57] A. Montanari, "Tight bounds for LDPC and LDGM codes under MAP decoding," IEEE Trans. on Information Theory, vol. 51, no. 9, pp. 3247–3261, September 2005.
[58] A. Osękowski, "Weak type inequalities for conditionally symmetric martingales," Statistics and Probability Letters, vol. 80, no. 23–24, pp. 2009–2013, December 2010.
[59] A. Osękowski, "Sharp ratio inequalities for a conditionally symmetric martingale," Bulletin of the Polish Academy of Sciences Mathematics, vol. 58, no. 1, pp. 65–77, 2010.
[60] V. H. de la Peña, "A general class of exponential inequalities for martingales and ratios," Annals of Probability, vol. 27, no. 1, pp. 537–564, January 1999.
[61] A. Rényi, "On measures of entropy and information," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 547–561, California, USA, 1961.
[62] T. J. Richardson and R. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 599–618, February 2001.
[63] T. J. Richardson and R. Urbanke, Modern Coding Theory, Cambridge University Press, 2008.
[64] R. Salem and A. Zygmund, "Some properties of trigonometric series whose terms have random signs," Acta Mathematica, vol. 91, no. 1, pp. 245–301, 1954.
[65] I. Sason and S. Shamai, Performance Analysis of Linear Codes under Maximum-Likelihood Decoding: A Tutorial, Foundations and Trends in Communications and Information Theory, vol. 3, no. 1–2, pp. 1–222, NOW Publishers, Delft, the Netherlands, July 2006.
[66] I. Sason, "On universal properties of capacity-approaching LDPC code ensembles," IEEE Trans. on Information Theory, vol. 55, no. 7, pp. 2956–2990, July 2009.
[67] I. Sason and R. Eshel, "On concentration of measures for LDPC code ensembles," Proceedings of the 2011 IEEE International Symposium on Information Theory (ISIT 2011), pp. 1273–1277, Saint Petersburg, Russia, August 2011.
[68] I. Sason, "On the concentration of the crest factor for OFDM signals," Proceedings of the 8th International Symposium on Wireless Communication Systems (ISWCS '11), pp. 784–788, Aachen, Germany, November 2011.
[69] I. Sason, "Tightened exponential bounds for discrete-time conditionally symmetric martingales with bounded increments," Proceedings of the 2012 International Workshop on Applied Probability (IWAP 2012), p. 59, Jerusalem, Israel, June 11–14, 2012.
[70] Y. Seldin, F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor and P. Auer, "PAC-Bayesian inequalities for martingales," IEEE Trans. on Information Theory, vol. 58, no. 12, pp. 7086–7093, December 2012.
[71] E. Shamir and J. Spencer, "Sharp concentration of the chromatic number on random graphs," Combinatorica, vol. 7, no. 1, pp. 121–129, 1987.
[72] A. Shokrollahi, "Capacity-achieving sequences," IMA Volume in Mathematics and its Applications, vol. 123, pp. 153–166, 2000.
[73] M. Sipser and D. A. Spielman, "Expander codes," IEEE Trans. on Information Theory, vol. 42, no. 6, pp. 1710–1722, November 1996.
[74] J. M. Steele, Probability Theory and Combinatorial Optimization, CBMS–NSF Regional Conference Series in Applied Mathematics, vol. 69, SIAM, Philadelphia, PA, USA, 1997.
[75] W. L. Steiger, "A best possible Kolmogoroff-type inequality for martingales and a characteristic property," Annals of Mathematical Statistics, vol. 40, no. 3, pp. 764–769, June 1969.
[76] M. Talagrand, "Concentration of measure and isoperimetric inequalities in product spaces," Publications Mathématiques de l'I.H.E.S., vol. 81, pp. 73–205, 1995.
[77] M. Talagrand, "A new look at independence," Annals of Probability, vol. 24, no. 1, pp. 1–34, January 1996.
[78] J. A. Tropp, "User-friendly tail bounds for sums of random matrices," Foundations of Computational Mathematics, vol. 12, no. 4, pp. 389–434, August 2012.
[79] J. A. Tropp, "Freedman's inequality for matrix martingales," Electronic Communications in Probability, vol. 16, pp. 262–270, March 2011.
[80] A. B. Wagner, P. Viswanath and S. R. Kulkarni, "Probability estimation in the rare-events regime," IEEE Trans. on Information Theory, vol. 57, no. 6, pp. 3207–3229, June 2011.
[81] G. Wunder and H. Boche, "New results on the statistical distribution of the crest-factor of OFDM signals," IEEE Trans. on Information Theory, vol. 49, no. 2, pp. 488–494, February 2003.
[82] K. Xenoulis and N. Kalouptsidis, "On the random coding exponent of nonlinear Gaussian channels," Proceedings of the 2009 IEEE International Workshop on Information Theory, pp. 32–36, Volos, Greece, June 2009.
[83] K. Xenoulis and N. Kalouptsidis, "Achievable rates for nonlinear Volterra channels," IEEE Trans. on Information Theory, vol. 57, no. 3, pp. 1237–1248, March 2011.
[84] K. Xenoulis, N. Kalouptsidis and I. Sason, "New achievable rates for nonlinear Volterra channels via martingale inequalities," Proceedings of the 2012 IEEE International Symposium on Information Theory (ISIT 2012), pp. 1430–1434, MIT, Boston, MA, USA, July 2012.
[85] G. Wang, "Sharp maximal inequalities for conditionally symmetric martingales and Brownian motion," Proceedings of the American Mathematical Society, vol. 112, no. 2, pp. 579–586, June 1991.