CHAPTER 2

Old and New Concentration Inequalities

In the study of random graphs, or of any other randomly chosen objects, the "tools of the trade" mainly concern various concentration inequalities and martingale inequalities. Suppose we wish to predict the outcome of a problem of interest. One reasonable guess is the expected value of the quantity in question. However, how can we tell how close the expected value is to the actual outcome? Such a prediction is far more useful when it comes with a guarantee of its accuracy (within a certain error estimate, for example). This is exactly the role that concentration inequalities play. In fact, analysis can easily go astray without the rigorous control provided by concentration inequalities.

In our study of random power law graphs, the usual concentration inequalities are simply not enough. The reasons are several: because of the uneven degree distribution, the error bounds for the very large degrees overwhelm the delicate analysis required in the sparse part of the graph. Furthermore, our graphs evolve dynamically, so the probability space changes at each tick of the clock. The problems arising in the analysis of random power law graphs provide the impetus for improving our technical tools.

Indeed, in the course of our study of general random graphs, we need several strengthened versions of concentration inequalities and martingale inequalities. They are interesting in their own right and are useful for many other problems as well. In the next several sections, we state and prove a number of variations and generalizations of concentration inequalities and martingale inequalities. Many of these will be used in later chapters.

2.1. The binomial distribution and its asymptotic behavior

Bernoulli trials, named after James Bernoulli, can be thought of as a sequence of coin flips. For some fixed value $p$, where $0 \le p \le 1$, each toss has probability $p$ of coming up "heads". Let $S_n$ denote the number of heads after $n$ tosses. We can write $S_n$ as a sum of independent random variables $X_i$ as follows:
\[
S_n = X_1 + X_2 + \cdots + X_n,
\]
where, for each $i$, the random variable $X_i$ satisfies
\[
\Pr(X_i = 1) = p, \qquad \Pr(X_i = 0) = 1 - p. \tag{2.1}
\]
A classical question is to determine the distribution of $S_n$. It is not too difficult to see that $S_n$ has the binomial distribution $B(n, p)$:
\[
\Pr(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad \text{for } k = 0, 1, 2, \ldots, n.
\]
The expectation and variance of $B(n, p)$ are
\[
\mathrm{E}(S_n) = np, \qquad \mathrm{Var}(S_n) = np(1-p).
\]
To better understand the asymptotic behavior of the binomial distribution, we compare it with the normal distribution $N(\alpha, \sigma)$, whose density function is given by
\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\alpha)^2/(2\sigma^2)}, \quad -\infty < x < \infty,
\]
where $\alpha$ denotes the expectation and $\sigma^2$ is the variance. The case $N(0, 1)$ is called the standard normal distribution, whose density function is given by
\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.
\]
[Figure 1. The binomial distribution $B(10000, 0.5)$. Figure 2. The standard normal distribution $N(0, 1)$.]

When $p$ is a constant, the binomial distribution, after proper scaling, converges to the standard normal distribution. This can be viewed as a special case of the Central Limit Theorem and is sometimes called the DeMoivre-Laplace Limit Theorem [53].
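Before stating this limit formally as Theorem 2.1, here is a minimal Python sketch (added for illustration; it is not part of the original text) of the comparison shown in Figures 1 and 2. The parameters $n = 10000$ and $p = 0.5$ come from Figure 1, and evaluating the binomial probabilities through the log-gamma function is simply one convenient way to avoid overflow for large $n$.

    from math import lgamma, log, exp, sqrt, pi

    n, p = 10000, 0.5
    sigma = sqrt(n * p * (1 - p))        # sigma = sqrt(np(1-p)), here 50

    def binom_pmf(k):
        # Pr(S_n = k) for S_n ~ B(n, p), computed in log space to avoid
        # overflow of the huge binomial coefficients
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1 - p))
        return exp(log_pmf)

    def std_normal_density(x):
        # density of the standard normal distribution N(0, 1)
        return exp(-x * x / 2) / sqrt(2 * pi)

    for k in (4900, 4950, 5000, 5050, 5100):
        x = (k - n * p) / sigma          # normalized value (k - np) / sigma
        print(f"k = {k}: Pr(S_n = k) = {binom_pmf(k):.6f}, "
              f"f(x)/sigma = {std_normal_density(x) / sigma:.6f}")

Near the mean, each binomial probability is approximately $\frac{1}{\sigma} f\big(\frac{k-np}{\sigma}\big)$, with $f$ the standard normal density; this is the local form of the limit that Theorem 2.1 makes precise for sums over intervals $(np + a\sigma, np + b\sigma)$.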
Theorem 2.1. The binomial distribution $B(n, p)$ for $S_n$, as defined in (2.1), satisfies, for any two constants $a$ and $b$,
\[
\lim_{n \to \infty} \Pr(a\sigma < S_n - np < b\sigma) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx,
\]
where $\sigma = \sqrt{np(1-p)}$, provided $np(1-p) \to \infty$ as $n \to \infty$.

Proof. We use Stirling's formula for $n!$ (see [70]),
\[
n! = (1 + o(1))\, \sqrt{2\pi n}\, \Big(\frac{n}{e}\Big)^n,
\]
or, equivalently,
\[
n! \approx \sqrt{2\pi n}\, \Big(\frac{n}{e}\Big)^n.
\]
For any constants $a$ and $b$, we have
\begin{align*}
\Pr(a\sigma < S_n - np < b\sigma)
&= \sum_{a\sigma < k - np < b\sigma} \binom{n}{k} p^k (1-p)^{n-k} \\
&\approx \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{k(n-k)}}\; \frac{n^n}{k^k (n-k)^{n-k}}\; p^k (1-p)^{n-k} \\
&= \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi np(1-p)}} \Big(\frac{np}{k}\Big)^{k+1/2} \Big(\frac{n(1-p)}{n-k}\Big)^{n-k+1/2} \\
&= \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi}\,\sigma} \Big(1 + \frac{k-np}{np}\Big)^{-k-1/2} \Big(1 - \frac{k-np}{n(1-p)}\Big)^{-n+k-1/2}.
\end{align*}
To approximate the above sum, we consider the following slightly simpler expression. Here, to estimate the lower order terms, we use the fact that $k = np + O(\sigma)$ and $1 + x = e^{\ln(1+x)} = e^{x - x^2/2 + O(x^3)}$ for $x = o(1)$. To proceed, we have
\begin{align*}
\Pr(a\sigma < S_n - np < b\sigma)
&\approx \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi}\,\sigma} \Big(1 + \frac{k-np}{np}\Big)^{-k} \Big(1 - \frac{k-np}{n(1-p)}\Big)^{-n+k} \\
&\approx \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{k(k-np)}{np} + \frac{(n-k)(k-np)}{n(1-p)} + \frac{k(k-np)^2}{2n^2p^2} + \frac{(n-k)(k-np)^2}{2n^2(1-p)^2} + O(\frac{1}{\sigma})} \\
&= \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\big(\frac{k-np}{\sigma}\big)^2 + O(\frac{1}{\sigma})} \\
&\approx \sum_{a\sigma < k - np < b\sigma} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\big(\frac{k-np}{\sigma}\big)^2}.
\end{align*}
Now we set $x = x_k = \frac{k - np}{\sigma}$, so the increment "$dx$" $= x_k - x_{k-1} = 1/\sigma$. The values $x_k$ appearing in the sum satisfy $a < x_k < b$ and form a $1/\sigma$-net for the interval $(a, b)$. As $n$ approaches infinity, the Riemann sum above converges to the corresponding integral, and we have
\[
\lim_{n\to\infty} \Pr(a\sigma < S_n - np < b\sigma) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx. \qquad \Box
\]
Thus, the limit distribution of the normalized binomial distribution is the normal distribution.

When $np$ is bounded above (by a constant), the above theorem no longer holds. For example, for $p = \frac{\lambda}{n}$, the limit distribution of $B(n, p)$ is the so-called Poisson distribution $P(\lambda)$:
\[
\Pr(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}, \quad \text{for } k = 0, 1, 2, \ldots.
\]
The expectation and variance of the Poisson distribution $P(\lambda)$ are given by
\[
\mathrm{E}(X) = \lambda, \qquad \mathrm{Var}(X) = \lambda.
\]

Theorem 2.2. For $p = \frac{\lambda}{n}$, where $\lambda$ is a constant, the limit distribution of the binomial distribution $B(n, p)$ is the Poisson distribution $P(\lambda)$.

Proof. Using $(1-p)^{n-k} = (1 + o(1))\, e^{-p(n-k)}$, we consider
\begin{align*}
\lim_{n\to\infty} \Pr(S_n = k) &= \lim_{n\to\infty} \binom{n}{k} p^k (1-p)^{n-k} \\
&= \lim_{n\to\infty} \frac{\lambda^k \prod_{i=0}^{k-1} \big(1 - \frac{i}{n}\big)}{k!}\, e^{-p(n-k)} \\
&= \frac{\lambda^k}{k!}\, e^{-\lambda}. \qquad \Box
\end{align*}

[Figure 3. The binomial distribution $B(1000, 0.003)$. Figure 4. The Poisson distribution $P(3)$.]
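The following Python sketch (an added illustration, not part of the original text) checks Theorem 2.2 numerically with the parameters of Figures 3 and 4, comparing the probabilities of $B(1000, 0.003)$ with those of the Poisson distribution $P(3)$.

    from math import comb, exp, factorial

    n, lam = 1000, 3.0
    p = lam / n                      # the regime p = lambda / n of Theorem 2.2

    def binom_pmf(k):
        # Pr(S_n = k) for S_n ~ B(n, p)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def poisson_pmf(k):
        # Pr(X = k) for X ~ P(lambda)
        return lam**k * exp(-lam) / factorial(k)

    for k in range(8):
        print(f"k = {k}: binomial = {binom_pmf(k):.5f}, Poisson = {poisson_pmf(k):.5f}")

Already at $n = 1000$ the two columns agree closely, reflecting the limit stated in the theorem.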
As $p$ decreases from $\Theta(1)$ to $\Theta(\frac{1}{n})$, the asymptotic behavior of the binomial distribution $B(n, p)$ changes from the normal distribution to the Poisson distribution. (Some intermediate examples are illustrated in Figures 5 and 6.) Theorem 2.1 states that the asymptotic behavior of $B(n, p)$ within the interval $(np - C\sigma, np + C\sigma)$, for any constant $C$, is close to the normal distribution. In some applications, we need asymptotic estimates beyond this interval.

[Figure 5. The binomial distribution $B(1000, 0.1)$. Figure 6. The binomial distribution $B(1000, 0.01)$.]

2.2. General Chernoff inequalities

If the random variable under consideration can be expressed as a sum of independent variables, it is possible to derive good estimates. The binomial distribution is one such example, where $S_n = \sum_{i=1}^n X_i$ and the $X_i$'s are independent and identically distributed. In this section, we consider sums of independent variables that are not necessarily identically distributed.

To control how close such a sum of random variables is to its expected value, various concentration inequalities come into play. A typical version of the Chernoff inequalities, attributed to Herman Chernoff, can be stated as follows:

Theorem 2.3 [28]. Let $X_1, \ldots, X_n$ be independent random variables such that $\mathrm{E}(X_i) = 0$ and $|X_i| \le 1$ for all $i$. Let $X = \sum_{i=1}^n X_i$ and let $\sigma^2$ denote the variance of $X$. Then
\[
\Pr(|X| \ge k\sigma) \le 2 e^{-k^2/4}, \quad \text{for any } 0 \le k \le 2\sigma.
\]

If the random variables $X_i$ under consideration assume non-negative values, the following version of the Chernoff inequalities is often useful.

Theorem 2.4 [28]. Let $X_1, \ldots, X_n$ be independent random variables with
\[
\Pr(X_i = 1) = p_i, \qquad \Pr(X_i = 0) = 1 - p_i.
\]
We consider the sum $X = \sum_{i=1}^n X_i$, with expectation $\mathrm{E}(X) = \sum_{i=1}^n p_i$. Then we have
\[
\text{(Lower tail)} \qquad \Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\lambda^2 / (2\mathrm{E}(X))},
\]
\[
\text{(Upper tail)} \qquad \Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{E}(X) + \lambda/3)}}.
\]
We remark that the term $\lambda/3$ appearing in the exponent of the bound for the upper tail is significant.
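To see the upper tail bound of Theorem 2.4 in action, the following Monte Carlo sketch (added for illustration; the values of $n$, the $p_i$, $\lambda$, and the number of trials are arbitrary choices, not taken from the text) compares the empirical frequency of the event $X \ge \mathrm{E}(X) + \lambda$ with the bound $e^{-\lambda^2/(2(\mathrm{E}(X) + \lambda/3))}$.

    import random
    from math import exp

    random.seed(0)

    # Illustration parameters: n indicator variables with
    # non-identical success probabilities p_i.
    n = 200
    p = [0.05 + 0.9 * i / n for i in range(n)]
    mu = sum(p)                          # E(X) = sum_i p_i
    lam = 15.0                           # the deviation lambda
    trials = 20000

    def sample_X():
        # one sample of X = X_1 + ... + X_n with independent X_i ~ Bernoulli(p_i)
        return sum(1 for pi in p if random.random() < pi)

    hits = sum(1 for _ in range(trials) if sample_X() >= mu + lam)
    empirical = hits / trials
    bound = exp(-lam**2 / (2 * (mu + lam / 3)))   # upper tail bound of Theorem 2.4

    print(f"E(X) = {mu:.2f}")
    print(f"empirical  Pr(X >= E(X) + lambda) = {empirical:.4f}")
    print(f"Chernoff bound                    = {bound:.4f}")

The empirical frequency stays below the bound, as the theorem guarantees. The bound is not tight for moderate deviations, but it requires no information about the individual $p_i$ beyond their sum $\mathrm{E}(X)$.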