Chapter 3
A few more good inequalities, martingale variety

3.1 From independence to martingales
3.2 Hoeffding's inequality for martingales
3.3 Bennett's inequality for martingales
3.4 *Concentration of random polynomials
3.5 *Proof of the Kim-Vu inequality
3.6 Problems
3.7 Notes

Section 3.1 introduces the method for bounding tail probabilities using moment generating functions.

Section 3.2 discusses the Hoeffding inequality, both for sums of independent bounded random variables and for martingales with bounded increments.

Section 3.3 discusses the Bennett inequality, both for sums of independent random variables that are bounded above by a constant and for their martingale analogs.

*Section 3.4 presents an extended application of a martingale version of the Bennett inequality to derive a version of the Kim-Vu inequality for polynomials in independent, bounded random variables.

3.1 From independence to martingales

Throughout this Chapter $\{(S_i, \mathcal{F}_i) : i = 1, \dots, n\}$ is a martingale on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. That is, we have sub-sigma-fields $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n \subseteq \mathcal{F}$ and integrable, $\mathcal{F}_i$-measurable random variables $S_i$ for which $\mathbb{P}_{\mathcal{F}_{i-1}} S_i = S_{i-1}$ almost surely. Equivalently, the martingale differences $\xi_i := S_i - S_{i-1}$ are integrable, $\mathcal{F}_i$-measurable, and $\mathbb{P}_{\mathcal{F}_{i-1}} \xi_i = 0$ almost surely, for $i = 1, \dots, n$. In that case $S_i = S_0 + \xi_1 + \cdots + \xi_i = S_{i-1} + \xi_i$.

For simplicity I assume $\mathcal{F}_0$ is trivial, so that $S_0 = \mu = \mathbb{P} S_n$. Also, I usually omit the "almost sure" qualifiers, unless there is some good reason to remind you that the conditional expectations are unique only up to almost sure equivalence. And, to avoid a plethora of subscripts, I abbreviate the conditional expectation operator $\mathbb{P}_{\mathcal{F}_{i-1}}$ to $\mathbb{P}_{i-1}$.

The inequalities from Sections 2.5 and 2.6 have martingale analogs, which have proved particularly useful for the development of concentration inequalities. The analog of the Hoeffding inequality is often attributed to Azuma (1967), even though Hoeffding (1963, pages 17-18) had already noted that

    "The inequalities of this section can be strengthened in the following way. Let $S_m = X_1 + \cdots + X_m$ for $m = 1, 2, \dots, n$. Furthermore, the inequalities of Theorems 1 and 2 remain true if the assumption that $X_1, X_2, \dots, X_n$ are independent is replaced by the weaker assumption that the sequence $S'_m = S_m - E S_m$, $m = 1, 2, \dots, n$, is a martingale, that is, ... . Indeed, Doob's inequality (2.17) is true under this assumption. On the other hand, (2.18) implies that the conditional mean of $X_m$ for $S'_{m-1}$ fixed is equal to its unconditional mean. A slight modification of the proofs of Theorems 1 and 2 yields the stated result."
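Before turning to the proofs, a minimal numerical sketch (my own illustration, not from the text) may make the definitions concrete: the increments below are fair coin flips scaled by an $\mathcal{F}_{i-1}$-measurable bet size, so they are far from independent, yet conditionally centered. The Monte Carlo check estimates $\mathbb{P}_{i-1}\xi_i$ by averaging over paths that share the same past. All names are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_path(n):
        """One path S_0, ..., S_n of the martingale, with S_0 = 0."""
        S = np.zeros(n + 1)
        for i in range(1, n + 1):
            scale = 0.5 ** abs(S[i - 1])      # F_{i-1}-measurable "bet size"
            eps = rng.choice([-1.0, 1.0])     # fair coin, independent of the past
            S[i] = S[i - 1] + eps * scale     # increment xi_i = eps * scale
        return S

    # Monte Carlo check of P_{i-1} xi_i = 0: average xi_2 over many paths
    # that share the same value of S_1 (here S_1 is exactly +1 or -1).
    paths = np.array([simulate_path(2) for _ in range(100_000)])
    for s1 in (-1.0, 1.0):
        xi2 = paths[paths[:, 1] == s1, 2] - s1
        print(f"S_1 = {s1:+.0f}: mean of xi_2 = {xi2.mean():+.5f} (should be near 0)")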
In principle, the martingale results can be proved by conditional analogs of the moment generating function technique described in Section 2.2, with just a few precautions to avoid problems with negligible sets. In that Section I started with a random variable $X$ that lives on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. By means of arguments involving differentiation inside expectations I derived a Taylor expansion for the cumulant generating function,

<1>    $L(\theta) = \log \mathbb{P} e^{\theta X} = \theta\, \mathbb{P} X + \tfrac{1}{2}\theta^2 \operatorname{var}_{\theta^*}(X),$

where $\theta^*$ lies between $0$ and $\theta$ and the variance is calculated under the probability measure $\mathbb{P}_{\theta^*}$ that has density $\exp(\theta^* X - L(\theta^*))$ with respect to $\mathbb{P}$. The obvious analog for expectations conditional on some sub-sigma-field $\mathcal{G}$ of $\mathcal{F}$ is

<2>    $\log \mathbb{P}_{\mathcal{G}} e^{\theta X} = \theta\, \mathbb{P}_{\mathcal{G}} X + \tfrac{1}{2}\theta^2 \operatorname{var}_{\mathcal{G},\theta^*}(X)$    almost surely.

Here the conditional expectation $\mathbb{P}_{\mathcal{G}} e^{\theta X}$ is defined in the usual Kolmogorov way, as the almost surely unique $\mathcal{G}$-measurable random variable $M_\omega(\theta)$ for which

    $\mathbb{P}\bigl(g(\omega) e^{\theta X}\bigr) = \mathbb{P}\bigl(g(\omega) M_\omega(\theta)\bigr)$

at least for all nonnegative, $\mathcal{G}$-measurable random variables $g$. The random variable $M_\omega(\theta)$ could be changed on some $\mathcal{G}$-measurable negligible set $N_\theta$ without affecting the defining equality. Unfortunately, such negligible ambiguities can become a major obstacle if we want the map $\theta \mapsto M_\omega(\theta)$ to be differentiable, or even continuous. Regular conditional distributions eliminate the obstacle, by consolidating all the bookkeeping with negligible sets into one step.

Recall that the distribution of a real-valued random variable $X$ is a probability measure $P$ on $\mathcal{B}(\mathbb{R})$ for which $\mathbb{P} f(X) = P f$, at least for all bounded, $\mathcal{B}(\mathbb{R})$-measurable real functions $f$. A regular conditional distribution for $X$ given a sub-sigma-field $\mathcal{G}$ is a Markov kernel, a set $\{P_\omega : \omega \in \Omega\}$ of probability measures on $\mathcal{B}(\mathbb{R})$ for which $P_\omega f$ is a version of the conditional expectation $\mathbb{P}_{\mathcal{G}} f(X)$ whenever it is well defined. That is, $\omega \mapsto P_\omega f$ is $\mathcal{G}$-measurable and

    $\mathbb{P}\bigl(g(\omega) f(X(\omega))\bigr) = \mathbb{P}\bigl(g(\omega) P_\omega f\bigr)$

for a suitably large collection of $\mathcal{B}(\mathbb{R})$-measurable functions $f$ and $\mathcal{G}$-measurable functions $g$. Such regular conditional distributions always exist (Breiman, 1968, Section 4.3).

Remark. As noted by Breiman (1968, page 80), regular conditional distributions also justify the familiar idea that, for the purpose of calculating $\mathbb{P}_{\mathcal{G}}$ conditional expectations, we may treat $\mathcal{G}$-measurable functions as constants. For example, if $f$ is a bounded $\mathcal{B}(\mathbb{R}^2)\backslash\mathcal{B}(\mathbb{R})$-measurable real function on $\mathbb{R}^2$ and $G$ is a $\mathcal{G}$-measurable random variable then $\mathbb{P}_{\mathcal{G}} f(X, G) = P_\omega^x f(x, G(\omega))$ almost surely. You might find it instructive to prove this assertion using a generating class argument, starting from the fact that it is valid when $f$ factorizes: $f(x, y) = f_1(x) f_2(y)$.

If we choose $P_\omega e^{\theta x}$ as the version of the conditional moment generating function $M_\omega(\theta)$ then the arguments from Section 2.2 work as before: on the interior of the interval $J_\omega = \{\theta \in \mathbb{R} : M_\omega(\theta) < \infty\}$ we have

    $L_\omega(\theta) := \log M_\omega(\theta)$
    $L'_\omega(\theta) = P_\omega\bigl(x e^{\theta x}\bigr)/M_\omega(\theta) = P_{\omega,\theta}\, x$
    $L''_\omega(\theta) = \operatorname{var}_{\omega,\theta}(x) = P_{\omega,\theta}(x - \mu_{\omega,\theta})^2$  where $\mu_{\omega,\theta} := P_{\omega,\theta}\, x$.

As before, the probability measure $P_{\omega,\theta}$ is defined on $\mathcal{B}(\mathbb{R})$ by its density, $dP_{\omega,\theta}/dP_\omega = e^{x\theta}/M_\omega(\theta)$. If $0$ and $\theta$ are interior points of $J_\omega$ then equality <2> takes the cleaner form

    $L_\omega(\theta) = \theta P_\omega x + \tfrac{1}{2}\theta^2 \operatorname{var}_{\omega,\theta^*}(x)$  for all $\omega$,

with $\theta^*$, which might depend on $\omega$, lying between $0$ and $\theta$.

Now suppose there exist functions $a_\omega$ and $b_\omega$ for which $P_\omega[a_\omega, b_\omega] = 1$. As in Section 2.5 we have

    $P_\omega e^{\theta(x - P_\omega x)} \le \exp\bigl(\tfrac{1}{8}\theta^2 (b_\omega - a_\omega)^2\bigr)$

so that

<3>    $\mathbb{P}_{\mathcal{G}} e^{\theta(X - \mathbb{P}_{\mathcal{G}} X)} \le e^{\theta^2 R_\omega^2/8}$  a.s., if $R_\omega \ge b_\omega - a_\omega$.

Remark. Strangely enough, the first inequality does not require any sort of measurability assumption about $a_\omega$ and $b_\omega$, because $\omega$ is being held fixed. However, we usually need the range $R_\omega$ to be $\mathcal{G}$-measurable when integrating out <3>.
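The unconditional form of <3> is easy to sanity-check numerically. The following sketch (my own illustration, under my own choice of distribution) computes the exact centered log moment generating function of a Bernoulli($p$) variable, which concentrates on $[a, b] = [0, 1]$, and verifies that it stays below $\theta^2 (b - a)^2/8$ on a grid of $\theta$ values.

    import numpy as np

    p = 0.3                                   # any p in (0, 1) works
    thetas = np.linspace(-10.0, 10.0, 2001)

    # Exact centered log mgf: log E exp(theta (X - p)) for X ~ Bernoulli(p).
    log_mgf = np.log((1 - p) * np.exp(-p * thetas) + p * np.exp((1 - p) * thetas))
    bound = thetas ** 2 / 8                   # theta^2 (b - a)^2 / 8 with b - a = 1

    assert np.all(log_mgf <= bound + 1e-12)
    print("bound holds on the grid; max slack =", np.max(bound - log_mgf))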
Similarly, if $P_\omega(-\infty, b_\omega] = 1$ and $v_\omega \ge P_\omega x^2$ then

<4>    $\mathbb{P}_{\mathcal{G}} e^{\theta(X - \mathbb{P}_{\mathcal{G}} X)} \le \exp\bigl(\tfrac{1}{2}\theta^2 v_\omega \Delta(\theta b_\omega)\bigr)$  for $\theta \ge 0$, a.s.,

where, as before, $\Delta$ denotes the convex increasing function on $\mathbb{R}$ for which $\tfrac{1}{2} x^2 \Delta(x) = \Phi_1(x) := e^x - 1 - x$.

3.2 Hoeffding's inequality for martingales

Consider the martingale $\{(S_i, \mathcal{F}_i) : i = 1, \dots, n\}$ described at the start of Section 3.1. Suppose the conditional distribution of the increment $\xi_i$ given $\mathcal{F}_{i-1}$ concentrates on an interval whose length is bounded above by an $\mathcal{F}_{i-1}$-measurable function $R_i(\omega)$. Then inequality <3> gives

<5>    $\mathbb{P}_{i-1} e^{\theta \xi_i} \le \exp\bigl(\tfrac{1}{8}\theta^2 R_i^2\bigr)$.

If the $R_i$'s are actually constants then

    $\mathbb{P} e^{\theta S_i} = \mathbb{P}\bigl(e^{\theta S_{i-1}}\, \mathbb{P}_{i-1} e^{\theta \xi_i}\bigr) \le \exp\bigl(\tfrac{1}{8}\theta^2 R_i^2\bigr)\, \mathbb{P} e^{\theta S_{i-1}},$

which, by repeated substitution, gives

    $\mathbb{P} e^{\theta S_n} \le \mathbb{P} e^{\theta S_0} \prod_{1 \le i \le n} \exp\bigl(\tfrac{1}{8}\theta^2 R_i^2\bigr) = e^{\theta \mathbb{P} S_n + \theta^2 W/8}$  where $W = \sum_{1 \le i \le n} R_i^2$.

For each nonnegative $x$,

<6>    $\mathbb{P}\{S_n \ge x + \mathbb{P} S_n\} \le \inf_{\theta \ge 0} \exp\bigl(-x\theta + W\theta^2/8\bigr) = \exp\bigl(-2x^2/W\bigr)$.

We have the same bound as for a sum of independent increments, as Hoeffding promised.

If the $R_i$'s are truly random, a slightly more subtle argument leads to the inequality

<7>    $\mathbb{P}\{S_n \ge x + \mathbb{P} S_n\} \le \exp\bigl(-2x^2/W\bigr) + \mathbb{P}\bigl\{\sum_{i \le n} R_i^2 > W\bigr\}$

for each constant $W$. Compare with McDiarmid (1998, Theorem 3.7).

To prove <7>, define $V_i := \sum_{k \le i} R_k^2$, which is $\mathcal{F}_{i-1}$-measurable. For each $i$,

    $\mathbb{P} \exp\bigl(\theta S_i - \theta^2 V_i/8\bigr) = \mathbb{P}\bigl(\exp\bigl(\theta S_{i-1} - \theta^2 V_i/8\bigr)\, \mathbb{P}_{i-1} e^{\theta \xi_i}\bigr) \le \mathbb{P} \exp\bigl(\theta S_{i-1} - \theta^2 V_{i-1}/8\bigr)$  by <5>.

Repeated substitution now gives

    $\mathbb{P} \exp\bigl(\theta S_n - \theta^2 V_n/8\bigr) \le \mathbb{P} \exp(\theta S_0) = e^{\theta \mathbb{P} S_n}.$

The strange $V_n$ contribution is not a problem on the set where $V_n \le W$:

    $\mathbb{P} e^{\theta S_n}\{V_n \le W\} \le \mathbb{P} e^{\theta S_n + \theta^2 (W - V_n)/8} \le e^{\theta \mathbb{P} S_n + \theta^2 W/8}$

so that

    $\mathbb{P}\{S_n \ge x + \mathbb{P} S_n\} \le \mathbb{P}\{V_n > W\} + \inf_{\theta > 0} e^{-\theta(x + \mathbb{P} S_n)}\, \mathbb{P} e^{\theta S_n}\{V_n \le W\}.$

And so on, until inequality <7> emerges.
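As a Monte Carlo sanity check on <6> (again my own illustration, not from the text), take $S_n$ to be a sum of $n$ fair $\pm 1$ coin flips, so that $\mathbb{P} S_n = 0$, each range is $R_i = 2$, and $W = 4n$; the bound becomes $\exp(-x^2/(2n))$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 100, 200_000
    # S_n = sum of n fair +/-1 flips, so PS_n = 0 and each range R_i = 2.
    S_n = rng.choice([-1, 1], size=(reps, n)).sum(axis=1)

    W = 4 * n                                 # W = sum_i R_i^2
    for x in (10, 20, 30):
        empirical = np.mean(S_n >= x)
        bound = np.exp(-2 * x ** 2 / W)       # inequality <6>
        print(f"x = {x:2d}: empirical {empirical:.5f} <= bound {bound:.5f}")

For this walk the exponent $2x^2/W = x^2/(2n)$ agrees with the Gaussian tail rate for a variance-$n$ sum, so the bound has the right exponential order even though the empirical tail sits well below it.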