Chapter 1. The Canonical Gaussian Measure on $\mathbb{R}^n$

1. Introduction

The main goal of this course is to study "Gaussian measures." The simplest example of a Gaussian measure is the canonical Gaussian measure $P_n$ on $\mathbb{R}^n$, where $n \ge 1$ is an arbitrary integer. The measure $P_n$ is one of the main objects of study in this course, and is defined as

$$P_n(A) := \int_A \gamma_n(x)\,\mathrm{d}x \qquad \text{for all Borel sets } A \subseteq \mathbb{R}^n,$$

where

$$\gamma_n(x) := \frac{e^{-\|x\|^2/2}}{(2\pi)^{n/2}} \qquad [x \in \mathbb{R}^n]. \tag{1.1}$$

The function $\gamma_1$ describes the famous "bell curve." In dimensions $n > 1$, the graph of $\gamma_n$ looks like a suitable "rotation" of the graph of $\gamma_1$.

We frequently drop the subscript $n$ from $P_n$ when it is clear which dimension we are in. We also choose and fix the integer $n \ge 1$, tacitly consider the probability space $(\Omega, \mathcal{F}, P_n)$ [where we will sometimes drop the subscript $n$ from $P_n$], and set

$$\Omega := \mathbb{R}^n \qquad \text{and} \qquad \mathcal{F} := \mathcal{B}(\mathbb{R}^n).$$

Throughout, we designate by $Z$ the random vector with coordinates

$$Z_j(x) := x_j \qquad \text{for all } x \in \mathbb{R}^n \text{ and } 1 \le j \le n. \tag{1.2}$$
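Numerically, (1.1) and (1.2) together say that sampling from $P_n$ amounts to drawing $n$ i.i.d. standard normal coordinates. A minimal Python sketch of both the density and the sampling rule follows; the NumPy dependency and the name `gamma_n` are illustrative choices, not part of the notes.

```python
import numpy as np

def gamma_n(x):
    """The canonical Gaussian density (1.1) at a point x in R^n."""
    x = np.asarray(x, dtype=float)
    return np.exp(-x @ x / 2.0) / (2.0 * np.pi) ** (x.size / 2.0)

rng = np.random.default_rng(0)
n = 3
z = rng.standard_normal(n)   # one draw from P_n: n i.i.d. N(0,1) coordinates
print(gamma_n(np.zeros(n)))  # the mode: (2*pi)^(-n/2)
print(gamma_n(z))            # density at the sampled point
```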



Then $Z := (Z_1, \dots, Z_n)$ is a random vector of $n$ i.i.d. standard normal random variables on our probability space. In particular,

$$P_n(A) = P\{Z \in A\} \qquad \text{for all Borel sets } A \subseteq \mathbb{R}^n.$$

One of the elementary, though important, properties of the measure $P_n$ is that its "tails" are vanishingly small.

Lemma 1.1. As $t \to \infty$,

$$P_n\{x \in \mathbb{R}^n : \|x\| > t\} = \frac{2 + o(1)}{2^{n/2}\,\Gamma(n/2)}\, t^{n-2}\, e^{-t^2/2}.$$

Proof. Since

$$S_n := \|Z\|^2 = \sum_{j=1}^n Z_j^2 \tag{1.3}$$

has a $\chi^2_n$ distribution,

$$P_n\{x \in \mathbb{R}^n : \|x\| > t\} = P\{S_n > t^2\} = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_{t^2}^\infty s^{(n-2)/2}\, e^{-s/2}\, \mathrm{d}s,$$

for all $t > 0$. Now apply l'Hôpital's rule of calculus. $\square$
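The asymptotic in Lemma 1.1 is easy to test numerically. A minimal sketch, assuming SciPy is available: `chi2.sf` gives the exact chi-square tail, and `gammaln` keeps the constant stable.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

def exact_tail(n, t):
    # P_n{||x|| > t} = P{S_n > t^2}, with S_n ~ chi-square(n) as in (1.3).
    return chi2.sf(t * t, df=n)

def leading_term(n, t):
    # The right-hand side of Lemma 1.1 without the o(1) correction.
    log_val = (np.log(2.0) + (n - 2) * np.log(t) - t * t / 2.0
               - (n / 2.0) * np.log(2.0) - gammaln(n / 2.0))
    return np.exp(log_val)

n = 5
for t in (3.0, 5.0, 8.0):
    print(t, exact_tail(n, t) / leading_term(n, t))  # ratios tend to 1
```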

The following large-deviations estimate is one of the consequences of Lemma 1.1:

$$\lim_{t\to\infty} \frac{1}{t^2} \log P_n\{x \in \mathbb{R}^n : \|x\| > t\} = -\frac{1}{2}. \tag{1.4}$$

Of course, this is a weaker statement than Lemma 1.1. But it has the advantage of being "dimension independent." Dimension-independence properties play a prominent role in the analysis of Gaussian measures. Here, for example, (1.4) teaches us that the tails of $P_n$ behave roughly as do the tails of $P_1$, for all $n \ge 1$.

Still, many of the more interesting properties of $P_n$ are radically different from those of $P_1$ when $n$ is large. In low dimensions, say $1 \le n \le 3$, one can visualize the probability density function $\gamma_n$ from (1.1). Based on that, or other methods, one knows that in low dimensions most of the mass of $P_n$ lies near the origin. For example, an inspection of the standard normal table reveals that more than half of the total mass of $P_1$ is within one unit of the origin; in fact, $P_1[-1, 1] \approx 0.68268$.

In higher dimensions, however, the structure of $P_n$ can be quite different. Recall the random variable $S_n$ from (1.3), and apply Khintchine's weak law of large numbers XXX to see that $S_n/n$ converges in probability to one as $n \to \infty$.¹ This is another way to say that for every

¹In the present setting, it does not make sense to discuss almost-sure convergence, since the underlying probability space is $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), P_n)$.

$\varepsilon > 0$,

$$\lim_{n\to\infty} P_n\left\{x \in \mathbb{R}^n : (1-\varepsilon)n^{1/2} \le \|x\| \le (1+\varepsilon)n^{1/2}\right\} = 1. \tag{1.5}$$

The proof is short and can be reproduced right here: Let $E$ denote the expectation operator for $P := P_n$, $\operatorname{Var}$ the corresponding variance operator, etc. Since $E(S_n) = n$ and $\operatorname{Var}(S_n) = 2n$, Chebyshev's inequality yields $P\{|S_n - E(S_n)| > \varepsilon n\} \le 2\varepsilon^{-2}n^{-1}$. Equivalently,

$$P_n\left\{x \in \mathbb{R}^n : (1-\varepsilon)n^{1/2} \le \|x\| \le (1+\varepsilon)n^{1/2}\right\} \ge 1 - \frac{2}{n\varepsilon^2}. \tag{1.6}$$

Thus we see that, when $n$ is large, the measure $P_n$ concentrates much of its total mass near the boundary of the centered ball of radius $n^{1/2}$, very far from the origin. A more careful examination shows that, in fact, very little of the total mass of $P_n$ is elsewhere when $n$ is large. The following theorem makes this statement more precise. Theorem 1.2 is a simple example of a remarkable property of Gaussian measures that is commonly known as concentration of measure XXX. We will discuss this topic in more detail in due time.

Theorem 1.2. For every $\varepsilon > 0$,

$$P_n\left\{x \in \mathbb{R}^n : (1-\varepsilon)n^{1/2} \le \|x\| \le (1+\varepsilon)n^{1/2}\right\} \ge 1 - 2e^{-n\varepsilon^2/4}. \tag{1.7}$$
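Before turning to the proof, the reader may enjoy checking (1.7) by simulation. A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.2
for n in (10, 100, 1000, 10000):
    z = rng.standard_normal((1000, n))            # 1000 draws from P_n
    norms = np.linalg.norm(z, axis=1)
    empirical = np.mean(np.abs(norms - np.sqrt(n)) <= eps * np.sqrt(n))
    bound = 1 - 2 * np.exp(-n * eps ** 2 / 4)     # right-hand side of (1.7)
    print(n, empirical, max(bound, 0.0))          # empirical mass vs. the bound
```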

Theorem 1.2 does not merely improve the crude bound (1.6). Rather, it describes an entirely new phenomenon in high dimensions. To wit, suppose we are working in dimension $n = 500$. When $\varepsilon = 0$, the left-hand side of (1.7) is equal to 0. But if we increase $\varepsilon$ slightly, say to $\varepsilon = 0.2$, then the left-hand side of (1.7) increases to a probability of at least $1 - 2e^{-5} \approx 0.986$ (!). By comparison, (1.6) reports the much weaker bound $0.9$ in this case.

Proof. From now on, we let $E := E_n$ denote the expectation operator for $P := P_n$. That is,

$$E(f) := E_n(f) := \int f\, \mathrm{d}P_n = \int f\, \mathrm{d}P \qquad \text{for all } f \in L^1(P_n).$$

Since $S_n := \sum_{j=1}^n Z_j^2$ has a $\chi^2_n$ distribution,

$$E\, e^{\lambda S_n} = (1 - 2\lambda)^{-n/2} \qquad \text{for } -\infty < \lambda < 1/2, \tag{1.8}$$

and $E \exp(\lambda S_n) = \infty$ when $\lambda \ge 1/2$.

We use the preceding as follows: For all $t > 0$ and $\lambda \in (0, 1/2)$,

$$P_n\left\{x : \|x\| > t n^{1/2}\right\} = P\left\{S_n > t^2 n\right\} \le e^{-\lambda t^2 n}\, E\, e^{\lambda S_n} = e^{-\lambda t^2 n}(1 - 2\lambda)^{-n/2},$$

thanks to Chebyshev's inequality. Since the left-hand side is independent of $\lambda \in (0, 1/2)$, we may optimize the right-hand side over $\lambda \in (0, 1/2)$ to find that, for $t > 1$,

$$P_n\left\{x : \|x\| > t n^{1/2}\right\} \le \exp\left(-n \sup_{0<\lambda<1/2}\left[\lambda t^2 + \frac{1}{2}\log(1 - 2\lambda)\right]\right) = \exp\left(-\frac{n}{2}\left[t^2 - 1 - 2\log t\right]\right).$$

Since $\log t < t - 1$ for $t > 1$, we have $t^2 - 1 - 2\log t > (t-1)^2$. This and the previous inequality together yield

$$P_n\left\{x : \|x\| > t n^{1/2}\right\} \le e^{-n(t-1)^2/2}. \tag{1.9}$$

When $t = 1$, inequality (1.9) holds vacuously. Therefore, it suffices to consider the case that $t < 1$. In that case, we argue similarly and write

$$P_n\left\{x : \|x\| < t n^{1/2}\right\} \le e^{\lambda t^2 n}\, E\, e^{-\lambda S_n} \quad [\text{for all } \lambda > 0] \le \exp\left(-n \sup_{\lambda>0}\left[-\lambda t^2 + \frac{1}{2}\log(1 + 2\lambda)\right]\right) = \exp\left(-\frac{n}{2}\left[t^2 - 1 - 2\log t\right]\right).$$

Since $t < 1$, a Taylor expansion yields $-2\log t > 2(1-t) + (1-t)^2$, whence

$$P_n\left\{x : \|x\| < t n^{1/2}\right\} \le \exp\left(-\frac{n}{2}\left[t^2 - 1 - 2\log t\right]\right) \le e^{-n(1-t)^2}.$$

Applying this estimate with $t = 1 - \varepsilon$ and (1.9) with $t = 1 + \varepsilon$ bounds each tail by $e^{-n\varepsilon^2/2} \le e^{-n\varepsilon^2/4}$, and together they complete the proof. $\square$

The preceding discussion shows that when $n$ is large, $P\{\|Z\| \approx n^{1/2}\}$ is extremely close to one. One can see from another appeal to the weak law of large numbers that for all $\varepsilon > 0$,

$$P_n\left\{(1-\varepsilon)\mu_p\, n^{1/p} \le \|Z\|_p \le (1+\varepsilon)\mu_p\, n^{1/p}\right\} \to 1 \qquad \text{as } n \to \infty,$$

for all $p \in [1, \infty)$, where $\mu_p := \left(E|Z_1|^p\right)^{1/p}$ and $\|\cdot\|_p$ [temporarily] denotes the $\ell^p$-norm on $\mathbb{R}^n$; that is,

$$\|x\|_p := \begin{cases} \left(\sum_{j=1}^n |x_j|^p\right)^{1/p} & \text{if } 1 \le p < \infty, \\ \max_{1\le j\le n} |x_j| & \text{if } p = \infty, \end{cases}$$

for all $x \in \mathbb{R}^n$. These results suggest that the $n$-dimensional Gauss space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), P_n)$ has unexpected geometry when $n \gg 1$.

Interestingly enough, the case $p = \infty$ is different still. The following might help us anticipate which $\ell^p$-balls might carry most of the measure $P_n$ when $n \gg 1$.

Proposition 1.3. Let $M_n := \max_{1\le j\le n} |Z_j|$ or $M_n := \max_{1\le j\le n} Z_j$. Then,

$$\lim_{n\to\infty} \frac{E(M_n)}{\sqrt{2\log n}} = 1.$$

It is possible to compute $E(M_n)$ directly, but the preceding limit turns out to be more informative for our ends, and good enough for our present needs.

Proof. For all $t > 0$,

$$P\{M_n > t\} \le 1 \wedge \left[n\, P\{|Z_1| > t\}\right] \le 1 \wedge \left[2n\, e^{-t^2/2}\right].$$

Now choose and fix some $\varepsilon \in (0, 1)$ and write, using this bound,

$$E\left[\max_{1\le j\le n} |Z_j|\right] = \int_0^\infty P\left\{\max_{1\le j\le n} |Z_j| > t\right\} \mathrm{d}t = \int_0^{\sqrt{2(1+\varepsilon)\log n}} + \int_{\sqrt{2(1+\varepsilon)\log n}}^\infty \le \sqrt{2(1+\varepsilon)\log n} + 2n \int_{\sqrt{2(1+\varepsilon)\log n}}^\infty e^{-t^2/2}\, \mathrm{d}t = \sqrt{2(1+\varepsilon)\log n} + O\left(n^{-\varepsilon}\right).$$

Thus, we have $E(M_n) \le (1 + o(1))\sqrt{2\log n}$, which implies half of the proposition. For the other half, we use Lemma 1.1 to see that

$$P\left\{\max_{1\le j\le n} Z_j \le \sqrt{2(1-\varepsilon)\log n}\right\} = \left[1 - \frac{1}{\sqrt{2\pi}} \int_{\sqrt{2(1-\varepsilon)\log n}}^\infty e^{-t^2/2}\, \mathrm{d}t\right]^n = \left[1 - n^{-1+\varepsilon+o(1)}\right]^n = o(1),$$

for example because $1 - x \le \exp(-x)$ for all $x \in \mathbb{R}$. Therefore,

$$E\left[\max_{1\le j\le n} Z_j ;\ \max_{1\le j\le n} Z_j \ge 0\right] \ge E\left[\max_{1\le j\le n} Z_j ;\ \max_{1\le j\le n} Z_j > \sqrt{2(1-\varepsilon)\log n}\right] \ge (1 + o(1))\sqrt{2(1-\varepsilon)\log n}.$$

On the other hand,

$$\left|E\left[\max_{1\le j\le n} Z_j ;\ \max_{1\le j\le n} Z_j < 0\right]\right| \le E\left[|Z_1| ;\ \max_{1\le j\le n} Z_j < 0\right] \le \sqrt{E\left(Z_1^2\right)\, P\left\{\max_{1\le j\le n} Z_j < 0\right\}} = \sqrt{2^{-n}} = o(1),$$

as $n \to \infty$, thanks to the Cauchy–Schwarz inequality. The preceding two displays together imply that $E(M_n) \ge (1 + o(1))\sqrt{2(1-\varepsilon)\log n}$ as $n \to \infty$ for every $\varepsilon \in (0, 1)$, and prove the remaining half of the proposition. $\square$
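Proposition 1.3 is also pleasant to see numerically, though the $\sqrt{2\log n}$ normalization converges quite slowly. A minimal Monte Carlo sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
for n in (10**2, 10**3, 10**4, 10**5):
    z = rng.standard_normal((100, n))
    m = np.abs(z).max(axis=1).mean()        # Monte Carlo estimate of E(M_n)
    print(n, m / np.sqrt(2 * np.log(n)))    # ratio creeps toward 1 as n grows
```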


2. The Projective CLT

The characteristic function of the random vector $Z$ is

$$E\, e^{ix\cdot Z} = e^{-\|x\|^2/2} \qquad [x \in \mathbb{R}^n].$$

If $M$ denotes any $n \times n$ orthogonal matrix, then

$$E\, e^{ix\cdot MZ} = e^{-\|M^{\top}x\|^2/2} = e^{-\|x\|^2/2}.$$

Therefore, the distribution of $MZ$ is $P_n$ as well. In particular, the normalized random vectors $Z/\|Z\|$ and $MZ/\|MZ\|$ have the same distribution. Since the uniform distribution on the sphere $S^{n-1} := \{x \in \mathbb{R}^n : \|x\| = 1\}$ is the unique probability measure on $S^{n-1}$ that is invariant under orthogonal transformations, it follows that $Z/\|Z\|$ is distributed uniformly on $S^{n-1}$. By the weak law of large numbers, $\|Z\|/\sqrt{n}$ converges to 1 in probability as $n \to \infty$. Therefore, for all fixed $k \ge 1$, the random vector $\sqrt{n}\,(Z_1, \dots, Z_k)/\|Z\|$ converges weakly to a $k$-vector of i.i.d. N(0,1) random variables as $n \to \infty$.

In other words, we have proved the following.

Proposition 2.1. Choose and fix an integer $k \ge 1$ and a bounded and continuous function $f : \mathbb{R}^k \to \mathbb{R}$. Let $\mu_{n-1}$ denote the uniform measure on $S^{n-1}$. Then,

$$\lim_{n\to\infty} \int_{S^{n-1}} f\left(x_1\sqrt{n}, \dots, x_k\sqrt{n}\right)\, \mu_{n-1}(\mathrm{d}x) = \int f\, \mathrm{d}P_k.$$

In other words, this means roughly that the conditional distribution of $Z$, given that $(1-\varepsilon)n^{1/2} \le \|Z\| \le (1+\varepsilon)n^{1/2}$, is close to the uniform distribution on $\sqrt{n}\, S^{n-1}$ as $n \to \infty$ and $\varepsilon \to 0$. It is in fact possible to make this statement precise using a Bayes-type argument. But we will not do so here. Instead we end this section by noting the following: Since the probability of the conditioning event $\{(1-\varepsilon)n^{1/2} \le \|Z\| \le (1+\varepsilon)n^{1/2}\}$ is almost one [see (1.5)], we can see that Proposition 2.1 is a way of stating that the canonical Gaussian measure $P_n$ on $\mathbb{R}^n$ is very close to the uniform distribution on $\sqrt{n}\, S^{n-1}$ when $n \gg 1$. Proposition 2.1 has other uses as well.
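Proposition 2.1 translates directly into a simulation: normalizing a Gaussian vector gives a uniform point on the sphere, and the first few rescaled coordinates look like independent standard normals. A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, draws = 1000, 2, 5000
z = rng.standard_normal((draws, n))
u = z / np.linalg.norm(z, axis=1, keepdims=True)  # uniform points on S^(n-1)
w = np.sqrt(n) * u[:, :k]                         # first k coordinates, rescaled
print(w.mean(axis=0), w.std(axis=0))              # approx. zero means, unit sds
```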

3. Anderson's Shifted-Ball Inequality

One of the deep properties of $P_n$ is that it is "unimodal," in a certain sense that we will describe soon. That result hinges on a theorem of T. W. Anderson XXX in convex analysis. Anderson's theorem has many deep applications in probability theory, as well as in multivariate statistics, which originally was one of the main motivations for Anderson's work. We will see some applications later on. For now we content ourselves with a statement and proof.

In order to state Anderson's theorem efficiently, let us recall a few basic notions from [undergraduate] real analysis. Recall that a set $E \subseteq \mathbb{R}^n$ is convex if $\lambda x + (1 - \lambda)y \in E$ for all $x, y \in E$ and $\lambda \in [0, 1]$. You should check that $E$ is convex if and only if

$$E = \lambda E + (1 - \lambda)E \qquad \text{for all } \lambda \in [0, 1],$$

where for all $\alpha, \beta \in \mathbb{R}$ and $A, B \subseteq \mathbb{R}^n$,

$$\alpha A + \beta B := \{\alpha x + \beta y : x \in A,\ y \in B\}.$$

Proposition 3.1. Every convex set $E \subseteq \mathbb{R}^n$ is Lebesgue measurable.

Remark 3.2. Suppose $n \ge 2$ and $E = B(0, 1) \cup F$, where $B(0, 1)$ is the usual notation for the open Euclidean ball of radius one about $0 \in \mathbb{R}^n$, and $F \subseteq \partial B(0, 1)$. Then $E$ is always convex. But $E$ is not Borel measurable unless $F$ is. Still, $E$ is always Lebesgue measurable, in this case because $F$ is Lebesgue null in $\mathbb{R}^n$.

Proof. We will prove the following.

Claim. Every bounded convex set is measurable.

This does the job since, whenever $E$ is convex and $k \ge 1$, $E \cap B(0, k)$ is a bounded convex set, which is measurable according to the Claim. Therefore, $E = \cup_{k=1}^\infty [E \cap B(0, k)]$ is also measurable.

Since $\partial E$ is closed, it is measurable. We will prove that $|\partial E| = 0$. This shows that the difference between $E$ and the open set $E^0$ is null, whence $E$ is Lebesgue measurable. There are many proofs of this fact. Here is an elegant one, due to Lang XXX.

Define

$$\mathcal{A} := \left\{B \in \mathcal{B}(\mathbb{R}^n) : |B \cap \partial E| \le \left(1 - 3^{-n}\right)|B|\right\}.$$

Then $\mathcal{A}$ is clearly a monotone class, and closed under disjoint unions. We plan to prove that every upright rectangle, that is every nonempty set of the form $\prod_{j=1}^n (a_j, b_j]$, is in $\mathcal{A}$. If so, then the monotone class theorem implies that $\mathcal{A} = \mathcal{B}(\mathbb{R}^n)$. That would show, in turn, that $|\partial E| = |\partial E \cap \partial E| \le (1 - 3^{-n})|\partial E|$, which proves the Claim.

Choose and fix a rectangle $B := \prod_{j=1}^n (a_j, b_j]$, where $a_j < b_j$ for all $1 \le j \le n$. Subdivide each one-dimensional interval $(a_j, b_j]$ into 3 equal-sized parts: $(a_j, a_j + \ell_j]$, $(a_j + \ell_j, a_j + 2\ell_j]$, and $(a_j + 2\ell_j, a_j + 3\ell_j]$, where $\ell_j := (b_j - a_j)/3$. We can write $B$ as a disjoint union of $3^n$ equal-sized rectangles, each of which has the form $\prod_{j=1}^n (a_j + i_j \ell_j, a_j + (1 + i_j)\ell_j]$ where $i_j \in \{0, 1, 2\}$. Call these rectangles $B_1, \dots, B_{3^n}$. Direct inspection shows that there must exist an integer $1 \le r \le 3^n$ such that $\partial E \cap B_r = \varnothing$. For otherwise


the middle rectangle $\prod_{j=1}^n (a_j + \ell_j, a_j + 2\ell_j]$ would have to lie entirely in $E^0$ and intersect $\partial E$ at the same time; this would contradict the existence of a supporting hyperplane at every point of $\partial E$. Let us fix the integer $r$ alluded to here.

Since the $B_i$'s are translates of one another, they have the same measure. Therefore,

$$|B \cap \partial E| \le \sum_{1 \le i \le 3^n,\, i \ne r} |B_i| = |B| - |B_r| = \left(1 - 3^{-n}\right)|B|.$$

This proves that every rectangle $B$ is in $\mathcal{A}$, and completes the proof. $\square$

We will also recall two standard definitions.

Definition 3.3. A set $E \subseteq \mathbb{R}^n$ is symmetric if $E = -E$.

Definition 3.4. If $f : \mathbb{R}^n \to \mathbb{R}$ is a measurable function, then its level set at level $r \in \mathbb{R}$ is defined as $f^{-1}[r, \infty) := \{x \in \mathbb{R}^n : f(x) \ge r\} := \{f \ge r\}$.

The main result of this section is Anderson's inequality XXX, which I mention next.

Theorem 3.5 (Anderson's inequality). Let $f \in L^1(\mathbb{R}^n)$ be a non-negative symmetric function that has convex level sets. Then,

$$\int_E f(x - \lambda y)\, \mathrm{d}x \ge \int_E f(x - y)\, \mathrm{d}x,$$

for all symmetric convex sets $E \subset \mathbb{R}^n$, every $y \in \mathbb{R}^n$, and all $\lambda \in [0, 1]$.

Remark 3.6. It follows immediately from Theorem 3.5 that the map $\lambda \mapsto \int_E f(x - \lambda y)\, \mathrm{d}x$ is non-increasing for $\lambda \in [0, 1]$.

The proof will take up the remainder of this section. For now, let us remark briefly on how the Anderson inequality might be used to analyse the Gaussian measure $P_n$.

Recall $\gamma_n$ from (1.1), and note that for every $t > 0$, the level set

$$\gamma_n^{-1}[t, \infty) = \left\{x \in \mathbb{R}^n : \|x\|^2 \le -2\log t - n\log(2\pi)\right\}$$

is a closed ball (possibly empty), whence convex and symmetric. Therefore, we can apply Anderson's inequality with $\lambda = 0$ to see the following immediate corollary.

Corollary 3.7. For all symmetric convex sets $E \subset \mathbb{R}^n$, $0 \le \lambda \le 1$, and $y \in \mathbb{R}^n$, $P_n(E + \lambda y) \ge P_n(E + y)$. In particular,

$$P_n(E + y) \le P_n(E).$$

It is important to emphasize the remarkable fact that Corollary 3.7 is a "dimension-free theorem." Here is a typical consequence: $P\{\|Z - a\| \le r\}$ is maximized at $a = 0$ for all $r > 0$. For this reason, Corollary 3.7 is sometimes referred to as a "shifted-ball inequality."

One can easily generalize the preceding example with a little extra effort. Let us first note that if $M$ is an $n \times n$ positive-semidefinite matrix, then $E := \{x \in \mathbb{R}^n : x \cdot Mx \le r\}$ is a symmetric convex set for every real number $r > 0$. Equivalently, $E$ is the event, in our probability space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), P_n)$, that $Z \cdot MZ \le r$. Therefore, Anderson's shifted-ball inequality implies that

$$P\left\{(Z - \mu) \cdot M(Z - \mu) \le r\right\} \le P\left\{Z \cdot MZ \le r\right\} \qquad \text{for all } r > 0 \text{ and } \mu \in \mathbb{R}^n.$$

This inequality has applications in multivariate statistics, particularly to the analysis of "Hotelling's $T^2$ statistic." We will see other interesting examples later on.
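The ball consequence of Corollary 3.7 is easy to check by simulation. A minimal sketch, assuming NumPy: it estimates $P\{\|Z - a\| \le r\}$ for centers shifted various distances from the origin, and the estimates decrease in the shift, as the corollary predicts.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, draws = 3, 2.0, 200000
z = rng.standard_normal((draws, n))
for shift in (0.0, 0.5, 1.0, 2.0):
    a = np.zeros(n)
    a[0] = shift
    p = np.mean(np.linalg.norm(z - a, axis=1) <= r)  # P{||Z - a|| <= r}
    print(shift, p)   # decreases as the center moves away from the origin
```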

§1. Part 1. The Brunn–Minkowski Inequality. The Brunn–Minkowski inequality XXX is a classical result from convex analysis, and has profound connections to several other areas of research. In this subsection we state and prove the Brunn–Minkowski inequality.

It is easy to see that if $A$ and $B$ are compact, then so is $A + B$, since the latter is clearly bounded and closed. In particular, $A + B$ is measurable. The Brunn–Minkowski inequality relates the Lebesgue measure of the Minkowski sum $A + B$ to those of $A$ and $B$. Let $|\cdots|$ denote the Lebesgue measure on $\mathbb{R}^n$. Then we have the following.

Theorem 3.8 (The Brunn–Minkowski Inequality). For all compact sets $A, B \subset \mathbb{R}^n$,

$$|A + B|^{1/n} \ge |A|^{1/n} + |B|^{1/n}.$$

We can replace $A$ by $\lambda A$ and $B$ by $(1 - \lambda)B$, where $0 \le \lambda \le 1$, and recast the Brunn–Minkowski inequality in the following equivalent form:

$$|\lambda A + (1 - \lambda)B|^{1/n} \ge \lambda|A|^{1/n} + (1 - \lambda)|B|^{1/n},$$

for all compact sets $A, B \subset \mathbb{R}^n$ and $\lambda \in [0, 1]$. This formulation suggests the existence of deeper connections to convex analysis because if $A$ and $B$ are convex sets, then so is $\lambda A + (1 - \lambda)B$ for all $\lambda \in [0, 1]$.
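For axis-parallel boxes the theorem can be verified exactly; this is also the content of Step 1 of the proof below. A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
s = rng.uniform(0.5, 2.0, size=n)    # side lengths of a box A
t = rng.uniform(0.5, 2.0, size=n)    # side lengths of a box B
lhs = np.prod(s + t) ** (1 / n)      # |A+B|^(1/n): A+B is a box of sides s+t
rhs = np.prod(s) ** (1 / n) + np.prod(t) ** (1 / n)
print(lhs >= rhs, lhs, rhs)          # always True; equality iff s, t proportional
```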

Proof. The proof is elementary but tricky. In order to clarify the underlying ideas, we will divide it up into 3 small steps.

Step 1. Say that $K \subset \mathbb{R}^n$ is a rectangle when $K$ has the form

$$K = [a_1, a_1 + s_1] \times \cdots \times [a_n, a_n + s_n],$$


for some $a := (a_1, \dots, a_n) \in \mathbb{R}^n$ and $s_1, \dots, s_n > 0$. We refer to the point $a$ as the lower corner of $K$, and $s := (s_1, \dots, s_n)$ as the length of $K$. In this first step we verify the theorem in the case that $A$ and $B$ are rectangles with respective lengths $s$ and $t$. In this case, we can see that $A + B$ is a rectangle of length $s + t$. The Brunn–Minkowski inequality, in this case, follows from the following application of Jensen's inequality [the arithmetic–geometric mean inequality]:

$$\left(\prod_{j=1}^n \frac{s_j}{s_j + t_j}\right)^{1/n} + \left(\prod_{j=1}^n \frac{t_j}{s_j + t_j}\right)^{1/n} \le \frac{1}{n}\sum_{j=1}^n \frac{s_j}{s_j + t_j} + \frac{1}{n}\sum_{j=1}^n \frac{t_j}{s_j + t_j} = 1.$$

Step 2. Now we consider the case that $A$ and $B$ are interior-disjoint [or "ID"] finite unions of rectangles.

For every compact set $K$ let us write $K^+ := \{x \in K : x_1 \ge 0\}$ and $K^- := \{x \in K : x_1 \le 0\}$. Now we apply a so-called "Hadwiger–Ohmann cut": Notice that if we translate $A$ and/or $B$, then we do not alter $|A + B|$, $|A|$, or $|B|$. Therefore, after we translate the sets suitably, we can always ensure that: (a) $A^+$ and $B^+$ are rectangles; (b) $A^-$ and $B^-$ are ID unions of rectangles; and (c)

$$\frac{|A^+|}{|A|} = \frac{|B^+|}{|B|}.$$

With this choice in mind, we find that

$$|A + B| \ge |A^+ + B^+| + |A^- + B^-| \ge \left(|A^+|^{1/n} + |B^+|^{1/n}\right)^n + |A^- + B^-|,$$

thanks to Step 1 and the fact that $A^+ + B^+$ is interior-disjoint from $A^- + B^-$. Now,

$$\left(|A^+|^{1/n} + |B^+|^{1/n}\right)^n = |A^+|\left(1 + \frac{|B^+|^{1/n}}{|A^+|^{1/n}}\right)^n = |A^+|\left(1 + \frac{|B|^{1/n}}{|A|^{1/n}}\right)^n,$$

whence

$$|A + B| \ge |A^+|\left(1 + \frac{|B|^{1/n}}{|A|^{1/n}}\right)^n + |A^- + B^-|.$$

Now split up, after possibly also translating, $A^-$ into $A^{-,\pm}$ and $B^-$ into $B^{-,\pm}$ such that: (i) $A^{-,\pm}$ are interior disjoint; (ii) $B^{-,\pm}$ are interior disjoint; and (iii) $|A^{-,+}|/|A^-| = |B^{-,+}|/|B^-|$. Thus, we can apply the preceding to $A^-$ and $B^-$ in place of $A$ and $B$ in order to see that

$$|A + B| \ge |A^+|\left(1 + \frac{|B|^{1/n}}{|A|^{1/n}}\right)^n + |A^{-,+}|\left(1 + \frac{|B|^{1/n}}{|A|^{1/n}}\right)^n + |A^{-,-} + B^{-,-}| = \left(|A^+| + |A^{-,+}|\right)\left(1 + \frac{|B|^{1/n}}{|A|^{1/n}}\right)^n + |A^{-,-} + B^{-,-}|.$$

−￿− −￿− And now continue to split and translate A+ and B −￿+, etc. In this way+ 0 1 0 we obtain−￿+ a countable sequence A := A , ∞A := A ,...,B := B , 1 ￿=0 ￿ B :=∞B , . . . of ID rectangles such that: (i) ∪ B = B [after translation]; ￿=0 ￿ (ii) ∪ >A = A [after translation]; and (iii) ∞ ￿ ￿ 1/￿ 1/￿ ￿ ￿ ￿ ￿ ￿ ￿ 1/￿ 1/￿￿ ￿ ￿ |B| |B| |A+B| |A | 1+ 1/￿ = |A| 1+ 1/￿ = |A| + |B| ￿ ￿=0 |A| |A|

This proves the result in the case that $A$ and $B$ are ID unions of rectangles.

Step 3. Every compact set can be approximated from within by finite unions of ID rectangles. In other words, we can find $A^1, A^2, \dots$ and $B^1, B^2, \dots$ such that: (i) every $A^i$ and $B^i$ is a finite union of ID rectangles; (ii) $A^i \subseteq A^{i+1}$ and $B^i \subseteq B^{i+1}$ for all $i \ge 1$; and (iii) $A = \cup_{i=1}^\infty A^i$ and $B = \cup_{i=1}^\infty B^i$. By the previous step, $|A + B|^{1/n} \ge |A^i + B^i|^{1/n} \ge |A^i|^{1/n} + |B^i|^{1/n}$ for all $i \ge 1$. Let $i \uparrow \infty$ and appeal to the inner continuity of the Lebesgue measure in order to deduce the theorem in its full generality. $\square$

§2. Part 2. Change of Variables. In the second part of the proof we develop an elementary fact from integration theory. Let $A \subseteq \mathbb{R}^n$ be a Borel set, and $f : A \to \mathbb{R}_+$ a Borel-measurable function.

Definition 3.9. The distribution function of $f$ is the function $\bar{G} : [0, \infty) \to \mathbb{R}_+$ defined as

$$\bar{G}(t) := \left|f^{-1}[t, \infty)\right| := |\{x \in A : f(x) \ge t\}| := |\{f \ge t\}| \qquad \text{for all } t \ge 0.$$

This is standard notation in classical analysis, and should not be mistaken for the closely-related definition of cumulative distribution functions in probability and statistics. In any case, the following should be familiar to you.

Proposition 3.10 (Change of Variables Formula). For every Borel-measurable function $F : \mathbb{R}_+ \to \mathbb{R}_+$,

$$\int_A F(f(x))\, \mathrm{d}x = -\int_0^\infty F(t)\, \mathrm{d}\bar{G}(t).$$

If, in addition, $A$ is compact, $F$ is absolutely continuous, and $F(0) = 0$, then

$$\int_A F(f(x))\, \mathrm{d}x = \int_0^\infty F'(t)\, \bar{G}(t)\, \mathrm{d}t.$$
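Proposition 3.10 is the "layer cake" identity, and a one-line numerical check may help fix ideas. A minimal sketch, assuming NumPy, with $A = [0, 1]$, $f(x) = x$, and $F(t) = t^2$, so that $\bar{G}(t) = 1 - t$ on $[0, 1]$:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100001)
dt = t[1] - t[0]
lhs = 1.0 / 3.0                       # integral of x^2 over [0, 1]
rhs = np.sum(2 * t * (1 - t)) * dt    # integral of F'(t) * Gbar(t) dt
print(lhs, rhs)                       # both are 1/3, up to grid error
```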


Proof. First consider the case that $F = \mathbf{1}_{[a,b)}$ for some $b > a > 0$. In that case,

$$\int_0^\infty F(t)\, \mathrm{d}\bar{G}(t) = \bar{G}(b-) - \bar{G}(a) = -|\{x \in A : a \le f(x) < b\}| = -\int_A F(f(x))\, \mathrm{d}x.$$

This proves our formula when $F$ is such an indicator function. By linearity, it holds also when $F$ is an elementary function. The general form of the first assertion of the proposition follows from this and Lebesgue's dominated convergence theorem. The second assertion follows from the first and integration by parts for Stieltjes integrals. $\square$

§3. Part 3. The Proof of Anderson's Inequality. Let us define a new number $\alpha \in [0, 1]$ by $\alpha := (1 + \lambda)/2$. The number $\alpha$ is chosen so that

$$\alpha y + (1 - \alpha)(-y) = \lambda y.$$

Since $E$ is convex, we have $E = \alpha E + (1 - \alpha)E$. Therefore, the preceding display implies that

$$(E + \lambda y) \supseteq \alpha(E + y) + (1 - \alpha)(E - y).$$

And because the intersection of two convex sets is a convex set, we may infer that

$$(E + \lambda y) \cap f^{-1}[r, \infty) \supseteq \alpha\left[(E + y) \cap f^{-1}[r, \infty)\right] + (1 - \alpha)\left[(E - y) \cap f^{-1}[r, \infty)\right].$$

Now we apply the Brunn–Minkowski inequality in order to see that

$$\left|(E + \lambda y) \cap f^{-1}[r, \infty)\right|^{1/n} \ge \alpha\left|(E + y) \cap f^{-1}[r, \infty)\right|^{1/n} + (1 - \alpha)\left|(E - y) \cap f^{-1}[r, \infty)\right|^{1/n}.$$

Since $E$ is symmetric, $E - y = -(E + y)$. Because of this identity and the fact that $f$ has symmetric level sets, it follows that

$$(E - y) \cap f^{-1}[r, \infty) = -\left[(E + y) \cap f^{-1}[r, \infty)\right].$$

Therefore,

$$\left|(E + y) \cap f^{-1}[r, \infty)\right|^{1/n} = \left|(E - y) \cap f^{-1}[r, \infty)\right|^{1/n},$$

whence

$$\bar{H}_\lambda(r) := \left|(E + \lambda y) \cap f^{-1}[r, \infty)\right| \ge \left|(E + y) \cap f^{-1}[r, \infty)\right| =: \bar{H}_1(r).$$

Now two applications of the change of variables formula [Proposition 3.10] yield the following:

$$\int_E f(x - \lambda y)\, \mathrm{d}x - \int_E f(x - y)\, \mathrm{d}x = \int_{E+\lambda y} f(x)\, \mathrm{d}x - \int_{E+y} f(x)\, \mathrm{d}x = -\int_0^\infty r\, \mathrm{d}\bar{H}_\lambda(r) + \int_0^\infty r\, \mathrm{d}\bar{H}_1(r) = \int_0^\infty \left[\bar{H}_\lambda(r) - \bar{H}_1(r)\right] \mathrm{d}r \ge 0.$$

This completes the proof of Anderson's inequality. $\square$

4. Gaussian Random Vectors

Let $(\Omega, \mathcal{F}, Q)$ be a probability space, and recall the following.

Definition 4.1. A random $n$-vector $X = (X_1, \dots, X_n)$ on $(\Omega, \mathcal{F}, Q)$ is Gaussian if $a \cdot X$ has a normal distribution for every non-random $n$-vector $a \in \mathbb{R}^n$.

General theory ensures that we can always assume that $\Omega = \mathbb{R}^n$, $\mathcal{F} = \mathcal{B}(\mathbb{R}^n)$, and $Q = P_n$, which we will do from now on without further mention in order to save on typography.

If $X$ is a Gaussian random vector in $\mathbb{R}^n$, then $a \cdot X$ has moments of all orders. Let $\mu$ and $Q$ respectively denote the mean vector and the covariance matrix of $X$.² Then it is easy to see that $a \cdot X$ must have a N($a \cdot \mu$, $a \cdot Qa$) distribution. In particular, the characteristic function of $X$ is given by

$$E\left(e^{ia\cdot X}\right) = \exp\left(ia \cdot \mu - \frac{1}{2}\, a \cdot Qa\right) \qquad \text{for all } a \in \mathbb{R}^n.$$

Definition 4.2. Let $X$ be a Gaussian random vector in $\mathbb{R}^n$ with mean $\mu$ and covariance matrix $Q$. The distribution of $X$ is then called a multivariate normal distribution on $\mathbb{R}^n$ and is denoted by N$_n(\mu, Q)$.

When $Q$ is non-singular, we can invert the Fourier transform to find that the probability density function of $X$ is

$$f_X(x) = \frac{1}{(2\pi)^{n/2}\, |\det Q|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu) \cdot Q^{-1}(x - \mu)\right) \qquad [x \in \mathbb{R}^n].$$
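The displayed density is straightforward to evaluate, and cross-checking it against a library implementation is a useful sanity test. A minimal sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # symmetric and non-singular

def mvn_density(x):
    # The displayed formula for the N_2(mu, Q) density.
    d = x - mu
    quad = d @ np.linalg.solve(Q, d)  # (x - mu) . Q^{-1} (x - mu)
    norm = (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Q))
    return np.exp(-quad / 2) / norm

x = np.array([0.3, 0.7])
print(mvn_density(x), multivariate_normal(mean=mu, cov=Q).pdf(x))  # agree
```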

When $Q$ is singular, the distribution of $X$ is singular with respect to the Lebesgue measure on $\mathbb{R}^n$, and hence does not have a density.

²This means that $\mu_i = E(X_i)$ and $Q_{i,j} = \operatorname{Cov}(X_i, X_j)$, where $E$ and $\operatorname{Cov}$ respectively denote the expectation and covariance operators with respect to $Q$.


Example 4.3. Suppose $n = 2$ and $W$ has a N(0, 1) distribution on the line [which you might recall is denoted by $\mathbb{R}^1$]. Then the distribution of $X = (W, W)$ is concentrated on the diagonal $\{(x, x) : x \in \mathbb{R}\}$ of $\mathbb{R}^2$. Since the diagonal has zero Lebesgue measure, it follows that the distribution of $X$ is singular with respect to the Lebesgue measure on $\mathbb{R}^2$.

The following are a series of simple, though useful, facts from elementary probability theory.

Lemma 4.4. Suppose $X$ has a N$_n(\mu, Q)$ distribution. Then for all $b \in \mathbb{R}^m$ and every $m \times n$ matrix $A$, the distribution of $AX + b$ is N$_m(A\mu + b, AQA^{\top})$.

Lemma 4.5. Suppose $X$ has a N$_n(\mu, Q)$ distribution. Choose and fix an integer $1 \le K \le n$, and suppose in addition that $I_1, \dots, I_K$ are $K$ disjoint subsets of $\{1, \dots, n\}$ such that

$$\operatorname{Cov}(X_i, X_j) = 0 \qquad \text{whenever } i \text{ and } j \text{ lie in distinct } I_k\text{'s}.$$

Then $(X_i)_{i \in I_1}, \dots, (X_i)_{i \in I_K}$ are independent, each having a multivariate normal distribution.

Lemma 4.6. Suppose $X$ has a N$_n(0, Q)$ distribution, where $Q$ is a symmetric, non-singular, $n \times n$ covariance matrix. Then $Q^{-1/2}X$ has the same distribution as $Z$.

We can frequently use one, or more, of these results to study the general Gaussian distribution on $\mathbb{R}^n$ via the canonical Gaussian measure $P_n$. Here is a typical example.

Theorem 4.7 (Anderson's Shifted-Ball Inequality). If $X$ has a N$_n(0, Q)$ distribution and $Q$ is positive definite, then for all convex symmetric sets $F \subset \mathbb{R}^n$ and $a \in \mathbb{R}^n$,

$$P\{X \in a + F\} \le P\{X \in F\}.$$

Proof. Since $Q^{-1/2}X$ has the same distribution as $Z$,

$$P\{X \in a + F\} = P\left\{Z \in Q^{-1/2}a + Q^{-1/2}F\right\}.$$

Now $Q^{-1/2}F$ is symmetric and convex because $F$ is. Apply Anderson's shifted-ball inequality for $P_n$ [Corollary 3.7] to see that

$$P\left\{Z \in Q^{-1/2}a + Q^{-1/2}F\right\} \le P\left\{Z \in Q^{-1/2}F\right\}.$$

This proves the theorem. $\square$
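Lemma 4.6 is the familiar "whitening" operation, and it is easy to confirm empirically. A minimal sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(7)
Q = np.array([[2.0, 0.8],
              [0.8, 1.5]])                   # symmetric positive definite
Qinv_half = np.linalg.inv(sqrtm(Q).real)     # Q^(-1/2)

x = rng.multivariate_normal(np.zeros(2), Q, size=100000)  # X ~ N_2(0, Q)
z = x @ Qinv_half.T                          # rows are Q^(-1/2) X
print(np.cov(z.T))                           # approx. identity (Lemma 4.6)
```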

The following comparison theorem is one of the noteworthy corollaries of the preceding theorem.

Corollary 4.8. Suppose $X$ and $Y$ are respectively distributed as N$_n(0, Q^X)$ and N$_n(0, Q^Y)$, where $Q^X - Q^Y$ is positive semidefinite. Then,

$$P\{X \in F\} \le P\{Y \in F\},$$

for all symmetric, closed convex sets $F \subset \mathbb{R}^n$.

Proof. First consider the case that $Q^X$, $Q^Y$, and $Q^X - Q^Y$ are positive definite. Let $W$ be independent of $Y$ and have a N$_n(0, Q^X - Q^Y)$ distribution. The distribution of $W$ has a probability density $f_W$, and $W + Y$ is distributed as $X$, whence

$$P\{X \in F\} = P\{W + Y \in F\} = \int_{\mathbb{R}^n} P\{Y \in -w + F\}\, f_W(w)\, \mathrm{d}w \le P\{Y \in F\},$$


thanks to Theorem 4.7. This proves the theorem in the case that $Q^X - Q^Y$ is positive definite. If $Q^Y$ is positive definite and $Q^X - Q^Y$ is positive semidefinite, then we define for all $0 < \delta < \varepsilon < 1$,

$$X(\varepsilon) := X + \varepsilon U, \qquad Y(\delta) := Y + \delta U,$$

where $U$ is independent of $(X, Y)$ and has the same distribution N$_n(0, I)$ as $Z$. The respective distributions of $X(\varepsilon)$ and $Y(\delta)$ are N$_n(0, Q^{X(\varepsilon)})$ and N$_n(0, Q^{Y(\delta)})$, where $Q^{X(\varepsilon)} := Q^X + \varepsilon^2 I$ and $Q^{Y(\delta)} := Q^Y + \delta^2 I$. Since $Q^{X(\varepsilon)}$, $Q^{Y(\delta)}$, and $Q^{X(\varepsilon)} - Q^{Y(\delta)}$ are positive definite, the portion of the theorem that has been proved so far implies that $P\{X(\varepsilon) \in F\} \le P\{Y(\delta) \in F\}$, for all symmetric convex sets $F \subset \mathbb{R}^n$. Let $\varepsilon$ and $\delta$ tend down to zero, all the while ensuring that $\delta < \varepsilon$. Since $F = \bar{F}$, this proves the result. $\square$

Example 4.9 (Comparison of Moments). For every $1 \le p \le \infty$, the $\ell^p$-norm of $x \in \mathbb{R}^n$ is

$$\|x\|_p := \begin{cases} \left(\sum_{j=1}^n |x_j|^p\right)^{1/p} & \text{if } p < \infty, \\ \max_{1\le j\le n} |x_j| & \text{if } p = \infty. \end{cases}$$

It is easy to see that all centered $\ell^p$-balls of the form $\{x \in \mathbb{R}^n : \|x\|_p \le r\}$ are convex and symmetric. Therefore, it follows immediately from Corollary 4.8 that if $Q^X - Q^Y$ is positive semidefinite, then

$$P\left\{\|X\|_p > r\right\} \ge P\left\{\|Y\|_p > r\right\} \qquad \text{for all } r > 0 \text{ and } 1 \le p \le \infty.$$

Multiply both sides by $kr^{k-1}$ and integrate both sides [$\mathrm{d}r$] from $r = 0$ to $r = \infty$ in order to see that

$$E\left(\|X\|_p^k\right) \ge E\left(\|Y\|_p^k\right) \qquad \text{for all } k > 0 \text{ and } 1 \le p \le \infty.$$

These are examples of moment comparison, and can sometimes be useful in estimating expectation functionals of $X$ in terms of expectation functionals of a Gaussian random vector $Y$ with a simpler covariance matrix than that of $X$. Similarly, $P\{\|X + a\|_p > r\} \ge P\{\|X\|_p > r\}$ for all $a \in \mathbb{R}^n$, $r > 0$, and $1 \le p \le \infty$, by Theorem 4.7. Therefore,

$$E\left(\|X\|_p^k\right) = \inf_{a \in \mathbb{R}^n} E\left(\|X + a\|_p^k\right) \qquad \text{for all } 1 \le p \le \infty \text{ and } k > 0.$$

This is a nontrivial generalization of the familiar fact that, when $n = 1$, $\operatorname{Var}(X) = \inf_{a \in \mathbb{R}} E\left(|X - a|^2\right)$.
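The moment comparison in Example 4.9 is easy to probe by simulation. A minimal sketch, assuming NumPy, with $Q^Y = I$ and $Q^X = I + \frac{1}{2}\mathbf{1}\mathbf{1}^{\top}$, so that $Q^X - Q^Y$ is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(8)
n, draws = 3, 200000
QY = np.eye(n)
QX = np.eye(n) + 0.5 * np.ones((n, n))    # QX - QY is positive semidefinite
x = rng.multivariate_normal(np.zeros(n), QX, size=draws)
y = rng.multivariate_normal(np.zeros(n), QY, size=draws)
for p in (1, 2, np.inf):
    ex = np.linalg.norm(x, ord=p, axis=1).mean()
    ey = np.linalg.norm(y, ord=p, axis=1).mean()
    print(p, ex, ey)                      # E||X||_p >= E||Y||_p in each case
```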