
Gaussian vectors and central limit theorem

Samy Tindel

Purdue University

Probability Theory 2 - MA 539

Outline

1 Real Gaussian random variables

2 Random vectors

3 Gaussian random vectors

4 Central limit theorem

5 Empirical mean and variance


Standard Gaussian

Definition: Let X be a real valued random variable. X is called standard Gaussian if its probability law admits the density:

f(x) = (1/√(2π)) exp(−x²/2), x ∈ R.

Notation: We denote by N1(0, 1) or N (0, 1) this law.

Gaussian random variable and expectations

Reminder:
1. For all bounded measurable functions g, we have
E[g(X)] = (1/√(2π)) ∫_R g(x) exp(−x²/2) dx.
2. In particular,
∫_R exp(−x²/2) dx = √(2π).
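Both facts are easy to check numerically; here is a Python/numpy sketch (the test function g(x) = cos x, the seed and the sample size are arbitrary illustrative choices):

import numpy as np
from scipy import integrate

# Normalization constant: int_R exp(-x^2/2) dx = sqrt(2*pi)
val, _ = integrate.quad(lambda x: np.exp(-x**2 / 2), -np.inf, np.inf)
print(val, np.sqrt(2 * np.pi))          # both ~ 2.5066

# E[g(X)] for a bounded g, here g(x) = cos(x):
# quadrature against the density vs a Monte Carlo average over N(0,1) samples
g = np.cos
quad_val, _ = integrate.quad(lambda x: g(x) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
                             -np.inf, np.inf)
mc_val = g(np.random.default_rng(0).standard_normal(1_000_000)).mean()
print(quad_val, mc_val)                 # both ~ exp(-1/2) ~ 0.6065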

Gaussian moments

Proposition 1.
Let X ∼ N(0, 1). Then
1. For all z ∈ C, we have E[exp(zX)] = exp(z²/2).
As a particular case, we get
E[exp(ıtX)] = e^(−t²/2), ∀t ∈ R.
2. For all n ∈ N, we have
E[X^n] = 0 if n is odd, and E[X^n] = (2m)!/(m! 2^m) if n is even, n = 2m.
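A quick numerical check of the moment formula (a sketch with numpy; seed and sample size are arbitrary):

import numpy as np
from math import factorial

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)

for n in range(1, 7):
    empirical = (x**n).mean()
    if n % 2 == 1:
        exact = 0.0
    else:
        m = n // 2
        exact = factorial(2 * m) / (factorial(m) * 2**m)
    # odd moments ~ 0, E[X^4] ~ 3, E[X^6] ~ 15
    print(n, round(empirical, 3), exact)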

Proof

(i) Definition of the transform: ∫_R exp(zx − x²/2) dx is absolutely convergent for all z ∈ C
↪ the quantity ϕ(z) = E[e^(zX)] is well defined and
ϕ(z) = (1/√(2π)) ∫_R exp(zx − x²/2) dx.

(ii) Real case: Let z ∈ R. The decomposition zx − x²/2 = −(x − z)²/2 + z²/2 and the change of variable y = x − z yield ϕ(z) = e^(z²/2).

Proof (2)

(iii) Complex case: ϕ and z ↦ e^(z²/2) are two entire functions. Since those two functions coincide on R, they coincide on C.
(iv) Characteristic function: In particular, taking z = ıt with t ∈ R, we get E[exp(ıtX)] = e^(−t²/2).

Proof (3)

(v) Moments: Let n ≥ 1. The convergence of E[|X^n|] follows from an easy argument. In addition, we almost surely have
e^(ıtX) = lim_{n→∞} Sn, with Sn = Σ_{k=0}^{n} ((ıt)^k/k!) X^k.

However, |Sn| ≤ Y with
Y = Σ_{k=0}^{∞} (|t|^k |X|^k)/k! = e^(|tX|) ≤ e^(tX) + e^(−tX).
Since E[exp(aX)] < ∞ for every a ∈ R, we obtain that Y is integrable. Applying dominated convergence, we end up with

E[exp(ıtX)] = E[Σ_{n≥0} (ıtX)^n/n!] = Σ_{n≥0} (ı^n t^n/n!) E[X^n]. (1)

Identifying the lhs and rhs, we get our formula for the moments.

Gaussian random variable

Corollary: Owing to the previous proposition, if X ∼ N(0, 1)
↪ E[X] = 0 and Var(X) = 1.

Definition: A random variable Y is said to be Gaussian if there exist X ∼ N(0, 1) and two constants a and b such that Y = aX + b.

Parameter identification: we have

E[Y] = b, and Var(Y) = a² Var(X) = a².

Notation: We denote by N(m, σ²) the law of a Gaussian random variable with mean m and variance σ².

Properties of Gaussian random variables

Density: we have

(1/(σ√(2π))) exp(−(x − m)²/(2σ²)) is the density of N(m, σ²).

Characteristic function: let Y ∼ N(m, σ²). Then

E[exp(ıtY)] = exp(ıtm − σ²t²/2), t ∈ R.

The formula above also characterizes N(m, σ²).
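The characteristic function formula can be checked by Monte Carlo (a sketch; the values m = 1, σ = 2 and t = 0.7 are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
m, sigma, t = 1.0, 2.0, 0.7
y = m + sigma * rng.standard_normal(1_000_000)

mc = np.exp(1j * t * y).mean()                       # Monte Carlo estimate of E[exp(i t Y)]
exact = np.exp(1j * t * m - sigma**2 * t**2 / 2)     # exp(i t m - sigma^2 t^2 / 2)
print(mc, exact)                                     # close in both real and imaginary parts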

Gaussian law: illustration

Figure: Distributions N (0, 1), N (1, 1), N (0, 9), N (0, 1/4).

Sum of independent Gaussian random variables

Proposition 2.

Let Y1 and Y2 be two independent Gaussian random variables. Assume Y1 ∼ N1(m1, σ1²) and Y2 ∼ N1(m2, σ2²). Then Y1 + Y2 ∼ N1(m1 + m2, σ1² + σ2²).

Proof: Via characteristic functions

Remarks:

It is easy to identify the parameters of Y1 + Y2
Possible generalization to Σ_{j=1}^{n} Yj
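A simulation sketch of Proposition 2 (the parameters m1, σ1, m2, σ2 below are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
m1, s1, m2, s2 = 1.0, 2.0, -3.0, 0.5
y1 = rng.normal(m1, s1, size=1_000_000)
y2 = rng.normal(m2, s2, size=1_000_000)
z = y1 + y2

# Mean and variance should match m1 + m2 and s1^2 + s2^2
print(z.mean(), m1 + m2)              # ~ -2
print(z.var(), s1**2 + s2**2)         # ~ 4.25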

Outline

1 Real Gaussian random variables

2 Random vectors

3 Gaussian random vectors

4 Central limit theorem

5 Empirical mean and variance

Notation

Transpose: If A is a matrix, A* designates the transpose of A.

Particular case: Let x ∈ R^n. Then x is a column vector in R^{n,1} and x* is a row matrix.

Inner product: If x and y are two vectors in R^n, their inner product is denoted by

⟨x, y⟩ = x*y = y*x = Σ_{i=1}^{n} xi yi, if x = (x1, ..., xn)*, y = (y1, ..., yn)*.

Vector valued random variable

Definition 3.

1. A random variable X with values in R^n is given by n real valued random variables X1, X2, ..., Xn.
2. We denote by X the column matrix with coordinates X1, X2, ..., Xn:

X = (X1, X2, ..., Xn)*.

Expected value and covariance

Expected value: Let X ∈ R^n. E[X] is the vector defined by

E[X] = (E[X1], E[X2], ..., E[Xn])*.

Note: here we assume that all the expectations are well-defined.
Covariance: Let X ∈ R^n and Y ∈ R^m. The covariance matrix KX,Y ∈ R^{n,m} is defined by

KX,Y = E[(X − E[X])(Y − E[Y])*]

Elements of the covariance matrix: for 1 ≤ i ≤ n and 1 ≤ j ≤ m

KX,Y(i, j) = Cov(Xi, Yj) = E[(Xi − E[Xi])(Yj − E[Yj])]

Simple properties

Linear transforms, expectation and covariance: Let X ∈ R^n, A ∈ R^{m,n}, u ∈ R^m. Then

E[u + AX] = u + A E[X], and K_{u+AX} = K_{AX} = A KX A*.

Another formula for the covariance:

KX,Y = E[XY*] − E[X] E[Y]*. As a particular case,

KX = E[XX*] − E[X] E[X]*.
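These identities are easy to confirm numerically; a sketch (the 3-dimensional mean and covariance below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
n_samples = 500_000
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, -0.4],
                                 [0.0, -0.4, 0.7]],
                            size=n_samples)          # rows are samples of X in R^3

mX = X.mean(axis=0)                                  # E[X]
K1 = (X - mX).T @ (X - mX) / n_samples               # E[(X - E X)(X - E X)*]
K2 = X.T @ X / n_samples - np.outer(mX, mX)          # E[XX*] - E[X] E[X]*
print(np.allclose(K1, K2))                           # True: the two formulas agree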

Outline

1 Real Gaussian random variables

2 Random vectors

3 Gaussian random vectors

4 Central limit theorem

5 Empirical mean and variance

Definition

Definition: Let X ∈ R^n. X is a Gaussian random vector iff for all λ ∈ R^n,
⟨λ, X⟩ = λ*X = Σ_{i=1}^{n} λi Xi is a real valued Gaussian r.v.

Remarks:
(1) X Gaussian vector ⇒ each component Xi of X is a real Gaussian r.v.
(2) Key example of Gaussian vector: independent Gaussian components X1, ..., Xn.
(3) Easy construction of a random vector X ∈ R² such that (i) X1, X2 are real Gaussian, (ii) X is not a Gaussian vector.

Characteristic function

Proposition 4.

Let X be a Gaussian vector with mean m and covariance matrix K. Then, for all u ∈ R^n,

E[exp(ı⟨u, X⟩)] = e^{ı⟨u, m⟩ − (1/2) u*Ku},

where we use the matrix representation for the vector u

Proof

Identification of ⟨u, X⟩: ⟨u, X⟩ is a Gaussian r.v. by assumption, with parameters

µ := E[⟨u, X⟩] = ⟨u, m⟩, and σ² := Var(⟨u, X⟩) = u*Ku. (2)

Characteristic function of a 1-d Gaussian r.v.: Let Y ∼ N(µ, σ²). Then recall that

E[exp(ıtY)] = exp(ıtµ − σ²t²/2), t ∈ R. (3)

Conclusion: Easily obtained by plugging (2) into (3).

Remark and notation

Remark: According to Proposition 4
↪ The law of a Gaussian vector X is characterized by its mean m and its covariance matrix K
↪ If X and Y are two Gaussian vectors with the same mean and covariance matrix, their law is the same

Caution: This is only true for Gaussian vectors. In general, two random variables sharing the same mean and variance are not equal in law

Notation: If X is a Gaussian vector with mean m and covariance matrix K, we write X ∼ N(m, K).

Linear transformations

Proposition 5.

Let
X ∼ N(mX, KX), A ∈ R^{p,n} and z ∈ R^p. Set Y = AX + z. Then

Y ∼ N(mY, KY), with mY = z + A mX, KY = A KX A*.
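A simulation sketch of Proposition 5 (the matrices A, KX and the vector z below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
mX = np.array([1.0, 2.0])
KX = np.array([[1.0, 0.5],
               [0.5, 2.0]])
A = np.array([[1.0, -1.0],
              [0.0,  3.0],
              [2.0,  1.0]])          # maps R^2 to R^3
z = np.array([5.0, -1.0, 0.0])

X = rng.multivariate_normal(mX, KX, size=500_000)
Y = X @ A.T + z                      # Y = A X + z, sample by sample

print(Y.mean(axis=0), z + A @ mX)            # empirical mean ~ z + A mX
print(np.cov(Y.T), A @ KX @ A.T, sep="\n")   # empirical covariance ~ A KX A*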

Proof

Aim: Let u ∈ R^p. We wish to prove that u*Y is a Gaussian r.v.

Expression for u*Y: We have

u*Y = u*z + u*AX = u*z + v*X, where we have set v = A*u. This is a Gaussian r.v.

Conclusion: Y is a Gaussian vector. In addition,

mY = E[Y] = z + A E[X] = z + A mX, and KY = A KX A*.

Positivity of the covariance matrix

Proposition 6.

Let X be a random vector with covariance matrix K. Then K is a symmetric positive matrix.

Proof:

Symmetry: K(i, j) = Cov(Xi, Xj) = Cov(Xj, Xi) = K(j, i).
Positivity: Let u ∈ R^n and Y = u*X. Then Var(Y) = u*Ku ≥ 0.

Lemma

Lemma 7.
Let Γ ∈ R^{n,n} be symmetric and positive. Then there exists a matrix A ∈ R^{n,n} such that Γ = AA*.

Proof

Diagonal form of Γ:
• Γ symmetric ⇒ there exists an orthogonal matrix U and D1 = Diag(λ1, . . . , λn) such that D1 = U*ΓU.
• Γ positive ⇒ λi ≥ 0 for all i ∈ {1, 2, ..., n}.

Definition of the square root:
• Let D = Diag(λ1^{1/2}, . . . , λn^{1/2}).
• We set A = UD.

Conclusion:
• Recall that U^{−1} = U*, therefore Γ = U D1 U*.
• Now D1 = D² = DD*, and thus

Γ = UDD*U* = UD(UD)* = AA*.
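The construction in the proof translates directly into code, using numpy's symmetric eigendecomposition (a sketch; the matrix Γ below is an arbitrary positive matrix):

import numpy as np

Gamma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])          # symmetric positive matrix

lam, U = np.linalg.eigh(Gamma)               # Gamma = U D1 U*, columns of U orthonormal
D = np.diag(np.sqrt(lam))                    # D = Diag(lambda_i^{1/2})
A = U @ D                                    # A = U D, so that A A* = Gamma

print(np.allclose(A @ A.T, Gamma))           # True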

Construction of a Gaussian vector

Theorem 8.
Let m ∈ R^n and Γ ∈ R^{n,n} symmetric and positive. Then

there exists a Gaussian vector X ∼ N(m, Γ).

Proof

Standard Gaussian vector in R^n: Let Y1, Y2, ..., Yn be i.i.d. with common law N1(0, 1). We set

Y = (Y1, ..., Yn)*, and therefore Y ∼ N(0, Id_n).

Definition of X: Let A ∈ R^{n,n} such that AA* = Γ. We define X as X = m + AY.

Conclusion: According to Proposition 5 we have X ∼ N(m, KX), with

KX = A KY A* = A Id A* = AA* = Γ.
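This is exactly how N(m, Γ) is simulated in practice: take any A with AA* = Γ (for instance a Cholesky factor) and set X = m + AY with Y standard Gaussian. A sketch (m and Γ below are arbitrary):

import numpy as np

rng = np.random.default_rng(5)
m = np.array([1.0, -2.0])
Gamma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

A = np.linalg.cholesky(Gamma)                      # one choice of A with A A* = Gamma
Y = rng.standard_normal((500_000, 2))              # i.i.d. N(0,1) coordinates
X = m + Y @ A.T                                    # X = m + A Y, sample by sample

print(X.mean(axis=0))                              # ~ m
print(np.cov(X.T))                                 # ~ Gamma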

Decorrelation and independence

Theorem 9.

Let X be a Gaussian vector, with X = (X1, ..., Xn)*. Then

The random variables X1,..., Xn are independent ⇐⇒ The covariance matrix KX is diagonal.

Proof of ⇒

Decorrelation of coordinates: If X1, ..., Xn are independent, then

K(i, j) = Cov(Xi, Xj) = 0 whenever i ≠ j.

Therefore KX is diagonal.

Proof of ⇐ (1)

Characteristic function of X: Set K = KX. We have shown that

E[exp(ı⟨u, X⟩)] = e^{ı⟨u, E[X]⟩ − (1/2) u*Ku}, u ∈ R^n. (4)

Since K is diagonal, we have:

u*Ku = Σ_{l=1}^{n} ul² K(l, l) = Σ_{l=1}^{n} ul² Var(Xl). (5)

Characteristic function of each coordinate:

Let φ_{Xl} be the characteristic function of Xl. We have φ_{Xl}(s) = E[e^{ısXl}], for all s ∈ R.

Taking u such that ui = 0 for all i ≠ l in (4) and (5), we get

φ_{Xl}(ul) = E[exp(ıul Xl)] = e^{ıul E[Xl] − (1/2) ul² Var(Xl)}.

Proof of ⇐ (2)

Conclusion: We can recast (4) as follows: for all u = (u1, u2, ..., un),

Π_{j=1}^{n} φ_{Xj}(uj) = E[exp(ı Σ_{l=1}^{n} ul Xl)] = E[exp(ı⟨u, X⟩)].

This proves that the random variables X1, ..., Xn are independent.

Lemma about absolutely continuous r.v.

Lemma 10.
Let ξ ∈ R^n be a random variable admitting a density, and H a subspace of R^n such that dim(H) < n. Then P(ξ ∈ H) = 0.

Proof

Change of variables: We can assume H ⊂ H′ with

H′ = {(x1, x2, ..., xn); xn = 0}.

Conclusion: Denote by ϕ the density of ξ. We have:

P(ξ ∈ H) ≤ P(ξ ∈ H′) = ∫_{R^n} ϕ(x1, x2, ..., xn) 1_{{xn = 0}} dx1 dx2 ... dxn = 0.

Gaussian density

Theorem 11.
Let X ∼ N(m, K). Then
1. X admits a density iff K is invertible.
2. If K is invertible, the density of X is given by
f(x) = (1/((2π)^{n/2} (det K)^{1/2})) exp(−(1/2)(x − m)* K^{−1} (x − m)).
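A sketch checking the density formula against scipy's implementation (the mean m, covariance K and evaluation point x are arbitrary):

import numpy as np
from scipy.stats import multivariate_normal

m = np.array([1.0, -1.0])
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([0.3, 0.7])

d = x - m
n = len(m)
f = np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / ((2 * np.pi) ** (n / 2) * np.linalg.det(K) ** 0.5)

print(f, multivariate_normal(mean=m, cov=K).pdf(x))   # the two values agree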

Proof

(1) Density and inversion of K: We have seen that

X has the same law as m + AY, where AA* = K and Y ∼ N(0, Id_n).

(i) Assume A is not invertible.
A not invertible ⇒ Im(A) = H, with dim(H) < n
↪ P(AY ∈ H) = 1.
Contradiction: if X admits a density, then X − m admits a density, hence P(X − m ∈ H) = 0. However, we have seen that P(X − m ∈ H) = P(AY ∈ H) = 1. Hence X doesn't admit a density.

Proof (2)

(ii) Assume A is invertible.
A invertible ⇒ the map y ↦ m + Ay is a C¹ bijection
⇒ the random variable m + AY admits a density.

(iii) Conclusion. Since AA* = K, we have

det(A) det(A*) = (det(A))² = det(K), and we get the equivalence:

A invertible ⇐⇒ K is invertible.

Proof (3)

(2) Expression of the density: Let Y ∼ N(0, Id_n). Density of Y:
g(y) = (1/(2π)^{n/2}) exp(−(1/2)⟨y, y⟩).

Change of variable: Set

X′ = AY + m, that is Y = A^{−1}(X′ − m).

Proof (4)

Jacobian of the transformation: for x ↦ A^{−1}(x − m) we have

Jacobian = A^{−1}

Determinant of the Jacobian:

det(A^{−1}) = [det(A)]^{−1} = [det(K)]^{−1/2}

Expression for the inner product: We have K^{−1} = (AA*)^{−1} = (A*)^{−1}A^{−1}, and

⟨y, y⟩ = ⟨A^{−1}(x − m), A^{−1}(x − m)⟩ = (x − m)*(A^{−1})*A^{−1}(x − m) = (x − m)* K^{−1} (x − m).

Thus X′ admits the density f. Since X and X′ share the same law, X admits the density f.

Outline

1 Real Gaussian random variables

2 Random vectors

3 Gaussian random vectors

4 Central limit theorem

5 Empirical mean and variance

Law of large numbers

Theorem 12.
We consider the following situation:
• (Xn; n ≥ 1) sequence of i.i.d. R^k-valued r.v.
• Hypothesis: E[|X1|] < ∞, and we set E[X1] = m ∈ R^k.
We define
X̄n = (1/n) Σ_{j=1}^{n} Xj.
Then
lim_{n→∞} X̄n = m, almost surely.

Central limit theorem

Theorem 13.
We consider the following situation:
• {Xn; n ≥ 1} sequence of i.i.d. R^k-valued r.v.
• Hypothesis: E[|X1|²] < ∞
• We set E[X1] = m ∈ R^k and Cov(X1) = Γ ∈ R^{k,k}
Then
√n (X̄n − m) −→(d) Nk(0, Γ), with X̄n = (1/n) Σ_{j=1}^{n} Xj.

Interpretation: X̄n converges to m with rate n^{−1/2}.

Convergence in law, first definition

Remark: For notational sake
↪ the remainder of the section will focus on R-valued r.v.

Definition 14.
Let

{Xn; n ≥ 1} sequence of r.v, X0 another r.v

Fn distribution function of Xn

F0 distribution function of X0
We set C(F0) ≡ {x ∈ R; F0 continuous at point x}

Definition 1: We have

Xn −→(d) X0 if lim_{n→∞} Fn(x) = F0(x) for all x ∈ C(F0).

Convergence in law, equivalent definition

Proposition 15.
Let

{Xn; n ≥ 1} sequence of r.v, X0 another r.v
We set Cb(R) ≡ {ϕ : R → R; ϕ continuous and bounded}

Definition 2: We have

Xn −→(d) X0 iff lim_{n→∞} E[ϕ(Xn)] = E[ϕ(X0)] for all ϕ ∈ Cb(R).

Central limit theorem in R

Theorem 16.
We consider the following situation:
• {Xn; n ≥ 1} sequence of i.i.d. R-valued r.v.
• Hypothesis: E[|X1|²] < ∞
• We set E[X1] = µ and Var(X1) = σ²
Then
√n (X̄n − µ) −→(d) N(0, σ²), with X̄n = (1/n) Σ_{j=1}^{n} Xj.

Otherwise stated, we have
(Σ_{i=1}^{n} Xi − nµ)/(σ n^{1/2}) −→(d) N(0, 1).

Application: Bernoulli distribution

Proposition 17.

Let (Xn; n ≥ 1) be a sequence of i.i.d. B(p) r.v. Then
√n (X̄n − p)/[p(1 − p)]^{1/2} −→(d) N1(0, 1).

Remark: For practical purposes, as soon as np > 15, the law of

(X1 + ··· + Xn − np)/√(np(1 − p))

is approximated by N1(0, 1). Notice that X1 + ··· + Xn ∼ Bin(n, p).
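A sketch of this normal approximation to the binomial (n and p below are arbitrary, with np > 15):

import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.3                                        # np = 30 > 15
k = np.arange(0, n + 1)

exact = binom.cdf(k, n, p)                             # P(X1 + ... + Xn <= k)
approx = norm.cdf((k - n * p) / np.sqrt(n * p * (1 - p)))
print(np.max(np.abs(exact - approx)))                  # ~ 0.04 here; shrinks like n^{-1/2}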

Binomial distribution: plot (1)

Figure: Distribution Bin(6; 0.5). x-axis: k, y-axis: P(X = k)

Binomial distribution: plot (2)

Figure: Distribution Bin(30; 0.5). x-axis: k, y-axis: P(X = k)

Relation between pdf and chf

Theorem 18.
Let F be a distribution function on R and φ the characteristic function of F. Then F is uniquely determined by φ.

Proof (1)

Setting: We consider
• A r.v. X with distribution F and chf φ
• A r.v. Z with distribution G and chf γ

Relation between chfs: We have

∫_R e^{−ıθz} φ(z) G(dz) = ∫_R γ(x − θ) F(dx). (6)

Proof (2)

Proof of (6): Invoking Fubini, we get

E[e^{−ıθZ} φ(Z)] = ∫_R e^{−ıθz} φ(z) G(dz)
= ∫_R G(dz) ∫_R e^{−ıθz} e^{ızx} F(dx)
= ∫_R F(dx) ∫_R e^{ız(x−θ)} G(dz)
= E[γ(X − θ)].

Proof (3)

Particularizing to a Gaussian case: We now consider Z ∼ σN with N ∼ N(0, 1). In this case, if n ≡ density of N(0, 1), we have

G(dz) = σ^{−1} n(σ^{−1}z) dz.

With this setting, relation (6) becomes

∫_R e^{−ıθσz} φ(σz) n(z) dz = ∫_R e^{−(1/2)σ²(z−θ)²} F(dz). (7)

Proof (4)

Integration with respect to θ: Integrating (7) with respect to θ we get

∫_{−∞}^{x} dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = Aσ,θ(x), (8)

where

Aσ,θ(x) = ∫_{−∞}^{x} dθ ∫_R e^{−(1/2)σ²(z−θ)²} F(dz).

Proof (5)

Expression for Aσ,θ: By Fubini,

Aσ,θ(x) = ∫_R F(dz) ∫_{−∞}^{x} e^{−(1/2)σ²(z−θ)²} dθ

and, with the change of variable s = θ − z,

= (2πσ^{−2})^{1/2} ∫_R F(dz) ∫_{−∞}^{x−z} n_{0,σ^{−2}}(s) ds.

Therefore, considering N ⊥⊥ X with N ∼ N(0, 1) we get

Aσ,θ(x) = (2πσ^{−2})^{1/2} P(σ^{−1}N + X ≤ x). (9)

Proof (6)

Summary: Putting together (8) and (9) we get

∫_{−∞}^{x} dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = (2πσ^{−2})^{1/2} P(σ^{−1}N + X ≤ x).

Divide the above relation by (2πσ^{−2})^{1/2}. We obtain

(σ/(2π)^{1/2}) ∫_{−∞}^{x} dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = P(σ^{−1}N + X ≤ x). (10)

Proof (7)

Convergence result: Recall that

X1,n −→(d) X1 and X2,n −→(P) 0 =⇒ X1,n + X2,n −→(d) X1. (11)

Notation: We set

C(F ) ≡ {x ∈ R; F continuous at point x}

Proof (8)

Limit as σ → ∞: Thanks to our convergence result, one can take limits in (10):

lim_{σ→∞} (σ/(2π)^{1/2}) ∫_{−∞}^{x} dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz
= lim_{σ→∞} P(σ^{−1}N + X ≤ x)
= P(X ≤ x) = F(x), (12)

for all x ∈ C(F).

Conclusion: F is determined by φ.

Fourier inversion

Proposition 19.
Let F be a distribution function on R, X ∼ F, and φ the characteristic function of F.
Hypothesis: φ ∈ L¹(R).

Conclusion: F admits a bounded continuous density f, given by

f(x) = (1/(2π)) ∫_R e^{−ıyx} φ(y) dy.
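A numerical sketch of the inversion formula for the standard Gaussian, whose chf φ(y) = e^{−y²/2} is integrable (the evaluation points are arbitrary):

import numpy as np
from scipy import integrate
from scipy.stats import norm

phi = lambda y: np.exp(-y**2 / 2)          # chf of N(0,1)

def f(x):
    # f(x) = (1 / 2 pi) int_R e^{-i y x} phi(y) dy; phi is even, so the integral is real
    real_part, _ = integrate.quad(lambda y: np.cos(y * x) * phi(y), -np.inf, np.inf)
    return real_part / (2 * np.pi)

for x in (0.0, 0.5, 1.0, 2.0):
    print(f(x), norm.pdf(x))               # recovers the N(0,1) density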

Proof (1)

Density of σ^{−1}N + X: We set

Fσ(x) = P(σ^{−1}N + X ≤ x).

Since N admits a density and N ⊥⊥ X, Fσ admits a density fσ.

Expression for Fσ: Recall relation (10):

(σ/(2π)^{1/2}) ∫_{−∞}^{x} dθ ∫_R e^{−ıθσz} φ(σz) n(z) dz = Fσ(x). (13)

Proof (2)

Expression for fσ: Differentiating the lhs of (13) we get

fσ(θ) = (σ/(2π)^{1/2}) ∫_R e^{−ıθσz} φ(σz) n(z) dz
= (1/(2π)^{1/2}) ∫_R e^{−ıθy} φ(y) n(σ^{−1}y) dy (change of variable σz = y)
= (1/(2π)) ∫_R e^{−ıθy} φ(y) e^{−(1/2)σ^{−2}y²} dy (since n is Gaussian).

Relation (10) on a finite interval: Let I = [a, b]. Using fσ we have

P(σ^{−1}N + X ∈ [a, b]) = Fσ(b) − Fσ(a) = ∫_a^b fσ(θ) dθ. (14)

Proof (3)

Limit of fσ: By dominated convergence,

lim_{σ→∞} fσ(θ) = (1/(2π)) ∫_R e^{−ıθy} φ(y) dy ≡ f(θ).

Domination of fσ: We have

|fσ(θ)| = |(1/(2π)) ∫_R e^{−ıθy} φ(y) e^{−(1/2)σ^{−2}y²} dy|
≤ (1/(2π)) ∫_R |φ(y)| dy
= (1/(2π)) ‖φ‖_{L¹(R)}.

Proof (4)

Limits in (14): We use
• On the lhs of (14): the convergence result (11)
• On the rhs of (14): dominated convergence (on the finite interval I)
We get

P(X ∈ [a, b]) = F(b) − F(a) = ∫_a^b f(θ) dθ.

Conclusion:

X admits f (obtained by Fourier inversion) as a density

Convergence in law and chf

Theorem 20.
Let

{Xn; n ≥ 1} sequence of r.v, X0 another r.v

φn chf of Xn, φ0 chf of X0.
Then
(i) We have
Xn −→(d) X0 =⇒ lim_{n→∞} φn(t) = φ0(t) for all t ∈ R.
(ii) Assume that

φ0(0) = 1 and φ0 is continuous at point 0.
Then we have
lim_{n→∞} φn(t) = φ0(t) for all t ∈ R =⇒ Xn −→(d) X0.

Central limit theorem in R (repeated)

Theorem 21.
We consider the following situation:
• {Xn; n ≥ 1} sequence of i.i.d. R-valued r.v.
• Hypothesis: E[|X1|²] < ∞
• We set E[X1] = µ and Var(X1) = σ²
Then
√n (X̄n − µ) −→(d) N(0, σ²), with X̄n = (1/n) Σ_{j=1}^{n} Xj.
Otherwise stated, we have

(Sn − nµ)/(σ n^{1/2}) −→(d) N(0, 1), with Sn = Σ_{i=1}^{n} Xi. (15)

Proof of CLT (1)

Reduction to µ = 0, σ = 1: Set

X̂i = (Xi − µ)/σ, and Ŝn = Σ_{i=1}^{n} X̂i.
Then
Ŝn = (Sn − nµ)/σ, with E[X̂i] = 0 and Var(X̂i) = 1,
and
Ŝn/n^{1/2} = (Sn − nµ)/(σ n^{1/2}).
Thus it is enough to prove (15) when µ = 0 and σ = 1.

Proof of CLT (2)

Aim: For Xi such that E[Xi] = 0 and Var(Xi) = 1, set

φn(t) = E[e^{ıt Sn/n^{1/2}}].

We wish to prove that

lim_{n→∞} φn(t) = e^{−t²/2}.

According to Theorem 20 (ii), this yields the desired result.

Taylor expansion of the chf

Lemma 22.
Let Y be a r.v. and ψ the chf of Y.
Hypothesis: for ℓ ≥ 1, E[|Y|^ℓ] < ∞.

Conclusion:
|ψ(s) − Σ_{k=0}^{ℓ} ((ıs)^k/k!) E[Y^k]| ≤ E[(|sY|^{ℓ+1}/(ℓ + 1)!) ∧ (2|sY|^ℓ/ℓ!)].

Proof: Similar to (1).

Proof of CLT (3)

Computation for φn: We have

φn(t) = (E[e^{ıt X1/n^{1/2}}])^n = (φ(t/n^{1/2}))^n, (16)

where

φ ≡ characteristic function of X1

Proof of CLT (4)

Expansion of φ: According to Lemma 22, we have

φ(t/n^{1/2}) = 1 + ıt E[X1]/n^{1/2} + ı²t² E[X1²]/(2n) + Rn
= 1 − t²/(2n) + Rn, (17)

and Rn satisfies

|Rn| ≤ E[(|t X1|³/(6n^{3/2})) ∧ (|t X1|²/n)].

Behavior of Rn: By dominated convergence we have

lim_{n→∞} n |Rn| = 0. (18)

Products of complex numbers

Lemma 23.
Let
{ai; 1 ≤ i ≤ n}, such that ai ∈ C and |ai| ≤ 1
{bi; 1 ≤ i ≤ n}, such that bi ∈ C and |bi| ≤ 1

Then we have

|Π_{i=1}^{n} ai − Π_{i=1}^{n} bi| ≤ Σ_{i=1}^{n} |ai − bi|.

Proof of Lemma 23

Case n = 2: Stems directly from the identity

a1a2 − b1b2 = a1 (a2 − b2) + (a1 − b1)b2

General case: By induction

Proof of CLT (5)

Summary: Thanks to (16) and (17) we have

φn(t) = (φ(t/n^{1/2}))^n, and φ(t/n^{1/2}) = 1 − t²/(2n) + Rn.

Application of Lemma 23: We get

|(φ(t/n^{1/2}))^n − (1 − t²/(2n))^n| (19)
≤ n |φ(t/n^{1/2}) − (1 − t²/(2n))| (20)
= n |Rn|. (21)

Proof of CLT (6)

Limit for φn: Invoking (18) and (19) we get

lim_{n→∞} |φn(t) − (1 − t²/(2n))^n| = 0.

In addition,
lim_{n→∞} |(1 − t²/(2n))^n − e^{−t²/2}| = 0.
Therefore
lim_{n→∞} |φn(t) − e^{−t²/2}| = 0.

Conclusion: CLT holds, since

lim_{n→∞} φn(t) = e^{−t²/2}.
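The convergence φn(t) = φ(t/√n)^n → e^{−t²/2} can also be watched numerically for a concrete non-Gaussian X1, for instance X1 = ±1 with probability 1/2 (mean 0, variance 1, chf φ(t) = cos t); a sketch with an arbitrary evaluation point t:

import numpy as np

t = 1.5                                    # arbitrary evaluation point
phi = np.cos                               # chf of X1 = +/-1 with prob 1/2

for n in (1, 10, 100, 1000, 10000):
    phin = phi(t / np.sqrt(n)) ** n        # phi_n(t) = phi(t / n^{1/2})^n
    print(n, phin, np.exp(-t**2 / 2))      # converges to e^{-t^2/2} ~ 0.3247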

Outline

1 Real Gaussian random variables

2 Random vectors

3 Gaussian random vectors

4 Central limit theorem

5 Empirical mean and variance

Gamma and chi-square laws

Definition 1: For all λ > 0 and a > 0, we denote by γ(λ, a) the distribution on R defined by the density

(x^{λ−1}/(a^λ Γ(λ))) exp(−x/a) 1_{{x>0}}, where Γ(λ) = ∫_0^∞ x^{λ−1} e^{−x} dx.

This distribution is called the gamma law with parameters λ, a.
Definition 2: Let X1, ..., Xn be i.i.d. N(0, 1). We set Z = Σ_{i=1}^{n} Xi². The law of Z is called the chi-square distribution with n degrees of freedom. We denote this distribution by χ²(n).

Gamma and chi-square laws (2)

Proposition 24.

The distribution χ²(n) coincides with γ(n/2, 2). As a particular case, if

X1, ..., Xn are i.i.d. N(0, 1) and we set Z = Σ_{i=1}^{n} Xi², then we have Z ∼ γ(n/2, 2).
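A simulation sketch of Proposition 24 (n and the sample size are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 5
Z = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)   # sums of n squared N(0,1)

# Compare the empirical law of Z with gamma(n/2, scale=2), i.e. chi-square with n dof
print(stats.kstest(Z, stats.gamma(a=n / 2, scale=2).cdf))
print(stats.chi2(df=n).mean(), stats.gamma(a=n / 2, scale=2).mean())   # both equal n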

Empirical mean and variance

Let X1, ..., Xn be n real r.v. Definition: we set

X̄n = (1/n) Σ_{k=1}^{n} Xk, and Sn² = (1/(n − 1)) Σ_{k=1}^{n} (Xk − X̄n)².

X̄n is called the empirical mean.
Sn² is called the empirical variance.
Property: Let X1, ..., Xn be n i.i.d. real r.v. Assume E[X1] = m and Var(X1) = σ². Then

E[X̄n] = m, and E[Sn²] = σ².
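A sketch checking this unbiasedness property by simulation (the exponential distribution, n and the number of repetitions are arbitrary; note the 1/(n − 1) normalization in Sn²):

import numpy as np

rng = np.random.default_rng(7)
n, reps = 10, 200_000
X = rng.exponential(scale=2.0, size=(reps, n))       # i.i.d. rows; here m = 2, sigma^2 = 4

xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)                           # ddof=1 gives the 1/(n-1) normalization

print(xbar.mean())        # ~ 2 = m
print(s2.mean())          # ~ 4 = sigma^2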

Law of (X̄n, Sn²) in a Gaussian situation

Theorem 25.

Let X1, X2, ..., Xn be i.i.d. with common law N1(m, σ²). Then

1. X̄n and Sn² are independent.
2. X̄n ∼ N1(m, σ²/n) and ((n − 1)/σ²) Sn² ∼ χ²(n − 1).
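A simulation sketch of Theorem 25 in the Gaussian case (m, σ, n and the number of repetitions are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
m, sigma, n, reps = 1.0, 2.0, 8, 200_000
X = rng.normal(m, sigma, size=(reps, n))

xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

print(np.corrcoef(xbar, s2)[0, 1])                                       # ~ 0, consistent with independence
print(stats.kstest(xbar, stats.norm(m, sigma / np.sqrt(n)).cdf))         # X_bar_n ~ N(m, sigma^2/n)
print(stats.kstest((n - 1) * s2 / sigma**2, stats.chi2(df=n - 1).cdf))   # ~ chi-square(n-1)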

Proof (1)

(1) Reduction to m = 0 and σ = 1: we set

Xi′ = (Xi − m)/σ ⟺ Xi = σXi′ + m, 1 ≤ i ≤ n.

The r.v. X1′, ..., Xn′ are i.i.d. distributed as N1(0, 1)
↪ empirical mean X̄n′, empirical variance Sn′².

Proof (2)

(1) Reduction to m = 0 and σ = 1 (ctd):
It is easily seen (using Xi − X̄n = σ(Xi′ − X̄n′)) that

X̄n = σX̄n′ + m, and Sn² = σ² Sn′².

Thus we are reduced to the case m = 0 and σ = 1.
(2) Reduced case:

• Consider X1, ..., Xn i.i.d. N(0, 1)
• Let u1 = n^{−1/2} (1, 1, ..., 1)*
• We can construct u2, ..., un such that (u1, ..., un) is an orthonormal basis of R^n
• Let A ∈ R^{n,n} be the matrix whose columns are u1, ..., un
• We set Y = A*X

Proof (3)

(i) Expression for the empirical mean:
A is an orthogonal matrix: AA* = A*A = Id
↪ Y ∼ N(0, KY) with

KY = A* KX (A*)* = A* Id A = A*A = Id,

because the covariance matrix KX of X is Id.
Due to the fact that the first row of A* is
u1* = (1/√n, 1/√n, ..., 1/√n),
we have
Y1 = (1/√n)(X1 + X2 + ... + Xn) = √n X̄n,

or otherwise stated, X̄n = Y1/√n.

Proof (4)

(ii) Expression for the empirical variance: Let us express Sn² in terms of Y:

(n − 1)Sn² = Σ_{k=1}^{n} (Xk − X̄n)² = Σ_{k=1}^{n} (Xk² − 2Xk X̄n + X̄n²)
= (Σ_{k=1}^{n} Xk²) − 2X̄n (Σ_{k=1}^{n} Xk) + nX̄n².
As a consequence,

(n − 1)Sn² = (Σ_{k=1}^{n} Xk²) − 2X̄n(nX̄n) + nX̄n² = (Σ_{k=1}^{n} Xk²) − nX̄n².

Proof (5)

(ii) Expression for the empirical variance (ctd): We have

Y = A*X, A orthogonal ⇒ Σ_{k=1}^{n} Yk² = Σ_{k=1}^{n} Xk².

Hence

(n − 1)Sn² = Σ_{k=1}^{n} Xk² − nX̄n² = Σ_{k=1}^{n} Yk² − Y1² = Σ_{k=2}^{n} Yk².

Proof (6)

Summary: We have seen that

X̄n = Y1/√n, and (n − 1)Sn² = Σ_{k=2}^{n} Yk².

Conclusion:

1. Y ∼ N(0, Id_n) ⇒ Y1, ..., Yn i.i.d. N(0, 1)
↪ independence of X̄n and Sn².

2. Furthermore, X̄n = Y1/√n ⇒ X̄n ∼ N1(0, 1/n).

3. We also have (n − 1)Sn² = Σ_{k=2}^{n} Yk²
⇒ the law of (n − 1)Sn² is χ²(n − 1).
