
STAT 730 Chapter 3: Normal Distribution Theory

Timothy Hanson

Department of Statistics, University of South Carolina

Stat 730: Multivariate Analysis

Nice properties of multivariate normal random vectors

The multivariate normal easily generalizes the univariate normal; it is much harder to generalize the Poisson, gamma, exponential, etc. It is defined completely by its first and second moments, i.e. the mean vector and covariance matrix.

If $x \sim N_p(\mu, \Sigma)$, then:
- $\sigma_{ij} = 0$ implies $x_i$ independent of $x_j$.
- $a'x \sim N(a'\mu, a'\Sigma a)$.
- The Central Limit Theorem says sample means are approximately multivariate normal.
- Simple geometry makes properties intuitive.
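As a quick numerical sanity check of the second bullet (an addition, not part of the original slides), one can sample from a multivariate normal and compare the moments of $a'x$ with $a'\mu$ and $a'\Sigma a$. A minimal sketch in Python; the particular $\mu$, $\Sigma$, and $a$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
a = np.array([0.3, -1.0, 2.0])

# Draw many realizations of x ~ N_3(mu, Sigma).
x = rng.multivariate_normal(mu, Sigma, size=100_000)

proj = x @ a                              # a'x for each draw
print(proj.mean(), a @ mu)                # sample vs. theoretical mean a'mu
print(proj.var(ddof=1), a @ Sigma @ a)    # sample vs. theoretical var a'Sigma a
```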

Definition via Cramér-Wold

$x$ is multivariate normal $\Leftrightarrow$ $a'x$ is normal for all $a$.

def'n: $x \sim N_p(\mu, \Sigma) \Leftrightarrow a'x \sim N(a'\mu, a'\Sigma a)$ for all $a \in \mathbb{R}^p$.

thm: If $x \sim N_p(\mu, \Sigma)$ then its characteristic function is
$$\phi_x(t) = \exp(it'\mu - \tfrac{1}{2}t'\Sigma t).$$
Proof: Let $y = t'x$. Then the c.f. of $y$ is

$$\phi_y(s) \overset{\text{def}}{=} E\{e^{isy}\} = \exp\{isE(y) - \tfrac{1}{2}s^2\,\mathrm{var}(y)\} = \exp\{ist'\mu - \tfrac{1}{2}s^2t'\Sigma t\}.$$
Then the c.f. of $x$ is

$$\phi_x(t) \overset{\text{def}}{=} E\{e^{it'x}\} = \phi_y(1) = \exp(it'\mu - \tfrac{1}{2}t'\Sigma t). \quad \square$$

Using the c.f. we see that if $\Sigma = 0$ then $x = \mu$ with probability one, i.e. $N_p(\mu, 0) = \delta_\mu$.

Linear transformations of x are also normal

thm: $x \sim N_p(\mu, \Sigma)$, $A \in \mathbb{R}^{q \times p}$, and $c \in \mathbb{R}^q$ $\Rightarrow$ $Ax + c \sim N_q(A\mu + c, A\Sigma A')$.

Proof: Let $b \in \mathbb{R}^q$; then $b'[Ax + c] = [b'A]x + b'c$. Since $[b'A]x$ is univariate normal by def'n, $[b'A]x + b'c$ is also normal for any $b$. The specific forms for the mean and covariance are standard results for any $Ax + c$ (Chapter 2). $\square$

Corollary: Any subset of the components of $x$ is multivariate normal; in particular the $x_i$ are normal.

Note: you will show $\phi_y(t) = e^{it\mu - \sigma^2t^2/2}$ for $y \sim N(\mu, \sigma^2)$ in your HW.
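A hedged numerical illustration of the linear-transformation theorem above (again an addition, not from the slides): draw $x \sim N_p(\mu, \Sigma)$ and check that $Ax + c$ has mean $A\mu + c$ and covariance $A\Sigma A'$. The matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 2.0, 0.0],
                  [0.2, 0.0, 0.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])   # q x p with q = 2
c = np.array([10.0, -3.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + c                    # each row is A x_i + c

print(y.mean(axis=0), A @ mu + c)  # N_q mean: A mu + c
print(np.cov(y, rowvar=False))     # should approximate A Sigma A'
print(A @ Sigma @ A.T)
```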

Zero covariance and independence

Let $x \sim N_p(\mu, \Sigma)$ and partition $x' = (x_1', x_2')$ of dimensions $k$ and $p - k$. Also partition $\mu' = (\mu_1', \mu_2')$ and
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Then $x_1$ indep. $x_2$ $\Leftrightarrow$ $C(x_1, x_2) = \Sigma_{12} = \Sigma_{21}' = 0$.

Proof:

$$\phi_x(t) = \phi_{x_1}(t_1)\phi_{x_2}(t_2) = \exp(it_1'\mu_1 + it_2'\mu_2 - \tfrac{1}{2}t_1'\Sigma_{11}t_1 - \tfrac{1}{2}t_2'\Sigma_{22}t_2) \Leftrightarrow C(x_1, x_2) = 0. \quad \square$$

Some results based on last two slides

Corollary: $x \sim N_p(\mu, \Sigma)$ $\Rightarrow$ $y = \Sigma^{-1/2}(x - \mu) \sim N_p(0, I_p)$ and
$$U = (x - \mu)'\Sigma^{-1}(x - \mu) = y'y \sim \chi^2_p.$$

Corollary: $x \sim N_p(0, I)$ $\Rightarrow$ $\frac{a'x}{\|a\|} \sim N(0, 1)$ for $a \neq 0$.

thm: Let $A \in \mathbb{R}^{n_1 \times p}$, $B \in \mathbb{R}^{n_2 \times p}$, and $x \sim N_p(\mu, \Sigma)$. Then $Ax$ indep. $Bx$ $\Leftrightarrow$ $A\Sigma B' = 0$.

The last one is immediate from the previous two slides by finding the distribution of $\begin{bmatrix} A \\ B \end{bmatrix}x$.

Corollary: If $x \sim N_p(\mu, \sigma^2 I)$ and $GG' = I$ then $Gx \sim N(G\mu, \sigma^2 I)$. Also $Gx$ indep. of $(I - G'G)x$.
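The first corollary is easy to check by simulation: the Mahalanobis quantity $U$ should follow $\chi^2_p$. A sketch assuming an arbitrary positive definite $\Sigma$; empirical quantiles are compared against scipy's chi-square.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p = 4
mu = np.zeros(p)
L = rng.normal(size=(p, p))
Sigma = L @ L.T + p * np.eye(p)   # a random positive definite Sigma

x = rng.multivariate_normal(mu, Sigma, size=100_000)
Sinv = np.linalg.inv(Sigma)
# U = (x - mu)' Sigma^{-1} (x - mu) for each draw
U = np.einsum('ij,jk,ik->i', x - mu, Sinv, x - mu)

qs = [0.5, 0.9, 0.99]
print(np.quantile(U, qs))          # empirical quantiles of U
print(stats.chi2.ppf(qs, df=p))    # chi^2_p quantiles; should roughly match
```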

Conditional distribution of $x_2 \mid x_1$

Let $x \sim N_p(\mu, \Sigma)$ and partition $x' = (x_1', x_2')$ of dimensions $k$ and $p - k$. Also partition $\mu' = (\mu_1', \mu_2')$ and $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$. Let
$$x_{2.1} = x_2 - \Sigma_{21}\Sigma_{11}^{-1}x_1.$$
Then
$$\begin{bmatrix} x_1 \\ x_{2.1} \end{bmatrix} = \begin{bmatrix} I_k & 0 \\ -\Sigma_{21}\Sigma_{11}^{-1} & I_{p-k} \end{bmatrix}x$$

$$\sim N_p\left(\begin{bmatrix} \mu_1 \\ \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \end{bmatrix}\right).$$

So $x_1$ indep. $x_{2.1}$. Then $x_2 \mid x_1 = x_{2.1} + \underbrace{\Sigma_{21}\Sigma_{11}^{-1}x_1}_{\text{constant}}$ has distribution...

thm: $x_2 \mid x_1 \sim N_{p-k}(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$.

Very useful! The mean and covariance results hold for non-normal $x$ too.
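The conditional-distribution theorem translates directly into a few lines of linear algebra. Below is a hypothetical helper, `conditional_normal` (the name and example inputs are illustrative, not from the slides), returning the conditional mean and covariance of $x_2 \mid x_1$.

```python
import numpy as np

def conditional_normal(mu, Sigma, k, x1):
    """Parameters of x2 | x1 when x ~ N_p(mu, Sigma) and x1 = first k coords.

    Returns mu_2 + S21 S11^{-1}(x1 - mu_1) and S22 - S21 S11^{-1} S12,
    exactly as in the theorem above.
    """
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    W = S21 @ np.linalg.inv(S11)            # regression coefficient matrix
    cond_mean = mu[k:] + W @ (x1 - mu[:k])
    cond_cov = S22 - W @ S12
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
print(conditional_normal(mu, Sigma, k=1, x1=np.array([1.5])))
```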

Transformations of the normal data matrix

If $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$, then $X = [x_1 \cdots x_n]'$ is an $n \times p$ "normal data matrix." General transformations are of the form $AXB$. An important example is $\bar{x}' = [\tfrac{1}{n}1_n']X[I_p]$, the sample mean. One can show via c.f. that...

thm: $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$ $\Rightarrow$ $\bar{x} \sim N_p(\mu, \tfrac{1}{n}\Sigma)$.
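To see this theorem numerically (an added sketch, with arbitrary $\mu$ and $\Sigma$), simulate many normal data matrices and check that the sample means have covariance $\Sigma/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 20, 3, 50_000
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.3],
                  [0.0, 0.3, 1.0]])

# reps independent normal data matrices, each n x p; then their row means.
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
xbar = X.mean(axis=1)                   # reps x p sample means

print(xbar.mean(axis=0))                # approx mu
print(np.cov(xbar, rowvar=False) * n)   # approx Sigma, since cov(xbar) = Sigma/n
```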

General transformation theorem

thm: If $X(n \times p)$ is a data matrix from $N_p(\mu, \Sigma)$ and $Y(m \times q) = AXB$, then $Y$ is a normal data matrix $\Leftrightarrow$
(a) $A1_n = \alpha 1_m$ for some $\alpha \in \mathbb{R}$, or $B'\mu = 0$, and
(b) $AA' = \beta I_m$ for some $\beta \in \mathbb{R}$, or $B'\Sigma B = 0$.

We will prove this in class. Some necessary results follow.

def'n: For any matrix $X \in \mathbb{R}^{n \times p}$, let
$$X^v = \begin{bmatrix} x_{(1)} \\ \vdots \\ x_{(p)} \end{bmatrix} = (x_{(1)}', \ldots, x_{(p)}')' \in \mathbb{R}^{np}.$$

Kronecker products

def'n: Let $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{p \times q}$. Then

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B \\ a_{21}B & a_{22}B & \cdots & a_{2m}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}B & a_{n2}B & \cdots & a_{nm}B \end{bmatrix} \in \mathbb{R}^{np \times mq}.$$
Let $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$. Then $C(x_i, x_j) = \delta_{ij}\Sigma$, so

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \sim N_{np}\left(\begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix}, \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}\right) = N_{np}(1_n \otimes \mu, I_n \otimes \Sigma).$$
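numpy's `kron` mirrors the definition directly. The sketch below (illustrative values only) builds the block-diagonal covariance $I_n \otimes \Sigma$ of the stacked vector above, along with the $\Sigma \otimes I_n$ form that appears on the next slide.

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = 3

# Covariance of the row-stacked vector (x_1', ..., x_n')' is I_n (x) Sigma:
# block-diagonal with n copies of Sigma.
print(np.kron(np.eye(n), Sigma))

# The column-stacked X^v on the next slide instead has Sigma (x) I_n.
print(np.kron(Sigma, np.eye(n)))
```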

Kronecker products, dist'n of $X^v$

prop: Let $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$. Then

$$X^v = \begin{bmatrix} x_{(1)} \\ x_{(2)} \\ \vdots \\ x_{(p)} \end{bmatrix} \sim N_{np}\left(\begin{bmatrix} \mu_1 1_n \\ \mu_2 1_n \\ \vdots \\ \mu_p 1_n \end{bmatrix}, \begin{bmatrix} \sigma_{11}I_n & \sigma_{12}I_n & \cdots & \sigma_{1p}I_n \\ \sigma_{21}I_n & \sigma_{22}I_n & \cdots & \sigma_{2p}I_n \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1}I_n & \sigma_{p2}I_n & \cdots & \sigma_{pp}I_n \end{bmatrix}\right) = N_{np}(\mu \otimes 1_n, \Sigma \otimes I_n).$$

This is immediate from $C(x_{(i)}, x_{(j)}) = \sigma_{ij}I_n$ and $E(x_{(j)}) = \mu_j 1_n$ and the fact that $X^v$ is a permutation matrix times the vector on the previous slide (so it's also normal).

Corollary: $X(n \times p)$ is a n.d.m. from $N_p(\mu, \Sigma)$ $\Leftrightarrow$ $X^v \sim N_{np}(\mu \otimes 1_n, \Sigma \otimes I_n)$.

Kronecker products, (VIII) on p. 460

prop: $(B' \otimes A)X^v = (AXB)^v$.

Proof: First note that

$$(B' \otimes A)X^v = \begin{bmatrix} b_{11}A & b_{21}A & \cdots & b_{p1}A \\ b_{12}A & b_{22}A & \cdots & b_{p2}A \\ \vdots & \vdots & \ddots & \vdots \\ b_{1q}A & b_{2q}A & \cdots & b_{pq}A \end{bmatrix}\begin{bmatrix} x_{(1)} \\ x_{(2)} \\ \vdots \\ x_{(p)} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^p b_{i1}Ax_{(i)} \\ \sum_{i=1}^p b_{i2}Ax_{(i)} \\ \vdots \\ \sum_{i=1}^p b_{iq}Ax_{(i)} \end{bmatrix}.$$

Now let's find the $j$th column of $A_{m \times n}X_{n \times p}B_{p \times q}$. For any product $A_{a \times b}B_{b \times c}$, the $j$th column of $AB$ is $Ab_{(j)}$. First $AXB = [Ax_{(1)} \cdots Ax_{(p)}]B$. Thus the $j$th column of $AXB$ is
$$[Ax_{(1)} \cdots Ax_{(p)}]b_{(j)} = \sum_{i=1}^p b_{ij}Ax_{(i)}. \quad \square$$
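The proposition is easy to verify numerically. This added sketch defines the column-stacking operator $X^v$ (the usual vec) and checks $(B' \otimes A)X^v = (AXB)^v$ on random matrices; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p, q = 4, 3, 5, 2
A = rng.normal(size=(m, n))
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, q))

def vec(M):
    # Stack the columns: M^v = (m_(1)', ..., m_(q)')'.
    return M.T.ravel()

lhs = np.kron(B.T, A) @ vec(X)
rhs = vec(A @ X @ B)
print(np.allclose(lhs, rhs))   # True
```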

Proof of theorem

$$(B' \otimes A)X^v \sim N_{mq}\big(\underbrace{[B' \otimes A][\mu \otimes 1_n]}_{B'\mu \,\otimes\, A1_n},\ \underbrace{[B' \otimes A][\Sigma \otimes I_n][B' \otimes A]'}_{B'\Sigma B \,\otimes\, AA'}\big).$$

This uses $[A \otimes B][C \otimes D] = AC \otimes BD$ and $[A \otimes B]' = A' \otimes B'$. Going back to the theorem, this implies it.

In particular, if $Y = XB$ then $Y$ is a d.m. from $N_q(B'\mu, B'\Sigma B)$, as $A = I_n$.

Important later on in this Chapter...

thm: If $X$ is a d.m. from $N_p(\mu, \Sigma)$, $Y = AXB$, and $Z = CXD$, then $Y$ indep. of $Z$ $\Leftrightarrow$ either (a) $B'\Sigma D = 0$ or (b) $AC' = 0$.

You will prove this in your homework; see 3.3.5 (p. 88).

Corollary: Let $X = [X_1\ X_2]$ of dimensions $n \times k$ and $n \times (p - k)$. Then $X_1$ indep. $X_{2.1} = X_2 - X_1\Sigma_{11}^{-1}\Sigma_{12}$, with $X_1$ d.m. from $N_k(\mu_1, \Sigma_{11})$ and $X_{2.1}$ d.m. from $N_{p-k}(\mu_{2.1}, \Sigma_{22.1})$, where $\mu_{2.1} = \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1$ and $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

Proof: $X_1 = XB$ where $B' = [I_k\ 0]$ and $X_{2.1} = XD$ where $D' = [-\Sigma_{21}\Sigma_{11}^{-1}\ I_{p-k}]$. Now use the above theorem. $\square$

Corollary: $\bar{x}$ indep. $S$.

Proof: Taking $A = \frac{1}{n}1_n'$ and $C = H = I_n - \frac{1}{n}1_n1_n'$ in the theorem gives $\bar{x}$ indep. $HX$. $\square$

The Wishart distribution

Note that $S = X'[\frac{1}{n}H]X$. Quadratic functions of the form $X'CX$ are an ingredient in many multivariate test statistics.

def'n: $M(p \times p) = X'X$, where $X(m \times p)$ is a d.m. from $N_p(0, \Sigma)$, has a Wishart distribution with scale matrix $\Sigma$ and d.f. $m$. Shorthand: $M \sim W_p(\Sigma, m)$.

Note that the $ij$th element of $X'X$ is simply $x_{(i)}'x_{(j)} = \sum_{k=1}^m x_{ki}x_{kj}$. The $ij$th element of $x_kx_k'$ is $x_{ki}x_{kj}$. Therefore $X'X = \sum_{k=1}^m x_kx_k'$. Then
$$E(M) = E\left[\sum_{k=1}^m x_kx_k'\right] = \sum_{k=1}^m \Sigma = m\Sigma,$$
using $E(x_k) = 0$.
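A quick Monte Carlo check of $E(M) = m\Sigma$ (an added sketch with arbitrary $\Sigma$): generate many data matrices from $N_p(0, \Sigma)$, form $M = X'X$, and average.

```python
import numpy as np

rng = np.random.default_rng(5)
p, m, reps = 3, 10, 20_000
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])

# M = X'X with X an m x p data matrix from N_p(0, Sigma), replicated.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, m))
M = np.einsum('rij,rik->rjk', X, X)   # X'X for each replicate

print(M.mean(axis=0))                 # approx m * Sigma
print(m * Sigma)
```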

Transformations involving the Wishart

thm: Let $B \in \mathbb{R}^{p \times q}$ and $M \sim W_p(\Sigma, m)$. Then $B'MB \sim W_q(B'\Sigma B, m)$.

Proof: Let $Y = XB$. The result three slides back gives us that $Y$ is a d.m. from $N_q(0, B'\Sigma B)$. Then the def'n of the Wishart tells us
$$Y'Y = B'X'XB = B'MB \sim W_q(B'\Sigma B, m). \quad \square$$

Simple results that follow from this theorem

Corollary: Diagonal submatrices of M (square matrices that share part of a diagonal with M) have a Wishart distribution.

Corollary: $m_{ii} \sim \sigma_{ii}\chi^2_m$.

Corollary: $\Sigma^{-1/2}M\Sigma^{-1/2} \sim W_p(I_p, m)$.

Corollary: If $M \sim W_p(I_p, m)$ and $B(p \times q)$ s.t. $B'B = I_q$, then $B'MB \sim W_q(I_q, m)$.

Corollary: $M \sim W_p(\Sigma, m)$ and $a$ s.t. $a'\Sigma a \neq 0$ $\Rightarrow$ $\frac{a'Ma}{a'\Sigma a} \sim \chi^2_m$.

All use different $B$ in the theorem on the previous slide plus minor manipulation.

Wisharts add!

thm: $M_1 \sim W_p(\Sigma, m_1)$ indep. $M_2 \sim W_p(\Sigma, m_2)$ $\Rightarrow$ $M_1 + M_2 \sim W_p(\Sigma, m_1 + m_2)$.

Proof: Let $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$. Then $M_1 + M_2 = X'X$. Now use the def'n of the Wishart. $\square$

We are just adding $m_2$ more independent rows onto $X_1$.

Cochran's theorem

thm: If $X(n \times p)$ is a d.m. from $N_p(0, \Sigma)$ and $C(n \times n)$ is symmetric with eigenvalues $\lambda_1, \ldots, \lambda_n$, then
(a) $X'CX \overset{D}{=} \sum_{i=1}^n \lambda_iM_i$ where $M_1, \ldots, M_n \overset{iid}{\sim} W_p(\Sigma, 1)$.
(b) $X'CX \sim W_p(\Sigma, r)$ $\Leftrightarrow$ $C$ idempotent, where $r = \mathrm{tr}\,C = \mathrm{rank}\,C$.
(c) $nS \sim W_p(\Sigma, n - 1)$.

Proof: The spectral decomposition of $C$ is $C = [\gamma_1 \cdots \gamma_n]\Lambda[\gamma_1 \cdots \gamma_n]' = \sum_{i=1}^n \lambda_i\gamma_i\gamma_i'$. Then $X'CX = \sum_{i=1}^n \lambda_i[X'\gamma_i][X'\gamma_i]' = \sum_{i=1}^n \lambda_i[\gamma_i'X]'[\gamma_i'X]$. The general transformation theorem ($A = \gamma_i'$ & $B = I_p$) tells us that $\gamma_i'X$ is a d.m. from $N_p(0, \Sigma)$, so (a) follows from the def'n of the Wishart. Part (b): $C$ idempotent $\Rightarrow$ there are $r$ $\lambda_i = 1$ and $n - r$ $\lambda_i = 0$, hence $\mathrm{tr}\,C = \lambda_1 + \cdots + \lambda_n = r$. Now use part (a). For part (c) note that $H$ is idempotent with rank $n - 1$. $\square$

This is a biggie. Lots of stuff that will be used later.
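Part (c) can be checked numerically (an added sketch with arbitrary $\Sigma$): the centering matrix $H$ is idempotent with trace $n - 1$, and the average of $nS = X'HX$ over replicates is close to $(n - 1)\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, reps = 12, 3, 20_000
Sigma = np.array([[1.0, 0.4, 0.1],
                  [0.4, 1.5, 0.0],
                  [0.1, 0.0, 0.8]])

# Centering matrix H is idempotent with tr H = rank H = n - 1.
H = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(H @ H, H), np.trace(H))   # True, n - 1

X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
Xc = X - X.mean(axis=1, keepdims=True)      # applying H centers each data matrix
nS = np.einsum('rij,rik->rjk', Xc, Xc)      # X'HX = (HX)'(HX)
print(nS.mean(axis=0) / (n - 1))            # approx Sigma, so E(nS) = (n-1) Sigma
```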

Drum roll...

If $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$ then
$$\bar{x} \sim N_p(\mu, \tfrac{1}{n}\Sigma),$$

$$nS \sim W_p(\Sigma, n - 1),$$
and $\bar{x}$ indep. of $S$. This is a generalization of the univariate $p = 1$ case where $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$ indep. of $ns^2 \sim \sigma^2\chi^2_{n-1}$. This latter result is used to cook up a $t_{n-1}$ distribution:
$$\frac{\bar{x} - \mu}{\sqrt{s^2/(n-1)}} \sim t_{n-1},$$

by def'n. We'll shortly generalize this to $p$ dimensions, but first one last result.

Generalization of partitioning sums of squares

Here is Craig’s theorem.

thm: If $X$ is a d.m. from $N_p(\mu, \Sigma)$ and $C_1, \ldots, C_k$ are symmetric, then $X'C_1X, \ldots, X'C_kX$ are indep. if $C_rC_s = 0$ for all $r \neq s$.

Proof: Let's do it for two projection matrices. Write $X'C_1X = X'M_1\Lambda_1\Lambda_1M_1'X$ and $X'C_2X = X'M_2\Lambda_2\Lambda_2M_2'X$. Note that $\Lambda_i\Lambda_i = \Lambda_i$ as the e-values are either 1 or 0. The theorem (slide 14) says $\Lambda_1M_1'X$ indep. $\Lambda_2M_2'X$ $\Leftrightarrow$ $[\Lambda_1M_1'][\Lambda_2M_2']' = \Lambda_1M_1'M_2\Lambda_2 = 0$. But $0 = C_1C_2 = M_1\Lambda_1M_1'M_2\Lambda_2M_2' \Rightarrow \Lambda_1M_1'M_2\Lambda_2 = 0$. $\square$

This will come in handy in finding the distributions of common test statistics under $H_0$.

Hotelling's $T^2$

Recall, using obvious notation, $\frac{N(0,1)}{\sqrt{\chi^2_\nu/\nu}} \sim t_\nu$. Used for one- and two-sample $t$ tests for univariate outcomes. We'll now generalize this distribution.

def'n: Let $d \sim N_p(0, I_p)$ indep. $M \sim W_p(I_p, m)$. Then
$$m\,d'M^{-1}d \sim T^2(p, m).$$

thm: Let $x \sim N_p(\mu, \Sigma)$ indep. $M \sim W_p(\Sigma, m)$. Then
$$m(x - \mu)'M^{-1}(x - \mu) \sim T^2(p, m).$$
Proof: Take $d^* = \Sigma^{-1/2}(x - \mu)$ and $M^* = \Sigma^{-1/2}M\Sigma^{-1/2}$ and use the def'n of $T^2$. $\square$

Corollary: If $\bar{x}$ and $S$ are the sample mean and covariance from $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$, then
$$(n - 1)(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \sim T^2(p, n - 1).$$
Proof: Substitute $M = nS$, $m = n - 1$, and $\sqrt{n}(\bar{x} - \mu)$ for $x - \mu$ in the theorem above. $\square$
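Putting the corollary to work, together with the scaled-$F$ relation stated on the next slide, here is a hypothetical one-sample Hotelling $T^2$ test. The function name and test data are illustrative only, and $S$ is the biased (divide-by-$n$) version used in these slides.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(X, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0.

    T2 = (n-1) (xbar - mu0)' S^{-1} (xbar - mu0) with biased S, and
    T^2(p, n-1) = (n-1)p/(n-p) F_{p, n-p} gives the p-value.
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar) / n
    d = xbar - mu0
    T2 = (n - 1) * d @ np.linalg.solve(S, d)
    F = T2 * (n - p) / ((n - 1) * p)
    return T2, F, stats.f.sf(F, p, n - p)

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0.2, 0.0, 0.1], np.eye(3), size=40)
print(hotelling_one_sample(X, np.zeros(3)))
```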

Hotelling's $T^2$ is a scaled $F$

thm: $T^2(p, m) = \frac{mp}{m - p + 1}F_{p,\,m - p + 1}$.

To prove this we need some ingredients...

Let $M \sim W_p(\Sigma, m)$ and take $M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$, where $M_{11} \in \mathbb{R}^{a \times a}$, $M_{22} \in \mathbb{R}^{b \times b}$, and $a + b = p$. Further, let
$$M_{22.1} = M_{22} - M_{21}M_{11}^{-1}M_{12}.$$

Proof, Hotelling's $T^2$ is a scaled $F$

thm: Let $M \sim W_p(\Sigma, m)$ where $m > p$. Then $M_{22.1} \sim W_b(\Sigma_{22.1}, m - a)$.

Proof: Let $X = [X_1\ X_2]$, so

$$M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} = X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}.$$
Then

$$M_{22.1} = X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2 = X_2'PX_2 = X_{2.1}'PX_{2.1},$$
where $P = I_m - X_1(X_1'X_1)^{-1}X_1'$ is the o.p. matrix onto $\mathcal{C}(X_1)^\perp$ (so $PX_1 = 0$) and $X_{2.1}|X_1 = X_2 - X_1\Sigma_{11}^{-1}\Sigma_{12}$. The theorem on slide 14 tells us $X_{2.1}$ is a d.m. from $N_b(0, \Sigma_{22.1})$ (not dim. $p$ as in the book). So Cochran's theorem tells us $M_{22.1}|X_1 \sim W_b(\Sigma_{22.1}, m - a)$. This doesn't depend on $X_1$ so it's the marginal dist'n as well. $\square$

Proof, Hotelling's $T^2$ is a scaled $F$

lemma: If $M \sim W_p(\Sigma, m)$, $m > p$, then $\frac{1}{[M^{-1}]_{pp}} \sim \frac{1}{[\Sigma^{-1}]_{pp}}\chi^2_{m-p+1}$.

Proof: In general, for partitioned matrices,

$$\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (M_{11} - M_{12}M_{22}^{-1}M_{21})^{-1} & -M_{11}^{-1}M_{12}(M_{22} - M_{21}M_{11}^{-1}M_{12})^{-1} \\ -M_{22}^{-1}M_{21}(M_{11} - M_{12}M_{22}^{-1}M_{21})^{-1} & (M_{22} - M_{21}M_{11}^{-1}M_{12})^{-1} \end{bmatrix}.$$
Now let $M_{11}$ be the upper left $(p - 1) \times (p - 1)$ submatrix of $M$ and $m_{22}$ the lower right $1 \times 1$ "scalar matrix." Then, where $\sigma_{22.1} = \frac{1}{[\Sigma^{-1}]_{pp}}$,
$$\frac{1}{[M^{-1}]_{pp}} = \frac{1}{1/m_{22.1}} = m_{22.1} \sim W_1(\sigma_{22.1}, m - (p - 1)) = \sigma_{22.1}\chi^2_{m-p+1}. \quad \square$$

thm: If $M \sim W_p(\Sigma, m)$, $m > p$, then $\frac{a'\Sigma^{-1}a}{a'M^{-1}a} \sim \chi^2_{m-p+1}$.

Proof: Let $A = [a_{(1)} \cdots a_{(p-1)}\ a]$ be nonsingular with last column $a$. Then $N = A^{-1}M(A^{-1})' \sim W_p(A^{-1}\Sigma(A^{-1})', m)$. So
$$\frac{1}{[N^{-1}]_{pp}} = \frac{1}{[A'M^{-1}A]_{pp}} = \frac{1}{a'M^{-1}a} \sim \frac{1}{a'\Sigma^{-1}a}\chi^2_{m-p+1},$$
noting that the $pp$th element of $[A^{-1}\Sigma(A^{-1})']^{-1} = A'\Sigma^{-1}A$ is $a'\Sigma^{-1}a$. $\square$

Proof, Hotelling's $T^2$ is a scaled $F$

Recall $m\,d'M^{-1}d \sim T^2(p, m)$ where $d \sim N_p(0, I_p)$ indep. of $M \sim W_p(I_p, m)$. Given $d$, $\beta = \frac{d'd}{d'M^{-1}d} \sim \chi^2_{m-p+1}$ (last slide). Since this conditional distribution does not depend on $d$, $\beta$ indep. $d$ and this is the marginal dist'n as well.

$$m\,d'M^{-1}d = \frac{m\,d'd}{d'd/d'M^{-1}d} = m\frac{\chi^2_p}{\chi^2_{m-p+1}} = \frac{mp}{m - p + 1}F_{p,\,m-p+1}.$$

Corollary: If $\bar{x}$ and $S$ are the sample mean and covariance from $N_p(\mu, \Sigma)$, then $\frac{n - p}{p}(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \sim F_{p,\,n-p}$.

Two more distributional results

Corollary: $|M|/|M + dd'| \sim B(\tfrac{1}{2}(m - p + 1), \tfrac{p}{2})$.

Proof: For $B(p \times n)$ and $C(n \times p)$, $|I_p + BC| = |I_n + CB|$. Since $|AB| = |A||B|$, we can write
$$\frac{|M|}{|M + dd'|} = \frac{1}{|I_p + M^{-1}dd'|} = \frac{1}{1 + d'M^{-1}d} = \frac{1}{1 + \frac{p}{m-p+1}F_{p,\,m-p+1}}.$$
Recall if $x \sim F_{\nu_1,\nu_2}$ then $\frac{\nu_1x/\nu_2}{1 + \nu_1x/\nu_2} \sim B(\tfrac{\nu_1}{2}, \tfrac{\nu_2}{2})$ and $\frac{1}{1 + \nu_1x/\nu_2} \sim B(\tfrac{\nu_2}{2}, \tfrac{\nu_1}{2})$. $\square$

Corollary: $d \sim N_p(0, I_p)$ indep. $M \sim W_p(I_p, m)$; then $d'd(1 + 1/\{d'M^{-1}d\}) \sim \chi^2_{m+1}$ indep. of $d'M^{-1}d$.

Proof: $\beta$ indep. $d'd$ (last slide); both are $\chi^2$, so their sum is indep. of their ratio. The sum of two indep. $\chi^2$ is also $\chi^2$; the d.f. add. $\square$

Two-sample Hotelling $T^2$

Let $D^2 = (\bar{x}_1 - \bar{x}_2)'S_u^{-1}(\bar{x}_1 - \bar{x}_2)$ estimate $\Delta^2 = (\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)$, where $S_u = \frac{1}{n-2}[n_1S_1 + n_2S_2]$ and $n = n_1 + n_2$. Then

thm: Let $X_1$ be a d.m. from $N_p(\mu_1, \Sigma_1)$ indep. $X_2$ a d.m. from $N_p(\mu_2, \Sigma_2)$. If $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2$ then
$$\frac{n_1n_2}{n_1 + n_2}D^2 \sim T^2(p, n - 2).$$
Proof: $d = \bar{x}_1 - \bar{x}_2 \sim N_p(\mu_1 - \mu_2, \frac{1}{n_1}\Sigma_1 + \frac{1}{n_2}\Sigma_2)$. When $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2 = \Sigma$, $d \sim N_p(0, c\Sigma)$ where $c = \frac{n_1 + n_2}{n_1n_2}$. Also $M = n_1S_1 + n_2S_2 \sim W_p(\Sigma, n_1 + n_2 - 2)$ as independent Wisharts with the same scale add; $cM \sim W_p(c\Sigma, n_1 + n_2 - 2)$. $M$ indep. $d$ as $\bar{x}_1, \bar{x}_2, S_1, S_2$ are mutually indep. Stirring all ingredients together gives
$$\frac{D^2}{c} = (n - 2)d'(cM)^{-1}d \sim T^2(p, n - 2). \quad \square$$
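A hedged implementation sketch of the two-sample test (function name and data are illustrative): it forms $S_u$, $D^2$, and the $T^2$ statistic as in the theorem, then converts to an $F_{p,\,n-p-1}$ p-value via the scaled-$F$ relation with $m = n - 2$.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling T^2 for H0: mu1 = mu2 (assumes Sigma1 = Sigma2).

    Su = (n1 S1 + n2 S2)/(n - 2) with biased S_i, and under H0
    n1 n2/(n1+n2) D^2 ~ T^2(p, n-2) = (n-2)p/(n-p-1) F_{p, n-p-1}.
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    n = n1 + n2
    d = X1.mean(axis=0) - X2.mean(axis=0)
    A1 = (X1 - X1.mean(axis=0)).T @ (X1 - X1.mean(axis=0))  # n1 S1
    A2 = (X2 - X2.mean(axis=0)).T @ (X2 - X2.mean(axis=0))  # n2 S2
    Su = (A1 + A2) / (n - 2)
    D2 = d @ np.linalg.solve(Su, d)
    T2 = n1 * n2 / n * D2
    F = T2 * (n - p - 1) / ((n - 2) * p)
    return T2, F, stats.f.sf(F, p, n - p - 1)

rng = np.random.default_rng(8)
X1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=30)
X2 = rng.multivariate_normal([0.5, 0.0], np.eye(2), size=25)
print(hotelling_two_sample(X1, X2))
```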

Generalization of the F statistic

We’ve already generalized the t for multivariate data; now it’s time for the F .

Let $A \sim W_p(\Sigma, m)$ indep. of $B \sim W_p(\Sigma, n)$ where $m \geq p$. $A^{-1}$ exists a.s. and we will examine aspects of $A^{-1}B$. Note that this reduces to the ratio of indep. $\chi^2$ in the univariate $p = 1$ case.

Generalization of the F statistic

lemma: $M \sim W_p(\Sigma, m)$, $m \geq p$ $\Rightarrow$ $|M| = |\Sigma|\prod_{i=0}^{p-1}\chi^2_{m-i}$ (the $\chi^2$ variates mutually independent).

Proof: By induction. For $p = 1$, $|M| = m_{11} \sim \sigma^2\chi^2_m$. For $p > 1$ let $M_{11}$ be the upper left $(p - 1) \times (p - 1)$ submatrix of $M$ and $m_{22}$ the lower right $1 \times 1$ "scalar matrix" (slide 24). The induction hypothesis says $|M_{11}| = |\Sigma_{11}|\prod_{i=0}^{p-2}\chi^2_{m-i}$. Slide 24 implies that $m_{22.1}$ indep. $M_{11}$ and $m_{22.1} \sim \sigma_{22.1}\chi^2_{m-p+1}$. The result follows by noting that $|M| = |M_{11}|m_{22.1}$ and $|\Sigma| = |\Sigma_{11}|\sigma_{22.1}$ (p. 457, or expansion of the determinant using cofactors). $\square$

thm: Let $A \sim W_p(\Sigma, m)$ indep. of $B \sim W_p(\Sigma, n)$ where $m \geq p$ and $n \geq p$. Then $\phi = |B|/|A| \propto \prod_{i=1}^p F_{n-i+1,\,m-i+1}$.

Proof: Using the lemma,
$$\phi = \prod_{i=0}^{p-1}\frac{\chi^2_{n-i}}{\chi^2_{m-i}} = \prod_{i=0}^{p-1}\frac{n-i}{m-i}F_{n-i,\,m-i} \propto \prod_{i=1}^{p}F_{n-i+1,\,m-i+1}. \quad \square$$

Wilks' lambda

Wilks' lambda, a generalization of the beta distribution, appears later on when performing LRTs:

def'n: $A \sim W_p(I_p, m)$ indep. $B \sim W_p(I_p, n)$ and $m \geq p$; then
$$\Lambda = |A|/|A + B| \sim \Lambda(p, m, n)$$
has a Wilks' lambda distribution with parameters $(p, m, n)$.

thm: $\Lambda \overset{D}{=} \prod_{i=1}^n u_i$ where $u_1, \ldots, u_n$ are mutually independent and $u_i \sim B(\tfrac{1}{2}(m + i - p), \tfrac{p}{2})$.
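The theorem can be checked by simulation (an added sketch, with arbitrary $p$, $m$, $n$): compare quantiles of $\Lambda = |A|/|A + B|$ against those of an independent product of the stated betas.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
p, m, n, reps = 3, 15, 4, 20_000

# Simulate Lambda = |A| / |A + B| with A ~ W_p(I, m), B ~ W_p(I, n).
XA = rng.normal(size=(reps, m, p))
XB = rng.normal(size=(reps, n, p))
A = np.einsum('rij,rik->rjk', XA, XA)
B = np.einsum('rij,rik->rjk', XB, XB)
lam = np.linalg.det(A) / np.linalg.det(A + B)

# Independent u_i ~ B((m+i-p)/2, p/2), i = 1..n, and their product.
u = np.column_stack([stats.beta.rvs((m + i - p) / 2, p / 2,
                                    size=reps, random_state=rng)
                     for i in range(1, n + 1)])
prod_u = u.prod(axis=1)

print(np.quantile(lam, [0.1, 0.5, 0.9]))
print(np.quantile(prod_u, [0.1, 0.5, 0.9]))   # should roughly match
```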

Proof: Wilks' lambda is a product of betas

Let $X(n \times p)$ be a d.m. from $N_p(0, I_p)$, $B = X'X$, and let $X_i$ be the first $i$ rows of $X$. Let $M_i = A + X_i'X_i$. Then $M_0 = A$, $M_n = A + B$, and $M_i = M_{i-1} + x_ix_i'$. Then

$$\Lambda = \frac{|A|}{|A + B|} = \prod_{i=1}^n\frac{|M_{i-1}|}{|M_i|} = \prod_{i=1}^n u_i.$$
The corollary on slide 27 implies $u_i \sim B(\tfrac{1}{2}(m + i - p), \tfrac{p}{2})$.

The independence part takes some work...

Characterization of independence of matrix & vector

lemma: Let $W \in \mathbb{R}^{p \times p}$ and $x \in \mathbb{R}^p$. If $x$ is indep. of $(g_1'Wg_1, \ldots, g_p'Wg_p)$ for all orthogonal $G = [g_1 \cdots g_p]$, then $x$ indep. $W$.

Showing independence of $u_1, \ldots, u_n$, continued...

thm: $d \sim N_p(0, I_p)$ indep. $M \sim W_p(I_p, m)$; then $d'M^{-1}d \sim \frac{p}{m-p+1}F_{p,\,m-p+1}$ indep. of $M + dd' \sim W_p(I_p, m + 1)$.

Proof: Let $G = [g_1 \cdots g_p]$ be orthogonal and take
$$X((m + 1) \times p) = \begin{bmatrix} X_1 \\ x_{m+1}' \end{bmatrix}.$$
Here $M = X_1'X_1$ and $d = x_{m+1}$. Let $Y = XG = [Xg_1 \cdots Xg_p] = [Y_{(1)} \cdots Y_{(p)}]$. Then $q_j = g_j'[M + dd']g_j = g_j'X'Xg_j = \|Y_{(j)}\|^2$. Since $Y^v \sim N_{(m+1)p}(0, I)$, $Y$ is spherically symmetric. Define $h(Y) = y_{m+1}'(Y'Y)^{-1}y_{m+1} = d'(M + dd')^{-1}d$, a monotone function of $d'M^{-1}d$, and note that $h(Y) = h(YD)$ for all diagonal $D$. The theorem on p. 48 implies $q_j$ indep. $h(Y)$ for $j = 1, \ldots, p$. Now use the lemma on the previous slide. $\square$

Showing independence of $u_1, \ldots, u_n$, continued...

The theorem on the last slide implies $\frac{1}{u_i} = |M_i|/|M_{i-1}| = 1 + x_i'M_{i-1}^{-1}x_i$ indep. $M_i$. Finally,

$$M_{i+j} = \underbrace{M_i}_{u_i \text{ indep. of}} + \sum_{k=1}^j x_{i+k}x_{i+k}',$$
so for any $i$, $u_i$ indep. of $u_{i+1}, \ldots, u_n$. $\square$

Two last results...

When $m$ is large, one can also use Bartlett's approximation:

$$-\{m - \tfrac{1}{2}(p - n + 1)\}\log\Lambda(p, m, n) \overset{\bullet}{\sim} \chi^2_{np}.$$
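A quick Monte Carlo look at Bartlett's approximation (illustrative parameters, an added sketch): simulate $\Lambda(p, m, n)$ with a large $m$ and compare quantiles of $-\{m - \frac{1}{2}(p - n + 1)\}\log\Lambda$ with $\chi^2_{np}$ quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
p, m, n, reps = 3, 40, 2, 20_000   # m large so the approximation is decent

XA = rng.normal(size=(reps, m, p))
XB = rng.normal(size=(reps, n, p))
A = np.einsum('rij,rik->rjk', XA, XA)
B = np.einsum('rij,rik->rjk', XB, XB)
lam = np.linalg.det(A) / np.linalg.det(A + B)

stat = -(m - 0.5 * (p - n + 1)) * np.log(lam)
qs = [0.5, 0.9, 0.95]
print(np.quantile(stat, qs))
print(stats.chi2.ppf(qs, df=n * p))   # Bartlett: approximately chi^2_{np}
```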

def'n: $A \sim W_p(I_p, m)$ indep. $B \sim W_p(I_p, n)$ and $m \geq p$. Then $\theta(p, m, n)$, the largest eigenvalue of $(A + B)^{-1}B$, is called the greatest root statistic with parameters $(p, m, n)$. MKB (p. 84) gives relationships between $\Lambda(p, m, n)$ and $\theta(p, m, n)$.
