
STAT 730 Chapter 3: Normal Distribution Theory

Timothy Hanson

Department of Statistics, University of South Carolina

Stat 730: Multivariate Analysis

Nice properties of multivariate normal random vectors

The multivariate normal easily generalizes the univariate normal; it is much harder to generalize the Poisson, gamma, exponential, etc. It is defined completely by its first and second moments, i.e. the mean vector and covariance matrix.

If $x \sim N_p(\mu, \Sigma)$, then:
- $\sigma_{ij} = 0$ implies $x_i$ independent of $x_j$.
- $a'x \sim N(a'\mu, a'\Sigma a)$.
- The Central Limit Theorem says sample means are approximately multivariate normal.
- Simple geometry makes properties intuitive.
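As a quick numerical sanity check of the second bullet (an addition, not part of the original slides), one can sample from a multivariate normal and compare the moments of $a'x$ with $a'\mu$ and $a'\Sigma a$. A minimal sketch in Python; the particular $\mu$, $\Sigma$, and $a$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
a = np.array([0.3, -1.0, 2.0])

# Draw many realizations of x ~ N_3(mu, Sigma).
x = rng.multivariate_normal(mu, Sigma, size=100_000)

proj = x @ a                              # a'x for each draw
print(proj.mean(), a @ mu)                # sample vs. theoretical mean a'mu
print(proj.var(ddof=1), a @ Sigma @ a)    # sample vs. theoretical var a'Sigma a
```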

Definition via Cramér-Wold

$x$ is multivariate normal $\Leftrightarrow$ $a'x$ is normal for all $a$.

def'n: $x \sim N_p(\mu, \Sigma) \Leftrightarrow a'x \sim N(a'\mu, a'\Sigma a)$ for all $a \in \mathbb{R}^p$.

thm: If $x \sim N_p(\mu, \Sigma)$ then its characteristic function is
$$\phi_x(t) = \exp(it'\mu - \tfrac{1}{2}t'\Sigma t).$$
Proof: Let $y = t'x$. Then the c.f. of $y$ is

$$\phi_y(s) \overset{\text{def}}{=} E\{e^{isy}\} = \exp\{isE(y) - \tfrac{1}{2}s^2\,\mathrm{var}(y)\} = \exp\{ist'\mu - \tfrac{1}{2}s^2t'\Sigma t\}.$$
Then the c.f. of $x$ is

$$\phi_x(t) \overset{\text{def}}{=} E\{e^{it'x}\} = \phi_y(1) = \exp(it'\mu - \tfrac{1}{2}t'\Sigma t). \quad \square$$

Using the c.f. we see that if $\Sigma = 0$ then $x = \mu$ with probability one, i.e. $N_p(\mu, 0) = \delta_\mu$.

Linear transformations of x are also normal

thm: $x \sim N_p(\mu, \Sigma)$, $A \in \mathbb{R}^{q \times p}$, and $c \in \mathbb{R}^q$ $\Rightarrow$ $Ax + c \sim N_q(A\mu + c, A\Sigma A')$.

Proof: Let $b \in \mathbb{R}^q$; then $b'[Ax + c] = [b'A]x + b'c$. Since $[b'A]x$ is univariate normal by def'n, $[b'A]x + b'c$ is also normal for any $b$. The specific forms for the mean and covariance are standard results for any $Ax + c$ (Chapter 2). $\square$

Corollary: Any subset of the components of $x$ is multivariate normal; in particular the $x_i$ are normal.

Note: you will show $\phi_y(t) = e^{it\mu - \sigma^2t^2/2}$ for $y \sim N(\mu, \sigma^2)$ in your HW.
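A hedged numerical illustration of the linear-transformation theorem above (again an addition, not from the slides): draw $x \sim N_p(\mu, \Sigma)$ and check that $Ax + c$ has mean $A\mu + c$ and covariance $A\Sigma A'$. The matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 2.0, 0.0],
                  [0.2, 0.0, 0.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])   # q x p with q = 2
c = np.array([10.0, -3.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + c                    # each row is A x_i + c

print(y.mean(axis=0), A @ mu + c)  # N_q mean: A mu + c
print(np.cov(y, rowvar=False))     # should approximate A Sigma A'
print(A @ Sigma @ A.T)
```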

Zero covariance and independence

Let $x \sim N_p(\mu, \Sigma)$ and partition $x' = (x_1', x_2')$ of dimensions $k$ and $p - k$. Also partition $\mu' = (\mu_1', \mu_2')$ and
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Then $x_1$ indep. $x_2$ $\Leftrightarrow$ $C(x_1, x_2) = \Sigma_{12} = \Sigma_{21}' = 0$.

Proof:

$$\phi_x(t) = \phi_{x_1}(t_1)\phi_{x_2}(t_2) = \exp(it_1'\mu_1 + it_2'\mu_2 - \tfrac{1}{2}t_1'\Sigma_{11}t_1 - \tfrac{1}{2}t_2'\Sigma_{22}t_2) \Leftrightarrow C(x_1, x_2) = 0. \quad \square$$

Some results based on last two slides

Corollary: $x \sim N_p(\mu, \Sigma)$ $\Rightarrow$ $y = \Sigma^{-1/2}(x - \mu) \sim N_p(0, I_p)$ and
$$U = (x - \mu)'\Sigma^{-1}(x - \mu) = y'y \sim \chi^2_p.$$

Corollary: $x \sim N_p(0, I)$ $\Rightarrow$ $\frac{a'x}{\|a\|} \sim N(0, 1)$ for $a \neq 0$.

thm: Let $A \in \mathbb{R}^{n_1 \times p}$, $B \in \mathbb{R}^{n_2 \times p}$, and $x \sim N_p(\mu, \Sigma)$. Then $Ax$ indep. $Bx$ $\Leftrightarrow$ $A\Sigma B' = 0$.

The last one is immediate from the previous two slides by finding the distribution of $\begin{bmatrix} A \\ B \end{bmatrix}x$.

Corollary: If $x \sim N_p(\mu, \sigma^2 I)$ and $GG' = I$ then $Gx \sim N(G\mu, \sigma^2 I)$. Also $Gx$ indep. of $(I - G'G)x$.
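The first corollary is easy to check by simulation: the Mahalanobis quantity $U$ should follow $\chi^2_p$. A sketch assuming an arbitrary positive definite $\Sigma$; empirical quantiles are compared against scipy's chi-square.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p = 4
mu = np.zeros(p)
L = rng.normal(size=(p, p))
Sigma = L @ L.T + p * np.eye(p)   # a random positive definite Sigma

x = rng.multivariate_normal(mu, Sigma, size=100_000)
Sinv = np.linalg.inv(Sigma)
# U = (x - mu)' Sigma^{-1} (x - mu) for each draw
U = np.einsum('ij,jk,ik->i', x - mu, Sinv, x - mu)

qs = [0.5, 0.9, 0.99]
print(np.quantile(U, qs))          # empirical quantiles of U
print(stats.chi2.ppf(qs, df=p))    # chi^2_p quantiles; should roughly match
```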

Conditional distribution of $x_2 \mid x_1$

Let $x \sim N_p(\mu, \Sigma)$ and partition $x' = (x_1', x_2')$ of dimensions $k$ and $p - k$. Also partition $\mu' = (\mu_1', \mu_2')$ and $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$. Let
$$x_{2.1} = x_2 - \Sigma_{21}\Sigma_{11}^{-1}x_1.$$
Then
$$\begin{bmatrix} x_1 \\ x_{2.1} \end{bmatrix} = \begin{bmatrix} I_k & 0 \\ -\Sigma_{21}\Sigma_{11}^{-1} & I_{p-k} \end{bmatrix}x$$

$$\sim N_p\left(\begin{bmatrix} \mu_1 \\ \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \end{bmatrix}\right).$$

So $x_1$ indep. $x_{2.1}$. Then $x_2 \mid x_1 = x_{2.1} + \underbrace{\Sigma_{21}\Sigma_{11}^{-1}x_1}_{\text{constant}}$ has distribution...

thm: $x_2 \mid x_1 \sim N_{p-k}(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$.

Very useful! The mean and covariance results hold for non-normal $x$ too.
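The conditional-distribution theorem translates directly into a few lines of linear algebra. Below is a hypothetical helper, `conditional_normal` (the name and example inputs are illustrative, not from the slides), returning the conditional mean and covariance of $x_2 \mid x_1$.

```python
import numpy as np

def conditional_normal(mu, Sigma, k, x1):
    """Parameters of x2 | x1 when x ~ N_p(mu, Sigma) and x1 = first k coords.

    Returns mu_2 + S21 S11^{-1}(x1 - mu_1) and S22 - S21 S11^{-1} S12,
    exactly as in the theorem above.
    """
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    W = S21 @ np.linalg.inv(S11)            # regression coefficient matrix
    cond_mean = mu[k:] + W @ (x1 - mu[:k])
    cond_cov = S22 - W @ S12
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
print(conditional_normal(mu, Sigma, k=1, x1=np.array([1.5])))
```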

Transformations of the normal data matrix

If $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$, then $X = [x_1 \cdots x_n]'$ is an $n \times p$ "normal data matrix." General transformations are of the form $AXB$. An important example is $\bar{x}' = [\tfrac{1}{n}1_n']X[I_p]$, the sample mean. One can show via c.f. that...

thm: $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$ $\Rightarrow$ $\bar{x} \sim N_p(\mu, \tfrac{1}{n}\Sigma)$.
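To see this theorem numerically (an added sketch, with arbitrary $\mu$ and $\Sigma$), simulate many normal data matrices and check that the sample means have covariance $\Sigma/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 20, 3, 50_000
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.3],
                  [0.0, 0.3, 1.0]])

# reps independent normal data matrices, each n x p; then their row means.
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
xbar = X.mean(axis=1)                   # reps x p sample means

print(xbar.mean(axis=0))                # approx mu
print(np.cov(xbar, rowvar=False) * n)   # approx Sigma, since cov(xbar) = Sigma/n
```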

General transformation theorem

thm: If $X(n \times p)$ is a data matrix from $N_p(\mu, \Sigma)$ and $Y(m \times q) = AXB$, then $Y$ is a normal data matrix $\Leftrightarrow$
(a) $A1_n = \alpha 1_m$ for some $\alpha \in \mathbb{R}$, or $B'\mu = 0$, and
(b) $AA' = \beta I_m$ for some $\beta \in \mathbb{R}$, or $B'\Sigma B = 0$.

We will prove this in class. Some necessary results follow.

def'n: For any matrix $X \in \mathbb{R}^{n \times p}$, let
$$X^v = \begin{bmatrix} x_{(1)} \\ \vdots \\ x_{(p)} \end{bmatrix} = (x_{(1)}', \ldots, x_{(p)}')' \in \mathbb{R}^{np}.$$

Kronecker products

def'n: Let $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{p \times q}$. Then

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B \\ a_{21}B & a_{22}B & \cdots & a_{2m}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}B & a_{n2}B & \cdots & a_{nm}B \end{bmatrix} \in \mathbb{R}^{np \times mq}.$$
Let $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$. Then $C(x_i, x_j) = \delta_{ij}\Sigma$, so

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \sim N_{np}\left(\begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix}, \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}\right) = N_{np}(1_n \otimes \mu, I_n \otimes \Sigma).$$
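numpy's `kron` mirrors the definition directly. The sketch below (illustrative values only) builds the block-diagonal covariance $I_n \otimes \Sigma$ of the stacked vector above, along with the $\Sigma \otimes I_n$ form that appears on the next slide.

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = 3

# Covariance of the row-stacked vector (x_1', ..., x_n')' is I_n (x) Sigma:
# block-diagonal with n copies of Sigma.
print(np.kron(np.eye(n), Sigma))

# The column-stacked X^v on the next slide instead has Sigma (x) I_n.
print(np.kron(Sigma, np.eye(n)))
```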

Kronecker products, dist'n of $X^v$

prop: Let $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$. Then

$$X^v = \begin{bmatrix} x_{(1)} \\ x_{(2)} \\ \vdots \\ x_{(p)} \end{bmatrix} \sim N_{np}\left(\begin{bmatrix} \mu_1 1_n \\ \mu_2 1_n \\ \vdots \\ \mu_p 1_n \end{bmatrix}, \begin{bmatrix} \sigma_{11}I_n & \sigma_{12}I_n & \cdots & \sigma_{1p}I_n \\ \sigma_{21}I_n & \sigma_{22}I_n & \cdots & \sigma_{2p}I_n \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1}I_n & \sigma_{p2}I_n & \cdots & \sigma_{pp}I_n \end{bmatrix}\right) = N_{np}(\mu \otimes 1_n, \Sigma \otimes I_n).$$

This is immediate from $C(x_{(i)}, x_{(j)}) = \sigma_{ij}I_n$ and $E(x_{(j)}) = \mu_j 1_n$ and the fact that $X^v$ is a permutation matrix times the vector on the previous slide (so it's also normal).

Corollary: $X(n \times p)$ is a n.d.m. from $N_p(\mu, \Sigma)$ $\Leftrightarrow$ $X^v \sim N_{np}(\mu \otimes 1_n, \Sigma \otimes I_n)$.

Kronecker products, (VIII) on p. 460

prop: $(B' \otimes A)X^v = (AXB)^v$.

Proof: First note that

$$(B' \otimes A)X^v = \begin{bmatrix} b_{11}A & b_{21}A & \cdots & b_{p1}A \\ b_{12}A & b_{22}A & \cdots & b_{p2}A \\ \vdots & \vdots & \ddots & \vdots \\ b_{1q}A & b_{2q}A & \cdots & b_{pq}A \end{bmatrix}\begin{bmatrix} x_{(1)} \\ x_{(2)} \\ \vdots \\ x_{(p)} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^p b_{i1}Ax_{(i)} \\ \sum_{i=1}^p b_{i2}Ax_{(i)} \\ \vdots \\ \sum_{i=1}^p b_{iq}Ax_{(i)} \end{bmatrix}.$$

Now let's find the $j$th column of $A_{m \times n}X_{n \times p}B_{p \times q}$. For any product $A_{a \times b}B_{b \times c}$, the $j$th column of $AB$ is $Ab_{(j)}$. First $AXB = [Ax_{(1)} \cdots Ax_{(p)}]B$. Thus the $j$th column of $AXB$ is
$$[Ax_{(1)} \cdots Ax_{(p)}]b_{(j)} = \sum_{i=1}^p b_{ij}Ax_{(i)}. \quad \square$$
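The proposition is easy to verify numerically. This added sketch defines the column-stacking operator $X^v$ (the usual vec) and checks $(B' \otimes A)X^v = (AXB)^v$ on random matrices; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p, q = 4, 3, 5, 2
A = rng.normal(size=(m, n))
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, q))

def vec(M):
    # Stack the columns: M^v = (m_(1)', ..., m_(q)')'.
    return M.T.ravel()

lhs = np.kron(B.T, A) @ vec(X)
rhs = vec(A @ X @ B)
print(np.allclose(lhs, rhs))   # True
```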

Proof of theorem

$$(B' \otimes A)X^v \sim N_{mq}\big(\underbrace{[B' \otimes A][\mu \otimes 1_n]}_{B'\mu \,\otimes\, A1_n},\ \underbrace{[B' \otimes A][\Sigma \otimes I_n][B' \otimes A]'}_{B'\Sigma B \,\otimes\, AA'}\big).$$

This uses $[A \otimes B][C \otimes D] = AC \otimes BD$ and $[A \otimes B]' = A' \otimes B'$. Going back to the theorem, this implies it.

In particular, if $Y = XB$ then $Y$ is a d.m. from $N_q(B'\mu, B'\Sigma B)$, as $A = I_n$.

Important later on in this Chapter...

thm: If $X$ is a d.m. from $N_p(\mu, \Sigma)$, $Y = AXB$, and $Z = CXD$, then $Y$ indep. of $Z$ $\Leftrightarrow$ either (a) $B'\Sigma D = 0$ or (b) $AC' = 0$.

You will prove this in your homework; see 3.3.5 (p. 88).

Corollary: Let $X = [X_1\ X_2]$ of dimensions $n \times k$ and $n \times (p - k)$. Then $X_1$ indep. $X_{2.1} = X_2 - X_1\Sigma_{11}^{-1}\Sigma_{12}$, with $X_1$ d.m. from $N_k(\mu_1, \Sigma_{11})$ and $X_{2.1}$ d.m. from $N_{p-k}(\mu_{2.1}, \Sigma_{22.1})$, where $\mu_{2.1} = \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1$ and $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

Proof: $X_1 = XB$ where $B' = [I_k\ 0]$ and $X_{2.1} = XD$ where $D' = [-\Sigma_{21}\Sigma_{11}^{-1}\ I_{p-k}]$. Now use the above theorem. $\square$

Corollary: $\bar{x}$ indep. $S$.

Proof: Taking $A = \frac{1}{n}1_n'$ and $C = H = I_n - \frac{1}{n}1_n1_n'$ in the theorem gives $\bar{x}$ indep. $HX$. $\square$

The Wishart distribution

Note that $S = X'[\frac{1}{n}H]X$. Quadratic functions of the form $X'CX$ are an ingredient in many multivariate test statistics.

def'n: $M(p \times p) = X'X$, where $X(m \times p)$ is a d.m. from $N_p(0, \Sigma)$, has a Wishart distribution with scale matrix $\Sigma$ and d.f. $m$. Shorthand: $M \sim W_p(\Sigma, m)$.

Note that the $ij$th element of $X'X$ is simply $x_{(i)}'x_{(j)} = \sum_{k=1}^m x_{ki}x_{kj}$. The $ij$th element of $x_kx_k'$ is $x_{ki}x_{kj}$. Therefore $X'X = \sum_{k=1}^m x_kx_k'$. Then
$$E(M) = E\left[\sum_{k=1}^m x_kx_k'\right] = \sum_{k=1}^m \Sigma = m\Sigma,$$
using $E(x_k) = 0$.
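A quick Monte Carlo check of $E(M) = m\Sigma$ (an added sketch with arbitrary $\Sigma$): generate many data matrices from $N_p(0, \Sigma)$, form $M = X'X$, and average.

```python
import numpy as np

rng = np.random.default_rng(5)
p, m, reps = 3, 10, 20_000
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])

# M = X'X with X an m x p data matrix from N_p(0, Sigma), replicated.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, m))
M = np.einsum('rij,rik->rjk', X, X)   # X'X for each replicate

print(M.mean(axis=0))                 # approx m * Sigma
print(m * Sigma)
```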

Transformations involving the Wishart

thm: Let $B \in \mathbb{R}^{p \times q}$ and $M \sim W_p(\Sigma, m)$. Then $B'MB \sim W_q(B'\Sigma B, m)$.

Proof: Let $Y = XB$. The result three slides back gives us that $Y$ is a d.m. from $N_q(0, B'\Sigma B)$. Then the def'n of the Wishart tells us
$$Y'Y = B'X'XB = B'MB \sim W_q(B'\Sigma B, m). \quad \square$$

Simple results that follow from this theorem

Corollary: Diagonal submatrices of M (square matrices that share part of a diagonal with M) have a Wishart distribution.

Corollary: $m_{ii} \sim \sigma_{ii}\chi^2_m$.

Corollary: $\Sigma^{-1/2}M\Sigma^{-1/2} \sim W_p(I_p, m)$.

Corollary: If $M \sim W_p(I_p, m)$ and $B(p \times q)$ s.t. $B'B = I_q$, then $B'MB \sim W_q(I_q, m)$.

Corollary: $M \sim W_p(\Sigma, m)$ and $a$ s.t. $a'\Sigma a \neq 0$ $\Rightarrow$ $\frac{a'Ma}{a'\Sigma a} \sim \chi^2_m$.

All use different $B$ in the theorem on the previous slide plus minor manipulation.

Wisharts add!

thm: $M_1 \sim W_p(\Sigma, m_1)$ indep. $M_2 \sim W_p(\Sigma, m_2)$ $\Rightarrow$ $M_1 + M_2 \sim W_p(\Sigma, m_1 + m_2)$.

Proof: Let $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$. Then $M_1 + M_2 = X'X$. Now use the def'n of the Wishart. $\square$

We are just adding $m_2$ more independent rows onto $X_1$.

Cochran's theorem

thm: If $X(n \times p)$ is a d.m. from $N_p(0, \Sigma)$ and $C(n \times n)$ is symmetric with eigenvalues $\lambda_1, \ldots, \lambda_n$, then
(a) $X'CX \overset{D}{=} \sum_{i=1}^n \lambda_iM_i$ where $M_1, \ldots, M_n \overset{iid}{\sim} W_p(\Sigma, 1)$.
(b) $X'CX \sim W_p(\Sigma, r)$ $\Leftrightarrow$ $C$ idempotent, where $r = \mathrm{tr}\,C = \mathrm{rank}\,C$.
(c) $nS \sim W_p(\Sigma, n - 1)$.

Proof: The spectral decomposition of $C$ is $C = [\gamma_1 \cdots \gamma_n]\Lambda[\gamma_1 \cdots \gamma_n]' = \sum_{i=1}^n \lambda_i\gamma_i\gamma_i'$. Then $X'CX = \sum_{i=1}^n \lambda_i[X'\gamma_i][X'\gamma_i]' = \sum_{i=1}^n \lambda_i[\gamma_i'X]'[\gamma_i'X]$. The general transformation theorem ($A = \gamma_i'$ & $B = I_p$) tells us that $\gamma_i'X$ is a d.m. from $N_p(0, \Sigma)$, so (a) follows from the def'n of the Wishart. Part (b): $C$ idempotent $\Rightarrow$ there are $r$ $\lambda_i = 1$ and $n - r$ $\lambda_i = 0$, hence $\mathrm{tr}\,C = \lambda_1 + \cdots + \lambda_n = r$. Now use part (a). For part (c) note that $H$ is idempotent with rank $n - 1$. $\square$

This is a biggie. Lots of stuff that will be used later.
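Part (c) can be checked numerically (an added sketch with arbitrary $\Sigma$): the centering matrix $H$ is idempotent with trace $n - 1$, and the average of $nS = X'HX$ over replicates is close to $(n - 1)\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, reps = 12, 3, 20_000
Sigma = np.array([[1.0, 0.4, 0.1],
                  [0.4, 1.5, 0.0],
                  [0.1, 0.0, 0.8]])

# Centering matrix H is idempotent with tr H = rank H = n - 1.
H = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(H @ H, H), np.trace(H))   # True, n - 1

X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
Xc = X - X.mean(axis=1, keepdims=True)      # applying H centers each data matrix
nS = np.einsum('rij,rik->rjk', Xc, Xc)      # X'HX = (HX)'(HX)
print(nS.mean(axis=0) / (n - 1))            # approx Sigma, so E(nS) = (n-1) Sigma
```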

Drum roll...

If $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$ then
$$\bar{x} \sim N_p(\mu, \tfrac{1}{n}\Sigma),$$

$$nS \sim W_p(\Sigma, n - 1),$$
and $\bar{x}$ indep. of $S$. This is a generalization of the univariate $p = 1$ case where $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$ indep. of $ns^2 \sim \sigma^2\chi^2_{n-1}$. This latter result is used to cook up a $t_{n-1}$ distribution:
$$\frac{\bar{x} - \mu}{\sqrt{s^2/(n-1)}} \sim t_{n-1},$$

by def'n. We'll shortly generalize this to $p$ dimensions, but first one last result.

Generalization of partitioning sums of squares

Here is Craig’s theorem.

thm: If $X$ is a d.m. from $N_p(\mu, \Sigma)$ and $C_1, \ldots, C_k$ are symmetric, then $X'C_1X, \ldots, X'C_kX$ are indep. if $C_rC_s = 0$ for all $r \neq s$.

Proof: Let's do it for two projection matrices. Write $X'C_1X = X'M_1\Lambda_1\Lambda_1M_1'X$ and $X'C_2X = X'M_2\Lambda_2\Lambda_2M_2'X$. Note that $\Lambda_i\Lambda_i = \Lambda_i$ as the e-values are either 1 or 0. The theorem (slide 14) says $\Lambda_1M_1'X$ indep. $\Lambda_2M_2'X$ $\Leftrightarrow$ $[\Lambda_1M_1'][\Lambda_2M_2']' = \Lambda_1M_1'M_2\Lambda_2 = 0$. But $0 = C_1C_2 = M_1\Lambda_1M_1'M_2\Lambda_2M_2' \Rightarrow \Lambda_1M_1'M_2\Lambda_2 = 0$. $\square$

This will come in handy in finding the distributions of common test statistics under $H_0$.

Hotelling's $T^2$

Recall, using obvious notation, $\frac{N(0,1)}{\sqrt{\chi^2_\nu/\nu}} \sim t_\nu$. Used for one- and two-sample $t$ tests for univariate outcomes. We'll now generalize this distribution.

def'n: Let $d \sim N_p(0, I_p)$ indep. $M \sim W_p(I_p, m)$. Then
$$m\,d'M^{-1}d \sim T^2(p, m).$$

thm: Let $x \sim N_p(\mu, \Sigma)$ indep. $M \sim W_p(\Sigma, m)$. Then
$$m(x - \mu)'M^{-1}(x - \mu) \sim T^2(p, m).$$
Proof: Take $d^* = \Sigma^{-1/2}(x - \mu)$ and $M^* = \Sigma^{-1/2}M\Sigma^{-1/2}$ and use the def'n of $T^2$. $\square$

Corollary: If $\bar{x}$ and $S$ are the sample mean and covariance from $x_1, \ldots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$, then
$$(n - 1)(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \sim T^2(p, n - 1).$$
Proof: Substitute $M = nS$, $m = n - 1$, and $\sqrt{n}(\bar{x} - \mu)$ for $x - \mu$ in the theorem above. $\square$
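Putting the corollary to work, together with the scaled-$F$ relation stated on the next slide, here is a hypothetical one-sample Hotelling $T^2$ test. The function name and test data are illustrative only, and $S$ is the biased (divide-by-$n$) version used in these slides.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(X, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0.

    T2 = (n-1) (xbar - mu0)' S^{-1} (xbar - mu0) with biased S, and
    T^2(p, n-1) = (n-1)p/(n-p) F_{p, n-p} gives the p-value.
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar) / n
    d = xbar - mu0
    T2 = (n - 1) * d @ np.linalg.solve(S, d)
    F = T2 * (n - p) / ((n - 1) * p)
    return T2, F, stats.f.sf(F, p, n - p)

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0.2, 0.0, 0.1], np.eye(3), size=40)
print(hotelling_one_sample(X, np.zeros(3)))
```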

Hotelling's $T^2$ is a scaled $F$

thm: $T^2(p, m) = \frac{mp}{m - p + 1}F_{p,\,m - p + 1}$.

To prove this we need some ingredients...

Let $M \sim W_p(\Sigma, m)$ and take $M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$, where $M_{11} \in \mathbb{R}^{a \times a}$, $M_{22} \in \mathbb{R}^{b \times b}$, and $a + b = p$. Further, let
$$M_{22.1} = M_{22} - M_{21}M_{11}^{-1}M_{12}.$$

Proof, Hotelling's $T^2$ is a scaled $F$

thm: Let $M \sim W_p(\Sigma, m)$ where $m > p$. Then $M_{22.1} \sim W_b(\Sigma_{22.1}, m - a)$.

Proof: Let $X = [X_1\ X_2]$, so

$$M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} = X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}.$$
Then

$$M_{22.1} = X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2 = X_2'PX_2 = X_{2.1}'PX_{2.1},$$
where $P = I_m - X_1(X_1'X_1)^{-1}X_1'$ is the o.p. matrix onto $\mathcal{C}(X_1)^\perp$ (so $PX_1 = 0$) and $X_{2.1}|X_1 = X_2 - X_1\Sigma_{11}^{-1}\Sigma_{12}$. The theorem on slide 14 tells us $X_{2.1}$ is a d.m. from $N_b(0, \Sigma_{22.1})$ (not dim. $p$ as in the book). So Cochran's theorem tells us $M_{22.1}|X_1 \sim W_b(\Sigma_{22.1}, m - a)$. This doesn't depend on $X_1$ so it's the marginal dist'n as well. $\square$

Proof, Hotelling's $T^2$ is a scaled $F$

lemma: If $M \sim W_p(\Sigma, m)$, $m > p$, then $\frac{1}{[M^{-1}]_{pp}} \sim \frac{1}{[\Sigma^{-1}]_{pp}}\chi^2_{m-p+1}$.

Proof: In general, for partitioned matrices,

$$\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (M_{11} - M_{12}M_{22}^{-1}M_{21})^{-1} & -M_{11}^{-1}M_{12}(M_{22} - M_{21}M_{11}^{-1}M_{12})^{-1} \\ -M_{22}^{-1}M_{21}(M_{11} - M_{12}M_{22}^{-1}M_{21})^{-1} & (M_{22} - M_{21}M_{11}^{-1}M_{12})^{-1} \end{bmatrix}.$$
Now let $M_{11}$ be the upper left $(p - 1) \times (p - 1)$ submatrix of $M$ and $m_{22}$ the lower right $1 \times 1$ "scalar matrix." Then, where $\sigma_{22.1} = \frac{1}{[\Sigma^{-1}]_{pp}}$,
$$\frac{1}{[M^{-1}]_{pp}} = \frac{1}{1/m_{22.1}} = m_{22.1} \sim W_1(\sigma_{22.1}, m - (p - 1)) = \sigma_{22.1}\chi^2_{m-p+1}. \quad \square$$

thm: If $M \sim W_p(\Sigma, m)$, $m > p$, then $\frac{a'\Sigma^{-1}a}{a'M^{-1}a} \sim \chi^2_{m-p+1}$.

Proof: Let $A = [a_{(1)} \cdots a_{(p-1)}\ a]$ be nonsingular with last column $a$. Then $N = A^{-1}M(A^{-1})' \sim W_p(A^{-1}\Sigma(A^{-1})', m)$. So
$$\frac{1}{[N^{-1}]_{pp}} = \frac{1}{[A'M^{-1}A]_{pp}} = \frac{1}{a'M^{-1}a} \sim \frac{1}{a'\Sigma^{-1}a}\chi^2_{m-p+1},$$
noting that the $pp$th element of $[A^{-1}\Sigma(A^{-1})']^{-1} = A'\Sigma^{-1}A$ is $a'\Sigma^{-1}a$. $\square$

Proof, Hotelling's $T^2$ is a scaled $F$

Recall $m\,d'M^{-1}d \sim T^2(p, m)$ where $d \sim N_p(0, I_p)$ indep. of $M \sim W_p(I_p, m)$. Given $d$, $\beta = \frac{d'd}{d'M^{-1}d} \sim \chi^2_{m-p+1}$ (last slide). Since this conditional distribution does not depend on $d$, $\beta$ indep. $d$ and this is the marginal dist'n as well.

$$m\,d'M^{-1}d = \frac{m\,d'd}{d'd/d'M^{-1}d} = m\frac{\chi^2_p}{\chi^2_{m-p+1}} = \frac{mp}{m - p + 1}F_{p,\,m-p+1}.$$

Corollary: If $\bar{x}$ and $S$ are the sample mean and covariance from $N_p(\mu, \Sigma)$, then $\frac{n - p}{p}(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \sim F_{p,\,n-p}$.

Two more distributional results

Corollary: $|M|/|M + dd'| \sim B(\tfrac{1}{2}(m - p + 1), \tfrac{p}{2})$.

Proof: For $B(p \times n)$ and $C(n \times p)$, $|I_p + BC| = |I_n + CB|$. Since $|AB| = |A||B|$, we can write
$$\frac{|M|}{|M + dd'|} = \frac{1}{|I_p + M^{-1}dd'|} = \frac{1}{1 + d'M^{-1}d} = \frac{1}{1 + \frac{p}{m-p+1}F_{p,\,m-p+1}}.$$
Recall if $x \sim F_{\nu_1,\nu_2}$ then $\frac{\nu_1x/\nu_2}{1 + \nu_1x/\nu_2} \sim B(\tfrac{\nu_1}{2}, \tfrac{\nu_2}{2})$ and $\frac{1}{1 + \nu_1x/\nu_2} \sim B(\tfrac{\nu_2}{2}, \tfrac{\nu_1}{2})$. $\square$

Corollary: $d \sim N_p(0, I_p)$ indep. $M \sim W_p(I_p, m)$; then $d'd(1 + 1/\{d'M^{-1}d\}) \sim \chi^2_{m+1}$ indep. of $d'M^{-1}d$.

Proof: $\beta$ indep. $d'd$ (last slide); both are $\chi^2$, so their sum is indep. of their ratio. The sum of two indep. $\chi^2$ is also $\chi^2$; the d.f. add. $\square$

Two-sample Hotelling $T^2$

Let $D^2 = (\bar{x}_1 - \bar{x}_2)'S_u^{-1}(\bar{x}_1 - \bar{x}_2)$ estimate $\Delta^2 = (\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)$, where $S_u = \frac{1}{n-2}[n_1S_1 + n_2S_2]$ and $n = n_1 + n_2$. Then

thm: Let $X_1$ be a d.m. from $N_p(\mu_1, \Sigma_1)$ indep. $X_2$ a d.m. from $N_p(\mu_2, \Sigma_2)$. If $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2$ then
$$\frac{n_1n_2}{n_1 + n_2}D^2 \sim T^2(p, n - 2).$$
Proof: $d = \bar{x}_1 - \bar{x}_2 \sim N_p(\mu_1 - \mu_2, \frac{1}{n_1}\Sigma_1 + \frac{1}{n_2}\Sigma_2)$. When $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2 = \Sigma$, $d \sim N_p(0, c\Sigma)$ where $c = \frac{n_1 + n_2}{n_1n_2}$. Also $M = n_1S_1 + n_2S_2 \sim W_p(\Sigma, n_1 + n_2 - 2)$ as independent Wisharts with the same scale add; $cM \sim W_p(c\Sigma, n_1 + n_2 - 2)$. $M$ indep. $d$ as $\bar{x}_1, \bar{x}_2, S_1, S_2$ are mutually indep. Stirring all ingredients together gives
$$\frac{D^2}{c} = (n - 2)d'(cM)^{-1}d \sim T^2(p, n - 2). \quad \square$$
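A hedged implementation sketch of the two-sample test (function name and data are illustrative): it forms $S_u$, $D^2$, and the $T^2$ statistic as in the theorem, then converts to an $F_{p,\,n-p-1}$ p-value via the scaled-$F$ relation with $m = n - 2$.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling T^2 for H0: mu1 = mu2 (assumes Sigma1 = Sigma2).

    Su = (n1 S1 + n2 S2)/(n - 2) with biased S_i, and under H0
    n1 n2/(n1+n2) D^2 ~ T^2(p, n-2) = (n-2)p/(n-p-1) F_{p, n-p-1}.
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    n = n1 + n2
    d = X1.mean(axis=0) - X2.mean(axis=0)
    A1 = (X1 - X1.mean(axis=0)).T @ (X1 - X1.mean(axis=0))  # n1 S1
    A2 = (X2 - X2.mean(axis=0)).T @ (X2 - X2.mean(axis=0))  # n2 S2
    Su = (A1 + A2) / (n - 2)
    D2 = d @ np.linalg.solve(Su, d)
    T2 = n1 * n2 / n * D2
    F = T2 * (n - p - 1) / ((n - 2) * p)
    return T2, F, stats.f.sf(F, p, n - p - 1)

rng = np.random.default_rng(8)
X1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=30)
X2 = rng.multivariate_normal([0.5, 0.0], np.eye(2), size=25)
print(hotelling_two_sample(X1, X2))
```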

Generalization of the F statistic

We’ve already generalized the t for multivariate data; now it’s time for the F .

Let $A \sim W_p(\Sigma, m)$ indep. of $B \sim W_p(\Sigma, n)$ where $m \geq p$. $A^{-1}$ exists a.s. and we will examine aspects of $A^{-1}B$. Note that this reduces to the ratio of indep. $\chi^2$ in the univariate $p = 1$ case.

Generalization of the F statistic

lemma: $M \sim W_p(\Sigma, m)$, $m \geq p$ $\Rightarrow$ $|M| = |\Sigma|\prod_{i=0}^{p-1}\chi^2_{m-i}$ (the $\chi^2$ variates mutually independent).

Proof: By induction. For $p = 1$, $|M| = m_{11} \sim \sigma^2\chi^2_m$. For $p > 1$ let $M_{11}$ be the upper left $(p - 1) \times (p - 1)$ submatrix of $M$ and $m_{22}$ the lower right $1 \times 1$ "scalar matrix" (slide 24). The induction hypothesis says $|M_{11}| = |\Sigma_{11}|\prod_{i=0}^{p-2}\chi^2_{m-i}$. Slide 24 implies that $m_{22.1}$ indep. $M_{11}$ and $m_{22.1} \sim \sigma_{22.1}\chi^2_{m-p+1}$. The result follows by noting that $|M| = |M_{11}|m_{22.1}$ and $|\Sigma| = |\Sigma_{11}|\sigma_{22.1}$ (p. 457, or expansion of the determinant using cofactors). $\square$

thm: Let $A \sim W_p(\Sigma, m)$ indep. of $B \sim W_p(\Sigma, n)$ where $m \geq p$ and $n \geq p$. Then $\phi = |B|/|A| \propto \prod_{i=1}^p F_{n-i+1,\,m-i+1}$.

Proof: Using the lemma,
$$\phi = \prod_{i=0}^{p-1}\frac{\chi^2_{n-i}}{\chi^2_{m-i}} = \prod_{i=0}^{p-1}\frac{n-i}{m-i}F_{n-i,\,m-i} \propto \prod_{i=1}^{p}F_{n-i+1,\,m-i+1}. \quad \square$$

Wilks' lambda

Wilks' lambda, a generalization of the beta distribution, appears later on when performing LRTs:

def'n: $A \sim W_p(I_p, m)$ indep. $B \sim W_p(I_p, n)$ and $m \geq p$; then
$$\Lambda = |A|/|A + B| \sim \Lambda(p, m, n)$$
has a Wilks' lambda distribution with parameters $(p, m, n)$.

thm: $\Lambda \overset{D}{=} \prod_{i=1}^n u_i$ where $u_1, \ldots, u_n$ are mutually independent and $u_i \sim B(\tfrac{1}{2}(m + i - p), \tfrac{p}{2})$.
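The theorem can be checked by simulation (an added sketch, with arbitrary $p$, $m$, $n$): compare quantiles of $\Lambda = |A|/|A + B|$ against those of an independent product of the stated betas.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
p, m, n, reps = 3, 15, 4, 20_000

# Simulate Lambda = |A| / |A + B| with A ~ W_p(I, m), B ~ W_p(I, n).
XA = rng.normal(size=(reps, m, p))
XB = rng.normal(size=(reps, n, p))
A = np.einsum('rij,rik->rjk', XA, XA)
B = np.einsum('rij,rik->rjk', XB, XB)
lam = np.linalg.det(A) / np.linalg.det(A + B)

# Independent u_i ~ B((m+i-p)/2, p/2), i = 1..n, and their product.
u = np.column_stack([stats.beta.rvs((m + i - p) / 2, p / 2,
                                    size=reps, random_state=rng)
                     for i in range(1, n + 1)])
prod_u = u.prod(axis=1)

print(np.quantile(lam, [0.1, 0.5, 0.9]))
print(np.quantile(prod_u, [0.1, 0.5, 0.9]))   # should roughly match
```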

Proof: Wilks' lambda is a product of betas

Let $X(n \times p)$ be a d.m. from $N_p(0, I_p)$, $B = X'X$, and let $X_i$ be the first $i$ rows of $X$. Let $M_i = A + X_i'X_i$. Then $M_0 = A$, $M_n = A + B$, and $M_i = M_{i-1} + x_ix_i'$. Then

$$\Lambda = \frac{|A|}{|A + B|} = \prod_{i=1}^n\frac{|M_{i-1}|}{|M_i|} = \prod_{i=1}^n u_i.$$
The corollary on slide 27 implies $u_i \sim B(\tfrac{1}{2}(m + i - p), \tfrac{p}{2})$.

The independence part takes some work...

Characterization of independence of matrix & vector

lemma: Let $W \in \mathbb{R}^{p \times p}$ and $x \in \mathbb{R}^p$. If $x$ is indep. of $(g_1'Wg_1, \ldots, g_p'Wg_p)$ for all orthogonal $G = [g_1 \cdots g_p]$, then $x$ indep. $W$.

Showing independence of $u_1, \ldots, u_n$, continued...

thm: $d \sim N_p(0, I_p)$ indep. $M \sim W_p(I_p, m)$; then $d'M^{-1}d \sim \frac{p}{m-p+1}F_{p,\,m-p+1}$ indep. of $M + dd' \sim W_p(I_p, m + 1)$.

Proof: Let $G = [g_1 \cdots g_p]$ be orthogonal and take
$$X((m + 1) \times p) = \begin{bmatrix} X_1 \\ x_{m+1}' \end{bmatrix}.$$
Here $M = X_1'X_1$ and $d = x_{m+1}$. Let $Y = XG = [Xg_1 \cdots Xg_p] = [Y_{(1)} \cdots Y_{(p)}]$. Then $q_j = g_j'[M + dd']g_j = g_j'X'Xg_j = \|Y_{(j)}\|^2$. Since $Y^v \sim N_{(m+1)p}(0, I)$, $Y$ is spherically symmetric. Define $h(Y) = y_{m+1}'(Y'Y)^{-1}y_{m+1} = d'(M + dd')^{-1}d$, a monotone function of $d'M^{-1}d$, and note that $h(Y) = h(YD)$ for all diagonal $D$. The theorem on p. 48 implies $q_j$ indep. $h(Y)$ for $j = 1, \ldots, p$. Now use the lemma on the previous slide. $\square$

Showing independence of $u_1, \ldots, u_n$, continued...

The theorem on the last slide implies $\frac{1}{u_i} = |M_i|/|M_{i-1}| = 1 + x_i'M_{i-1}^{-1}x_i$ indep. $M_i$. Finally,

$$M_{i+j} = \underbrace{M_i}_{u_i \text{ indep. of}} + \sum_{k=1}^j x_{i+k}x_{i+k}',$$
so for any $i$, $u_i$ indep. of $u_{i+1}, \ldots, u_n$. $\square$

Two last results...

When $m$ is large, one can also use Bartlett's approximation:

$$-\{m - \tfrac{1}{2}(p - n + 1)\}\log\Lambda(p, m, n) \overset{\bullet}{\sim} \chi^2_{np}.$$
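A quick Monte Carlo look at Bartlett's approximation (illustrative parameters, an added sketch): simulate $\Lambda(p, m, n)$ with a large $m$ and compare quantiles of $-\{m - \frac{1}{2}(p - n + 1)\}\log\Lambda$ with $\chi^2_{np}$ quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
p, m, n, reps = 3, 40, 2, 20_000   # m large so the approximation is decent

XA = rng.normal(size=(reps, m, p))
XB = rng.normal(size=(reps, n, p))
A = np.einsum('rij,rik->rjk', XA, XA)
B = np.einsum('rij,rik->rjk', XB, XB)
lam = np.linalg.det(A) / np.linalg.det(A + B)

stat = -(m - 0.5 * (p - n + 1)) * np.log(lam)
qs = [0.5, 0.9, 0.95]
print(np.quantile(stat, qs))
print(stats.chi2.ppf(qs, df=n * p))   # Bartlett: approximately chi^2_{np}
```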

def'n: $A \sim W_p(I_p, m)$ indep. $B \sim W_p(I_p, n)$ and $m \geq p$. Then $\theta(p, m, n)$, the largest eigenvalue of $(A + B)^{-1}B$, is called the greatest root statistic with parameters $(p, m, n)$. MKB (p. 84) gives relationships between $\Lambda(p, m, n)$ and $\theta(p, m, n)$.
