Eigenvalues and Eigenvectors

◮ Suppose A is an n × n symmetric matrix with real entries.
◮ The function from R^n to R defined by

x ↦ x^T A x

is called a quadratic form.
◮ We can maximize x^T A x subject to x^T x = ||x||² = 1 by Lagrange multipliers:

x^T A x − λ(x^T x − 1)

◮ Take derivatives (with respect to λ and x) and get

x^T x = 1

and

2Ax − 2λx = 0.

◮ We say that v is an eigenvector of A with eigenvalue λ if v ≠ 0 and Av = λv.
◮ For such a v and λ with v^T v = 1 we find

v^T A v = λ v^T v = λ.

◮ So the quadratic form is maximized over vectors of length one by the eigenvector with the largest eigenvalue.

◮ Call that eigenvector v1 and its eigenvalue λ1.
◮ Next maximize x^T A x subject to x^T x = 1 and v1^T x = 0.
◮ Get a new eigenvector and eigenvalue.
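A minimal numpy sketch (not part of the original slides) illustrating the claim: the maximum of x^T A x over unit vectors is the largest eigenvalue, attained at the corresponding eigenvector. The matrix A here is an arbitrary symmetric example.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # an arbitrary symmetric matrix

vals, vecs = np.linalg.eigh(A)          # eigenvalues in increasing order
lam1, v1 = vals[-1], vecs[:, -1]        # largest eigenvalue and its eigenvector

print(np.isclose(v1 @ A @ v1, lam1))    # v1^T A v1 = lambda_1
x = rng.standard_normal(4)
x /= np.linalg.norm(x)                  # a random unit vector
print(x @ A @ x <= lam1 + 1e-12)        # never exceeds lambda_1
```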

Summary of Results

Theorem
Suppose A is a real symmetric n × n matrix.
1. There are n orthonormal eigenvectors v1, ..., vn with corresponding eigenvalues λ1 ≥ ⋯ ≥ λn.
2. If P is the n × n matrix whose columns are v1, ..., vn and Λ is the diagonal matrix with λ1, ..., λn on the diagonal then

AP = PΛ, or equivalently A = PΛP^T and P^T A P = Λ and P^T P = I.

3. If A is non-negative definite (that is, A is a covariance matrix) then each λi ≥ 0.
4. A is singular if and only if at least one eigenvalue is 0.

5. The determinant of A is ∏i λi.

The trace of a matrix

Definition: If A is square then the trace of A is the sum of its diagonal elements:

tr(A) = ∑i Aii

Theorem
If A and B are any two matrices such that AB is square then

tr(AB) = tr(BA)

If A1, ..., Ar are matrices such that ∏_{j=1}^r Aj is square then

tr(A1 ⋯ Ar) = tr(A2 ⋯ Ar A1) = ⋯ = tr(As ⋯ Ar A1 ⋯ A_{s−1})

(the trace is invariant under cyclic permutation). If A is symmetric then

tr(A) = ∑i λi
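A quick numpy check (an illustration, not from the slides) of the spectral decomposition and of the trace and determinant formulas; the matrix is again an arbitrary symmetric example.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                                # arbitrary symmetric matrix

lam, P = np.linalg.eigh(A)
print(np.allclose(P @ np.diag(lam) @ P.T, A))    # A = P Lambda P^T
print(np.allclose(P.T @ P, np.eye(5)))           # P^T P = I
print(np.isclose(np.trace(A), lam.sum()))        # tr(A) = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # det(A) = product of eigenvalues
```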

Idempotent Matrices

Definition: A matrix A is idempotent if A² = AA = A.

Theorem
A symmetric matrix A is idempotent if and only if all its eigenvalues are either 0 or 1. The number of eigenvalues equal to 1 is then tr(A).

Proof: If A is idempotent, λ is an eigenvalue and v a corresponding eigenvector then

λv = Av = AAv = λAv = λ²v

Since v ≠ 0 we find λ − λ² = λ(1 − λ) = 0, so either λ = 0 or λ = 1.

Conversely

◮ Write A = PΛP^T so A² = PΛP^T PΛP^T = PΛ²P^T.

◮ Have used the fact that P is orthogonal.

◮ Each entry in the diagonal of Λ is either 0 or 1.
◮ So Λ² = Λ.

◮ So A² = A.

Finally

tr(A) = tr(PΛP^T) = tr(ΛP^T P) = tr(Λ)

Since all the diagonal entries in Λ are 0 or 1, this completes the proof.
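As a numerical illustration (my example, not the slides'): the hat matrix H = X(X^T X)^{−1}X^T of a regression design is symmetric and idempotent, so its eigenvalues are 0 or 1 and tr(H) counts the 1s.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.standard_normal((n, p))                  # arbitrary full-rank design
H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix

print(np.allclose(H @ H, H))                     # idempotent: H^2 = H
lam = np.linalg.eigvalsh(H)                      # ascending eigenvalues
print(np.allclose(lam, np.r_[np.zeros(n - p), np.ones(p)], atol=1e-8))
print(np.isclose(np.trace(H), p))                # tr(H) = number of unit eigenvalues
```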

Independence

Definition: If U1, U2, ..., Uk are random variables then we call U1, ..., Uk independent if

P(U1 ∈ A1, ..., Uk ∈ Ak) = P(U1 ∈ A1) × ⋯ × P(Uk ∈ Ak)

for any sets A1, ..., Ak. We usually either:

◮ Assume independence because there is no physical way for the value of any of the random variables to influence any of the others. OR

◮ We prove independence.

Joint Densities

◮ How do we prove independence?

◮ We use the notion of a joint density.

◮ U1, ..., Uk have joint density function f = f(u1, ..., uk) if

P((U1, ..., Uk) ∈ A) = ∫ ⋯ ∫_A f(u1, ..., uk) du1 ⋯ duk

◮ Independence of U1,..., Uk is equivalent to

f(u1, ..., uk) = f1(u1) × ⋯ × fk(uk)

for some densities f1, ..., fk.

◮ In this case fi is the density of Ui.

◮ ASIDE: notice that for an independent sample the joint density is the likelihood function!

Application to Normals: Standard Case

If

Z = [ Z1 ]
    [ ⋮  ]  ∼ MVN(0, I_{n×n})
    [ Zn ]

then the joint density of Z, denoted fZ(z1, ..., zn), is

fZ(z1, ..., zn) = φ(z1) × ⋯ × φ(zn)

where

φ(zi) = (1/√(2π)) e^{−zi²/2}

So

fZ = (2π)^{−n/2} exp( −(1/2) ∑_{i=1}^n zi² ) = (2π)^{−n/2} exp( −(1/2) z^T z )

where z = (z1, ..., zn)^T.

Application to Normals: General Case

If X = AZ + µ and A is invertible then for any set B ⊆ R^n we have

P(X ∈ B) = P(AZ + µ ∈ B)
         = P(Z ∈ A^{−1}(B − µ))
         = ∫ ⋯ ∫_{A^{−1}(B−µ)} (2π)^{−n/2} exp( −(1/2) z^T z ) dz1 ⋯ dzn

Make the change of variables x = Az + µ in this integral to get

P(X ∈ B) = ∫ ⋯ ∫_B (2π)^{−n/2} exp( −(1/2) (A^{−1}(x − µ))^T (A^{−1}(x − µ)) ) |J(x)| dx1 ⋯ dxn

Here J(x) denotes the Jacobian of the transformation

J(x) = J(x1, ..., xn) = det( ∂zi/∂xj ) = det( A^{−1} )

Algebraic manipulation of the integral then gives

P(X ∈ B) = ∫ ⋯ ∫_B (2π)^{−n/2} exp( −(1/2)(x − µ)^T Σ^{−1}(x − µ) ) |det A^{−1}| dx1 ⋯ dxn

where

Σ = AA^T
Σ^{−1} = (A^{−1})^T A^{−1}
det Σ^{−1} = (det A^{−1})² = 1/det Σ

Multivariate Normal Density

◮ Conclusion: the MVN(µ, Σ) density is

(2π)^{−n/2} (det Σ)^{−1/2} exp( −(1/2)(x − µ)^T Σ^{−1}(x − µ) )

◮ What if A is not invertible? Ans: there is no density.

◮ How do we apply this density?

◮ Suppose

X = [ X1 ]
    [ X2 ]

and

Σ = [ Σ11  Σ12 ]
    [ Σ21  Σ22 ]

◮ Now suppose Σ12 = 0
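Before using the density, a quick sketch (assuming scipy is available; the matrix A and the mean µ below are arbitrary) checking the MVN(µ, Σ) density formula above against scipy's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))                 # invertible with probability 1
mu = np.array([1.0, -2.0, 0.5])
Sigma = A @ A.T

x = rng.standard_normal(3)                      # arbitrary evaluation point
d = x - mu
by_hand = ((2 * np.pi) ** (-3 / 2) * np.linalg.det(Sigma) ** (-0.5)
           * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))
print(np.isclose(by_hand, multivariate_normal(mu, Sigma).pdf(x)))  # True
```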

Assuming Σ12 = 0

1. Σ21 = 0
2. In homework you checked that

Σ^{−1} = [ Σ11^{−1}  0        ]
         [ 0         Σ22^{−1} ]

3. Writing

x = [ x1 ]   and   µ = [ µ1 ]
    [ x2 ]             [ µ2 ]

we find

(x − µ)^T Σ^{−1}(x − µ) = (x1 − µ1)^T Σ11^{−1}(x1 − µ1) + (x2 − µ2)^T Σ22^{−1}(x2 − µ2)

4. So, if n1 = dim(X1) and n2 = dim(X2) we see that

fX(x1, x2) = (2π)^{−n1/2} (det Σ11)^{−1/2} exp( −(1/2)(x1 − µ1)^T Σ11^{−1}(x1 − µ1) )
           × (2π)^{−n2/2} (det Σ22)^{−1/2} exp( −(1/2)(x2 − µ2)^T Σ22^{−1}(x2 − µ2) )

(the determinant factors split because det Σ = det Σ11 · det Σ22 for block diagonal Σ)

5. So X1 and X2 are independent.

Summary

◮ If Cov(X1, X2) = E[(X1 − µ1)(X2 − µ2)^T] = 0 then X1 is independent of X2.

◮ Warning: This only works provided

X = [ X1 ]  ∼ MVN(µ, Σ)
    [ X2 ]

◮ Fact: However, it works even if Σ is singular, but you can’t prove it as easily using densities.
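A Monte Carlo sanity check (illustrative only): with Cov(X1, X2) = 0 in a bivariate normal, joint probabilities factor, e.g. P(X1 > 0, X2 > 0) = P(X1 > 0) P(X2 > 0).

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.0],
                  [0.0, 0.5]])                    # Cov(X1, X2) = 0
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)

joint = np.mean((X[:, 0] > 0) & (X[:, 1] > 0))    # P(X1 > 0, X2 > 0)
prod = np.mean(X[:, 0] > 0) * np.mean(X[:, 1] > 0)
print(joint, prod)                                # both close to 0.25
```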

Application: independence in linear models

µ̂ = Xβ̂ = X(X^T X)^{−1}X^T Y = Xβ + Hε
ε̂ = Y − Xβ̂ = ε − Hε = (I − H)ε

So

[ µ̂ ]     [   H   ]          [ µ ]
[ ε̂ ]  = σ [ I − H ] (ε/σ) +  [ 0 ]

which has the form A(ε/σ) + b with A = σ[H; I − H] (the two blocks stacked) and b = (µ; 0). Hence

[ µ̂ ]        ( [ µ ]        )
[ ε̂ ]  ∼ MVN( [ 0 ] ; AA^T )

Now

A = σ [   H   ]
      [ I − H ]

so

AA^T = σ² [   H   ] [ H^T  (I − H)^T ]
          [ I − H ]

     = σ² [ HH         H(I − H)       ]
          [ (I − H)H   (I − H)(I − H) ]

     = σ² [ H       H − H          ]
          [ H − H   I − H − H + HH ]

     = σ² [ H   0     ]
          [ 0   I − H ]

using H^T = H and HH = H. The 0s prove that ε̂ and µ̂ are independent. It follows that µ̂^T µ̂, the regression sum of squares (not adjusted), is independent of ε̂^T ε̂, the Error sum of squares.
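A numeric version of this block computation (my sketch, with an arbitrary design matrix): the off-diagonal block H(I − H) vanishes, so µ̂ and ε̂ are uncorrelated, and for jointly normal vectors uncorrelated means independent.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 4
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

print(np.allclose(H @ (I - H), 0))               # off-diagonal block of AA^T is 0
print(np.allclose((I - H) @ (I - H), I - H))     # I - H is idempotent too
```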

Joint Densities: some manipulations

◮ Suppose Z1 and Z2 are independent standard normals.

◮ Their joint density is

f(z1, z2) = (1/(2π)) exp( −(z1² + z2²)/2 ).

◮ Show the meaning of joint density by computing the density of a χ²_2 random variable.
◮ Let U = Z1² + Z2².
◮ By definition U has a χ² distribution with 2 degrees of freedom.

Computing the χ²_2 density

◮ Cumulative distribution function of U is

F(u) = P(U ≤ u).

◮ For u ≤ 0 this is 0 so take u ≥ 0.
◮ The event U ≤ u is the same as the event that the point (Z1, Z2) is in the circle centered at the origin and having radius u^{1/2}.

◮ That is, if A is the circle of this radius then

F(u) = P((Z1, Z2) ∈ A).

◮ By definition of density this is a double integral

∫∫_A f(z1, z2) dz1 dz2

◮ You do this integral in polar co-ordinates.

Integral in Polar co-ordinates

◮ Let z1 = r cos θ and z2 = r sin θ.
◮ We see that

f(r cos θ, r sin θ) = (1/(2π)) exp( −r²/2 ).

◮ The Jacobian of the transformation is r so that dz1 dz2 becomes r dr dθ.
◮ Finally the region of integration is simply 0 ≤ θ ≤ 2π and 0 ≤ r ≤ u^{1/2} so that

P(U ≤ u) = (1/(2π)) ∫_0^{u^{1/2}} ∫_0^{2π} exp( −r²/2 ) r dθ dr
         = ∫_0^{u^{1/2}} r exp( −r²/2 ) dr
         = [ −exp( −r²/2 ) ]_0^{u^{1/2}}
         = 1 − exp( −u/2 ).

◮ The density of U is found by differentiating to get

f(u) = (1/2) exp( −u/2 )

which is the exponential density with mean 2.
◮ This means that the χ²_2 density is really an exponential density.
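A short simulation sketch (illustrative) checking the CDF just derived: Z1² + Z2² matches 1 − exp(−u/2).

```python
import numpy as np

rng = np.random.default_rng(6)
Z = rng.standard_normal((100_000, 2))
U = (Z ** 2).sum(axis=1)                          # chi-squared with 2 df

for u in (0.5, 2.0, 5.0):
    print(np.mean(U <= u), 1 - np.exp(-u / 2))    # empirical vs exact CDF
```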

t tests

◮ We have shown that µ̂ and ε̂ are independent.
◮ So the Regression Sum of Squares (unadjusted) (= µ̂^T µ̂) and the Error Sum of Squares (= ε̂^T ε̂) are independent.

◮ Similarly

[ β̂ ]       ( [ β ]      [ (X^T X)^{−1}  0     ] )
[ ε̂ ]  ∼ MVN( [ 0 ] ; σ²  [ 0             I − H ] )

so that β̂ and ε̂ are independent.

Conclusions

◮ We see that

a^T β̂ − a^T β ∼ N( 0, σ² a^T (X^T X)^{−1} a )

is independent of

σ̂² = ε̂^T ε̂ / (n − p)

◮ If we knew that

ε̂^T ε̂ / σ² ∼ χ²_{n−p}

then it would follow that

[ (a^T β̂ − a^T β) / ( σ √( a^T (X^T X)^{−1} a ) ) ] / √( ε̂^T ε̂ / ((n − p)σ²) ) = a^T(β̂ − β) / √( MSE · a^T (X^T X)^{−1} a ) ∼ t_{n−p}

◮ This leaves only the question: how do I know that

ε̂^T ε̂ / σ² ∼ χ²_{n−p} ?
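A simulation sketch (my construction, with an arbitrary design and contrast a): the studentized a^T β̂ should match the t_{n−p} distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, reps = 20, 3, 50_000
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.5, 2.0])                 # arbitrary true coefficients
a = np.array([1.0, 1.0, 0.0])                     # arbitrary contrast
XtXinv = np.linalg.inv(X.T @ X)

tstats = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.standard_normal(n)         # normal errors, sigma = 1
    bhat = XtXinv @ X.T @ Y
    resid = Y - X @ bhat
    mse = resid @ resid / (n - p)                 # sigma-hat squared
    tstats[r] = (a @ bhat - a @ beta) / np.sqrt(mse * (a @ XtXinv @ a))

print(np.quantile(tstats, [0.05, 0.95]))
print(stats.t(df=n - p).ppf([0.05, 0.95]))        # should agree closely
```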

Distribution of the Error Sum of Squares

◮ Recall: if Z1,..., Zn are iid N(0, 1) then

U = Z1² + ⋯ + Zn² ∼ χ²_n

◮ So we rewrite ε̂^T ε̂ / σ² as Z1² + ⋯ + Z_{n−p}² for some Z1, ..., Z_{n−p} which are iid N(0, 1).
◮ Put

Z* = ε/σ ∼ MVN_n(0, I_{n×n})

◮ Then

ε̂^T ε̂ / σ² = (Z*)^T (I − H)(I − H) Z* = (Z*)^T (I − H) Z*.

◮ Now define a new vector Z from Z* so that
1. Z ∼ MVN(0, I_n)
2. (Z*)^T (I − H) Z* = ∑_{i=1}^{n−p} Zi²

Distribution of Quadratic Forms

Theorem
If Z has a standard n-dimensional multivariate normal distribution and A is a symmetric n × n matrix then the distribution of Z^T A Z is the same as that of

∑ λi Zi²

where the λi are the n eigenvalues of A.

Theorem
The distribution in the last theorem is χ²_ν if and only if all the λi are 0 or 1 and ν of them are 1.

Theorem
The distribution is chi-squared if and only if A is idempotent. In this case tr(A) = ν.
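A Monte Carlo sketch (illustrative, arbitrary symmetric A) of the first theorem: Z^T A Z and ∑ λi Zi² share the same distribution, so in particular the same mean tr(A) and variance 2 ∑ λi².

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                                    # arbitrary symmetric matrix
lam = np.linalg.eigvalsh(A)

Z = rng.standard_normal((200_000, 4))
q1 = np.einsum('ij,jk,ik->i', Z, A, Z)               # Z^T A Z, row by row
q2 = (rng.standard_normal((200_000, 4)) ** 2) @ lam  # sum_i lambda_i Z_i^2

print(np.mean(q1), np.mean(q2), lam.sum())           # all approx tr(A)
print(np.var(q1), np.var(q2), 2 * (lam ** 2).sum())  # all approx 2 sum(lam^2)
```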

Rewriting a Quadratic Form as a Sum of Squares

◮ Consider (Z*)^T A Z* where A is a symmetric matrix and Z* is standard multivariate normal.

◮ In the earlier application A = I − H.
◮ Replace A by PΛP^T in this formula.

◮ Get

(Z*)^T A Z* = (Z*)^T PΛP^T Z*
            = (P^T Z*)^T Λ (P^T Z*)
            = Z^T ΛZ

where Z = P^T Z*.

◮ Notice that Z has a multivariate normal distribution

◮ mean is 0 and variance is

Var(Z) = P^T P = I_{n×n}

◮ So Z is also standard multivariate normal!

◮ Now look at what happens when you multiply out

Z^T ΛZ

◮ Multiplying a diagonal matrix by Z simply multiplies the ith entry in Z by the ith diagonal element

◮ So

ΛZ = ( λ1 Z1, ..., λn Zn )^T

◮ Take the inner product of this with Z:

Z^T ΛZ = ∑ λi Zi².

◮ We have rewritten our original quadratic form as a linear combination of squared independent standard normals,
◮ that is, as a linear combination of independent χ²_1 variables.
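The identity holds realization by realization, not just in distribution. A small numpy check (my example):

```python
import numpy as np

rng = np.random.default_rng(9)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                                 # arbitrary symmetric matrix
lam, P = np.linalg.eigh(A)

Zstar = rng.standard_normal(5)
Z = P.T @ Zstar                                   # rotated vector, still N(0, I)
print(np.isclose(Zstar @ A @ Zstar, np.sum(lam * Z ** 2)))  # True
```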

Application to Error Sum of Squares

◮ Recall that

ESS/σ² = (Z*)^T (I − H) Z*

where Z* = ε/σ is multivariate standard normal.

◮ The matrix I − H is idempotent.
◮ So ESS/σ² has a χ² distribution with degrees of freedom ν equal to trace(I − H):

ν = trace(I − H)
  = trace(I) − trace(H)
  = n − trace( X(X^T X)^{−1}X^T )
  = n − trace( (X^T X)^{−1}X^T X )
  = n − trace( I_{p×p} )
  = n − p
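A one-line numeric confirmation of the trace computation (arbitrary design):

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 15, 4
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.isclose(np.trace(np.eye(n) - H), n - p))  # trace(I - H) = n - p
```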

Summary of Distribution theory conclusions

1. ε^T A ε / σ² has the same distribution as ∑ λi Zi² where the Zi are iid N(0, 1) random variables (so the Zi² are iid χ²_1) and the λi are the eigenvalues of A.
2. A² = A (A is idempotent) implies that all the eigenvalues of A are either 0 or 1.
3. Points 1 and 2 prove that A² = A implies that ε^T A ε / σ² ∼ χ²_{trace(A)}.
4. A special case is

ε̂^T ε̂ / σ² ∼ χ²_{n−p}

5. t statistics have t distributions.

6. If H_0: β = 0 is true then

F = ( µ̂^T µ̂ / p ) / ( ε̂^T ε̂ / (n − p) ) ∼ F_{p,n−p}
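A simulation sketch (my construction): with β = 0 and σ = 1 the statistic above should follow F_{p,n−p}.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, p, reps = 25, 3, 20_000
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T

F = np.empty(reps)
for r in range(reps):
    Y = rng.standard_normal(n)                    # H0: beta = 0, sigma = 1
    muhat = H @ Y
    ehat = Y - muhat
    F[r] = (muhat @ muhat / p) / (ehat @ ehat / (n - p))

print(np.quantile(F, [0.5, 0.95]))
print(stats.f(p, n - p).ppf([0.5, 0.95]))         # should agree closely
```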

Many Extensions are Possible

The most important of these are:
1. If a “reduced” model is obtained from a “full” model by imposing k linearly independent linear restrictions on β (like β1 = β2, β1 + β2 = 2β3) then

Extra SS / σ² = (ESS_R − ESS_F) / σ² ∼ χ²_k

assuming that the null hypothesis (the restricted model) is true.
2. So the Extra Sum of Squares F test has an F-distribution.
3. In ANOVA tables which add up, the various rows (not including the total) are independent.

4. When the null H_0 is not true, the distribution of the Regression SS is non-central χ².
5. This is used in power and sample size calculations.
