Eigenvalues and Eigenvectors
◮ Suppose A is an n × n symmetric matrix with real entries.
◮ The function from R^n to R defined by
x ↦ x^T Ax
is called a quadratic form.
◮ We can maximize x^T Ax subject to x^T x = ||x||^2 = 1 by Lagrange multipliers: maximize
x^T Ax − λ(x^T x − 1)
◮ Take derivatives and get
x^T x = 1
and
2Ax − 2λx = 0.

Richard Lockhart STAT 350: General Theory

◮ We say that v is an eigenvector of A with eigenvalue λ if v ≠ 0 and Av = λv.
◮ For such a v and λ with v^T v = 1 we find
v^T Av = λ v^T v = λ.
◮ So the quadratic form is maximized over vectors of length one by the eigenvector with the largest eigenvalue.
◮ Call that eigenvector v_1, its eigenvalue λ_1.
◮ Maximize x^T Ax subject to x^T x = 1 and v_1^T x = 0.
◮ Get a new eigenvector and eigenvalue.
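This can be checked numerically. Below is a minimal sketch using numpy, where A is a randomly generated symmetric matrix (an illustration, not a matrix from the lecture): the quadratic form over unit vectors never exceeds the largest eigenvalue, and the top eigenvector attains it.

```python
import numpy as np

# Hypothetical symmetric matrix for illustration.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # symmetrize

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(A)
lam1 = eigvals[-1]       # largest eigenvalue
v1 = eigvecs[:, -1]      # corresponding unit eigenvector

# The quadratic form at v1 attains the largest eigenvalue ...
assert np.isclose(v1 @ A @ v1, lam1)

# ... and random unit vectors never exceed it.
for _ in range(1000):
    x = rng.standard_normal(4)
    x /= np.linalg.norm(x)
    assert x @ A @ x <= lam1 + 1e-10
```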
Summary of Linear Algebra Results
Theorem
Suppose A is a real symmetric n × n matrix.
1. There are n orthonormal eigenvectors v_1, ..., v_n with corresponding eigenvalues λ_1 ≥ ··· ≥ λ_n.
2. If P is the n × n matrix whose columns are v_1, ..., v_n and Λ is the diagonal matrix with λ_1, ..., λ_n on the diagonal then
AP = PΛ, or A = PΛP^T, and P^T AP = Λ and P^T P = I.
3. If A is non-negative definite (that is, A is a variance-covariance matrix) then each λ_i ≥ 0.
4. A is singular if and only if at least one eigenvalue is 0.
5. The determinant of A is ∏_i λ_i.
The trace of a matrix
Definition: If A is square then the trace of A is the sum of its diagonal elements: tr(A) = ∑_i A_ii.
Theorem
If A and B are any two matrices such that AB is square then
tr(AB) = tr(BA).
If A_1, ..., A_r are matrices such that ∏_{j=1}^r A_j is square then
tr(A_1 ··· A_r) = tr(A_2 ··· A_r A_1) = ··· = tr(A_s ··· A_r A_1 ··· A_{s−1}).
If A is symmetric then
tr(A) = ∑_i λ_i.
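Both trace facts are easy to verify numerically. A sketch with numpy (the matrices here are random illustrative examples): tr(AB) = tr(BA) even when AB and BA have different sizes, and for a symmetric matrix the trace equals the sum of its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
# A is 3x5 and B is 5x3, so AB (3x3) and BA (5x5) are both square.
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

# tr(AB) = tr(BA) despite the different sizes of AB and BA.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# For a symmetric matrix, trace = sum of eigenvalues.
S = rng.standard_normal((4, 4))
S = (S + S.T) / 2
assert np.isclose(np.trace(S), np.linalg.eigvalsh(S).sum())
```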
Idempotent Matrices
Definition: A symmetric matrix A is idempotent if A2 = AA = A.
Theorem A matrix A is idempotent if and only if all its eigenvalues are either 0 or 1. The number of eigenvalues equal to 1 is then tr(A).
Proof: If A is idempotent, λ is an eigenvalue and v a corresponding eigenvector then
λv = Av = AAv = λAv = λ^2 v.
Since v ≠ 0 we find λ − λ^2 = λ(1 − λ) = 0, so either λ = 0 or λ = 1.
Conversely
◮ Write A = PΛP^T, so A^2 = PΛP^T PΛP^T = PΛ^2 P^T.
◮ Have used the fact that P is orthogonal.
◮ Each entry on the diagonal of Λ is either 0 or 1.
◮ So Λ^2 = Λ.
◮ So A^2 = A.
Finally
tr(A) = tr(PΛP^T) = tr(ΛP^T P) = tr(Λ).
Since all the diagonal entries in Λ are 0 or 1, the proof is complete.
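The statistically relevant example of an idempotent matrix is the hat matrix H = X(X^T X)^{−1}X^T. A brief numpy sketch (the design matrix X here is randomly generated for illustration) confirms the theorem: H is idempotent, its eigenvalues are all 0 or 1, and the number of unit eigenvalues is tr(H) = p.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.standard_normal((n, p))           # hypothetical design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix (symmetric)

assert np.allclose(H @ H, H)              # idempotent: H^2 = H
ev = np.linalg.eigvalsh(H)
# every eigenvalue is (numerically) 0 or 1
assert np.all(np.isclose(ev, 0) | np.isclose(ev, 1))
# number of eigenvalues equal to 1 is tr(H) = p
assert np.isclose(np.trace(H), p)
```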
Independence
Definition: If U_1, U_2, ..., U_k are random variables then we call U_1, ..., U_k independent if
P(U_1 ∈ A_1, ..., U_k ∈ A_k) = P(U_1 ∈ A_1) × ··· × P(U_k ∈ A_k)
for any sets A1,..., Ak . We usually either:
◮ Assume independence because there is no physical way for the value of any of the random variables to influence any of the others. OR
◮ We prove independence.
Joint Densities
◮ How do we prove independence?
◮ We use the notion of a joint density.
◮ U_1, ..., U_k have joint density function f = f(u_1, ..., u_k) if
P((U_1, ..., U_k) ∈ A) = ∫···∫_A f(u_1, ..., u_k) du_1 ··· du_k
◮ Independence of U1,..., Uk is equivalent to
f(u_1, ..., u_k) = f_1(u_1) × ··· × f_k(u_k)
for some densities f1,..., fk .
◮ In this case f_i is the density of U_i.
◮ ASIDE: notice that for an independent sample the joint density is the likelihood function!
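The factorization criterion is concrete enough to check numerically. A sketch with scipy (the evaluation points are arbitrary): for two independent standard normals, the joint density equals the product of the marginal densities at every point.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# For independent standard normals the joint density factorizes into
# the product of the marginals; check at a few arbitrary points.
pts = [(0.0, 0.0), (1.0, -2.0), (0.3, 0.7)]
joint = multivariate_normal(mean=[0, 0], cov=np.eye(2))
for u1, u2 in pts:
    assert np.isclose(joint.pdf([u1, u2]), norm.pdf(u1) * norm.pdf(u2))
```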
Application to Normals: Standard Case
If
Z = (Z_1, ..., Z_n)^T ∼ MVN(0, I_{n×n})
then the joint density of Z, denoted f_Z(z_1, ..., z_n), is
f_Z(z_1, ..., z_n) = φ(z_1) × ··· × φ(z_n)
where
φ(z_i) = (1/√(2π)) e^{−z_i^2/2}.
So
f_Z = (2π)^{−n/2} exp(−(1/2) ∑_{i=1}^n z_i^2) = (2π)^{−n/2} exp(−(1/2) z^T z)
where z = (z_1, ..., z_n)^T.
Application to Normals: General Case
If X = AZ + µ and A is invertible then for any set B ⊂ R^n we have
P(X ∈ B) = P(AZ + µ ∈ B)
         = P(Z ∈ A^{−1}(B − µ))
         = ∫···∫_{A^{−1}(B−µ)} (2π)^{−n/2} exp(−(1/2) z^T z) dz_1 ··· dz_n.
Make the change of variables x = Az + µ in this integral to get
P(X ∈ B) = ∫···∫_B (2π)^{−n/2} exp(−(1/2) [A^{−1}(x − µ)]^T A^{−1}(x − µ)) |J(x)| dx_1 ··· dx_n.
Here J(x) denotes the Jacobian of the transformation:
J(x) = J(x_1, ..., x_n) = det(∂z_i/∂x_j) = det(A^{−1}).
Algebraic manipulation of the integral then gives
P(X ∈ B) = ∫···∫_B (2π)^{−n/2} exp(−(1/2)(x − µ)^T Σ^{−1}(x − µ)) |det A^{−1}| dx_1 ··· dx_n
where
Σ = AA^T
Σ^{−1} = (A^{−1})^T A^{−1}
det Σ^{−1} = (det A^{−1})^2 = 1/det Σ.
Multivariate Normal Density
◮ Conclusion: the MVN(µ, Σ) density is
(2π)^{−n/2} (det Σ)^{−1/2} exp(−(1/2)(x − µ)^T Σ^{−1}(x − µ))
◮ What if A is not invertible? Ans: there is no density.
◮ How do we apply this density?
◮ Suppose
X = ( X_1 )
    ( X_2 )
and
Σ = ( Σ_11  Σ_12 )
    ( Σ_21  Σ_22 )
◮ Now suppose Σ_12 = 0.
Assuming Σ_12 = 0
1. Σ_21 = 0.
2. In homework you checked that
Σ^{−1} = ( Σ_11^{−1}      0      )
         (     0      Σ_22^{−1} )
3. Writing
x = ( x_1 )   and   µ = ( µ_1 )
    ( x_2 )             ( µ_2 )
we find
(x − µ)^T Σ^{−1} (x − µ) = (x_1 − µ_1)^T Σ_11^{−1} (x_1 − µ_1) + (x_2 − µ_2)^T Σ_22^{−1} (x_2 − µ_2).
4. So, if n_1 = dim(X_1) and n_2 = dim(X_2), and using det Σ = det Σ_11 · det Σ_22, we see that
f_X(x_1, x_2) = (2π)^{−n_1/2} (det Σ_11)^{−1/2} exp(−(1/2)(x_1 − µ_1)^T Σ_11^{−1}(x_1 − µ_1))
              × (2π)^{−n_2/2} (det Σ_22)^{−1/2} exp(−(1/2)(x_2 − µ_2)^T Σ_22^{−1}(x_2 − µ_2))
5. So X1 and X2 are independent.
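The factorization in point 4 is exact and can be verified with scipy. A sketch with an illustrative block-diagonal Σ (the block values and evaluation points are arbitrary choices): the joint density equals the product of the two marginal multivariate normal densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Block-diagonal covariance (Sigma_12 = 0); values are illustrative.
S11 = np.array([[2.0, 0.5], [0.5, 1.0]])
S22 = np.array([[1.5]])
Sigma = np.block([[S11, np.zeros((2, 1))],
                  [np.zeros((1, 2)), S22]])
mu1, mu2 = np.array([0.0, 1.0]), np.array([-1.0])

x1, x2 = np.array([0.2, 0.9]), np.array([-0.5])
joint = multivariate_normal(np.concatenate([mu1, mu2]), Sigma).pdf(np.concatenate([x1, x2]))
f1 = multivariate_normal(mu1, S11).pdf(x1)
f2 = multivariate_normal(mu2, S22).pdf(x2)
assert np.isclose(joint, f1 * f2)   # density factorizes => independence
```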
Summary
◮ If Cov(X_1, X_2) = E[(X_1 − µ_1)(X_2 − µ_2)^T] = 0 then X_1 is independent of X_2.
◮ Warning: This only works provided
X = ( X_1 ) ∼ MVN(µ, Σ).
    ( X_2 )
◮ Fact: However, it works even if Σ is singular, but you can’t prove it as easily using densities.
Application: independence in linear models
µ̂ = Xβ̂ = X(X^T X)^{−1} X^T Y = Xβ + Hǫ
ǫ̂ = Y − Xβ̂ = ǫ − Hǫ = (I − H)ǫ
So
( µ̂ )   ( µ )     (   H   ) ǫ
( ǫ̂ ) = ( 0 ) + σ ( I − H ) σ
where the stacked matrix σ(H; I − H) plays the role of A and (µ; 0) the role of b in X = AZ + b. Hence
( µ̂ )        ( µ )
( ǫ̂ ) ∼ MVN( ( 0 ) ; AA^T ).
Now
A = σ (   H   )
      ( I − H )
so
AA^T = σ^2 (   H   ) ( H^T  (I − H)^T )
           ( I − H )
     = σ^2 (    HH       H(I − H)     )
           ( (I − H)H  (I − H)(I − H) )
     = σ^2 (   H        H − H         )
           ( H − H   I − H − H + HH   )
     = σ^2 ( H     0   )
           ( 0   I − H )
The 0s prove that ǫ̂ and µ̂ are independent. It follows that µ̂^T µ̂, the regression sum of squares (not adjusted), is independent of ǫ̂^T ǫ̂, the error sum of squares.
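The block computation above can be reproduced numerically. A numpy sketch (the design matrix X and σ are illustrative): forming AA^T for the stacked matrix shows the off-diagonal block H(I − H) vanishes, so µ̂ and ǫ̂ are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 8, 2, 1.0
X = rng.standard_normal((n, p))             # hypothetical design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T
A = sigma * np.vstack([H, np.eye(n) - H])   # stacked map producing (mu-hat, eps-hat)

V = A @ A.T                                 # joint covariance of (mu-hat, eps-hat)
# off-diagonal block H(I - H) is zero, so mu-hat and eps-hat are uncorrelated
assert np.allclose(V[:n, n:], 0)
assert np.allclose(V[:n, :n], sigma**2 * H)
assert np.allclose(V[n:, n:], sigma**2 * (np.eye(n) - H))
```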
Joint Densities: some manipulations
◮ Suppose Z1 and Z2 are independent standard normals.
◮ Their joint density is
f(z_1, z_2) = (1/(2π)) exp(−(z_1^2 + z_2^2)/2).
◮ We show the meaning of the joint density by computing the density of a χ_2^2 random variable.
◮ Let U = Z_1^2 + Z_2^2.
◮ By definition U has a χ^2 distribution with 2 degrees of freedom.
Computing the χ_2^2 density
◮ Cumulative distribution function of U is
F(u) = P(U ≤ u).
◮ For u ≤ 0 this is 0, so take u ≥ 0.
◮ The event U ≤ u is the same as the event that the point (Z_1, Z_2) is in the circle centered at the origin and having radius u^{1/2}.
◮ That is, if A is the circle of this radius then
F(u) = P((Z_1, Z_2) ∈ A).
◮ By definition of density this is a double integral
∫∫_A f(z_1, z_2) dz_1 dz_2.
◮ You do this integral in polar co-ordinates.
Integral in Polar co-ordinates
◮ Let z_1 = r cos θ and z_2 = r sin θ.
◮ We see that
f(r cos θ, r sin θ) = (1/(2π)) exp(−r^2/2).
◮ The Jacobian of the transformation is r, so that dz_1 dz_2 becomes r dr dθ.
◮ Finally, the region of integration is simply 0 ≤ θ ≤ 2π and 0 ≤ r ≤ u^{1/2}, so that
P(U ≤ u) = ∫_0^{2π} ∫_0^{u^{1/2}} (1/(2π)) exp(−r^2/2) r dr dθ
         = ∫_0^{u^{1/2}} r exp(−r^2/2) dr
         = [−exp(−r^2/2)]_0^{u^{1/2}}
         = 1 − exp(−u/2).
◮ The density of U is found by differentiating to get
f(u) = (1/2) exp(−u/2),
which is the exponential density with mean 2.
◮ This means that the χ_2^2 density is really an exponential density.
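The conclusion that χ_2^2 is exactly exponential with mean 2 is easy to confirm against scipy's distribution objects; a short sketch (the grid of u values is arbitrary):

```python
import numpy as np
from scipy.stats import chi2, expon

u = np.linspace(0.0, 10.0, 50)
# chi-squared with 2 df has CDF 1 - exp(-u/2) ...
assert np.allclose(chi2.cdf(u, df=2), 1 - np.exp(-u / 2))
# ... i.e. it is exactly the exponential distribution with mean 2.
assert np.allclose(chi2.pdf(u, df=2), expon(scale=2).pdf(u))
```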
t tests
◮ We have shown that µ̂ and ǫ̂ are independent.
◮ So the Regression Sum of Squares (unadjusted) (= µ̂^T µ̂) and the Error Sum of Squares (= ǫ̂^T ǫ̂) are independent.
◮ Similarly
( β̂ )        ( β )       ( (X^T X)^{−1}    0    )
( ǫ̂ ) ∼ MVN( ( 0 ) ; σ^2 (      0        I − H ) )
so that β̂ and ǫ̂ are independent.
Conclusions
◮ We see that
a^T β̂ − a^T β ∼ N(0, σ^2 a^T (X^T X)^{−1} a)
is independent of
σ̂^2 = ǫ̂^T ǫ̂ / (n − p).
◮ If we know that
ǫ̂^T ǫ̂ / σ^2 ∼ χ^2_{n−p}
then it would follow that
[(a^T β̂ − a^T β) / (σ √(a^T (X^T X)^{−1} a))] / √(ǫ̂^T ǫ̂ / {(n − p)σ^2}) = a^T(β̂ − β) / √(MSE · a^T (X^T X)^{−1} a) ∼ t_{n−p}.
◮ This leaves only the question: how do I know that
{ǫ̂^T ǫ̂}/σ^2 ∼ χ^2_{n−p}?
Distribution of the Error Sum of Squares
◮ Recall: if Z1,..., Zn are iid N(0, 1) then
U = Z_1^2 + ··· + Z_n^2 ∼ χ^2_n.
◮ So we rewrite ǫ̂^T ǫ̂ / σ^2 as Z_1^2 + ··· + Z_{n−p}^2 for some Z_1, ..., Z_{n−p} which are iid N(0, 1).
◮ Put
Z* = ǫ/σ ∼ MVN_n(0, I_{n×n}).
◮ Then
ǫ̂^T ǫ̂ / σ^2 = Z*^T (I − H)(I − H) Z* = Z*^T (I − H) Z*.
◮ Now define a new vector Z from Z* so that
1. Z ∼ MVN(0, I_n)
2. Z*^T (I − H) Z* = ∑_{i=1}^{n−p} Z_i^2
Distribution of Quadratic Forms
Theorem
If Z has a standard n-dimensional multivariate normal distribution and A is a symmetric n × n matrix then the distribution of Z^T AZ is the same as that of
∑ λ_i Z_i^2
where the λ_i are the n eigenvalues of A.
Theorem
The distribution in the last theorem is χ^2_ν if and only if all the λ_i are 0 or 1 and ν of them are 1.
Theorem
The distribution is chi-squared if and only if A is idempotent. In this case tr(A) = ν.
Rewriting a Quadratic Form as a Sum of Squares
◮ Consider (Z*)^T A Z* where A is a symmetric matrix and Z* is standard multivariate normal.
◮ In the earlier application, A = I − H.
◮ Replace A by PΛP^T in this formula.
◮ Get
(Z*)^T A Z* = (Z*)^T PΛP^T Z*
            = (P^T Z*)^T Λ (P^T Z*)
            = Z^T ΛZ
where Z = P^T Z*.
◮ Notice that Z has a multivariate normal distribution
◮ mean is 0 and variance is
Var(Z) = P^T P = I_{n×n}
◮ So Z is also standard multivariate normal!
◮ Now look at what happens when you multiply out
Z^T ΛZ
◮ Multiplying a diagonal matrix by Z simply multiplies the ith entry in Z by the ith diagonal element
◮ So
ΛZ = ( λ_1 Z_1 )
     (    ⋮    )
     ( λ_n Z_n )
◮ Take the dot product of this with Z:
Z^T ΛZ = ∑ λ_i Z_i^2.
◮ We have rewritten our original quadratic form as a linear combination of squared independent standard normals,
◮ that is, as a linear combination of independent χ^2_1 variables.
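The change of variables above can be verified directly in numpy. A sketch (A is a random symmetric matrix and z_star a single simulated draw of Z*, both illustrative): rotating by P^T turns the quadratic form into a weighted sum of squares with the eigenvalues as weights.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                  # symmetric matrix of the quadratic form
lam, P = np.linalg.eigh(A)         # A = P Lam P^T with P orthogonal

z_star = rng.standard_normal(n)    # one draw of Z*
z = P.T @ z_star                   # Z = P^T Z* is again standard normal
# the quadratic form equals the eigenvalue-weighted sum of squares
assert np.isclose(z_star @ A @ z_star, np.sum(lam * z**2))
```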
Application to Error Sum of Squares
◮ Recall that
ESS/σ^2 = (Z*)^T (I − H) Z*
where Z* = ǫ/σ is multivariate standard normal.
◮ The matrix I − H is idempotent.
◮ So ESS/σ^2 has a χ^2 distribution with degrees of freedom ν equal to trace(I − H):
ν = trace(I − H)
  = trace(I) − trace(H)
  = n − trace(X(X^T X)^{−1} X^T)
  = n − trace((X^T X)^{−1} X^T X)
  = n − trace(I_{p×p})
  = n − p
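The trace computation can be confirmed numerically; a brief numpy sketch (n, p, and X are illustrative choices): I − H is idempotent and its trace is exactly n − p.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 12, 4
X = rng.standard_normal((n, p))            # hypothetical design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

assert np.allclose(M @ M, M)               # I - H is idempotent
assert np.isclose(np.trace(M), n - p)      # degrees of freedom n - p
```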
Summary of Distribution theory conclusions
1. ǫ^T Aǫ/σ^2 has the same distribution as ∑ λ_i Z_i^2 where the Z_i are iid N(0, 1) random variables (so the Z_i^2 are iid χ^2_1) and the λ_i are the eigenvalues of A.
2. A^2 = A (A is idempotent) implies that all the eigenvalues of A are either 0 or 1.
3. Points 1 and 2 prove that A^2 = A implies that ǫ^T Aǫ/σ^2 ∼ χ^2_{trace(A)}.
4. A special case is
ǫ̂^T ǫ̂ / σ^2 ∼ χ^2_{n−p}.
5. t statistics have t distributions.
6. If H_0: β = 0 is true then
F = [(µ̂^T µ̂)/p] / [ǫ̂^T ǫ̂/(n − p)] ∼ F_{p,n−p}
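The sums of squares entering this F ratio satisfy an exact algebraic decomposition, Y^T Y = µ̂^T µ̂ + ǫ̂^T ǫ̂, which follows from the idempotence of H. A numpy sketch with illustrative simulated X and Y:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 15, 3
X = rng.standard_normal((n, p))    # hypothetical design matrix
Y = rng.standard_normal(n)         # the algebra below holds for any Y
H = X @ np.linalg.inv(X.T @ X) @ X.T
mu_hat = H @ Y
eps_hat = Y - mu_hat

reg_ss = mu_hat @ mu_hat           # unadjusted regression SS
err_ss = eps_hat @ eps_hat         # error SS
assert np.isclose(reg_ss + err_ss, Y @ Y)   # total SS decomposition
F = (reg_ss / p) / (err_ss / (n - p))       # the F statistic above
assert F > 0
```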
Many Extensions are Possible
The most important of these are:
1. If a "reduced" model is obtained from a "full" model by imposing k linearly independent linear restrictions on β (like β_1 = β_2, or β_1 + β_2 = 2β_3) then
(ESS_R − ESS_F)/σ^2 = Extra SS/σ^2 ∼ χ^2_k
assuming that the null hypothesis (the restricted model) is true.
2. So the Extra Sum of Squares F test has an F-distribution.
3. In ANOVA tables which add up, the various rows (not including the total) are independent.
4. When the null H_0 is not true, the distribution of the Regression SS is non-central χ^2.
5. This is used in power and sample size calculations.