Eigenvalues and Eigenvectors

◮ Suppose A is an n × n symmetric matrix with real entries.
◮ The function from R^n to R defined by

x ↦ x^T A x

is called a quadratic form.
◮ We can maximize x^T A x subject to x^T x = ||x||² = 1 by Lagrange multipliers:

x^T A x − λ(x^T x − 1)

◮ Take derivatives (with respect to λ and x) and get

x^T x = 1

and

2Ax − 2λx = 0.

◮ We say that v is an eigenvector of A with eigenvalue λ if v ≠ 0 and Av = λv.
◮ For such a v and λ with v^T v = 1 we find

v^T A v = λ v^T v = λ.

◮ So the quadratic form is maximized over vectors of length one by the eigenvector with the largest eigenvalue.

◮ Call that eigenvector v1 and its eigenvalue λ1.
◮ Next maximize x^T A x subject to x^T x = 1 and v1^T x = 0.
◮ Get a new eigenvector and eigenvalue.
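A minimal numpy sketch (not part of the original slides) illustrating the claim: the maximum of x^T A x over unit vectors is the largest eigenvalue, attained at the corresponding eigenvector. The matrix A here is an arbitrary symmetric example.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # an arbitrary symmetric matrix

vals, vecs = np.linalg.eigh(A)          # eigenvalues in increasing order
lam1, v1 = vals[-1], vecs[:, -1]        # largest eigenvalue and its eigenvector

print(np.isclose(v1 @ A @ v1, lam1))    # v1^T A v1 = lambda_1
x = rng.standard_normal(4)
x /= np.linalg.norm(x)                  # a random unit vector
print(x @ A @ x <= lam1 + 1e-12)        # never exceeds lambda_1
```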

Summary of Results

Theorem
Suppose A is a real symmetric n × n matrix.
1. There are n orthonormal eigenvectors v1, ..., vn with corresponding eigenvalues λ1 ≥ ⋯ ≥ λn.
2. If P is the n × n matrix whose columns are v1, ..., vn and Λ is the diagonal matrix with λ1, ..., λn on the diagonal then

AP = PΛ, or equivalently A = PΛP^T and P^T A P = Λ and P^T P = I.

3. If A is non-negative definite (that is, A is a covariance matrix) then each λi ≥ 0.
4. A is singular if and only if at least one eigenvalue is 0.

5. The determinant of A is ∏i λi.

The trace of a matrix

Definition: If A is square then the trace of A is the sum of its diagonal elements:

tr(A) = ∑i Aii

Theorem
If A and B are any two matrices such that AB is square then

tr(AB) = tr(BA)

If A1, ..., Ar are matrices such that ∏_{j=1}^r Aj is square then

tr(A1 ⋯ Ar) = tr(A2 ⋯ Ar A1) = ⋯ = tr(As ⋯ Ar A1 ⋯ A_{s−1})

(the trace is invariant under cyclic permutation). If A is symmetric then

tr(A) = ∑i λi
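A quick numpy check (an illustration, not from the slides) of the spectral decomposition and of the trace and determinant formulas; the matrix is again an arbitrary symmetric example.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                                # arbitrary symmetric matrix

lam, P = np.linalg.eigh(A)
print(np.allclose(P @ np.diag(lam) @ P.T, A))    # A = P Lambda P^T
print(np.allclose(P.T @ P, np.eye(5)))           # P^T P = I
print(np.isclose(np.trace(A), lam.sum()))        # tr(A) = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # det(A) = product of eigenvalues
```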

Idempotent Matrices

Definition: A matrix A is idempotent if A² = AA = A.

Theorem
A symmetric matrix A is idempotent if and only if all its eigenvalues are either 0 or 1. The number of eigenvalues equal to 1 is then tr(A).

Proof: If A is idempotent, λ is an eigenvalue and v a corresponding eigenvector then

λv = Av = AAv = λAv = λ²v

Since v ≠ 0 we find λ − λ² = λ(1 − λ) = 0, so either λ = 0 or λ = 1.

Conversely

◮ Write A = PΛP^T so A² = PΛP^T PΛP^T = PΛ²P^T.

◮ Have used the fact that P is orthogonal.

◮ Each entry in the diagonal of Λ is either 0 or 1.
◮ So Λ² = Λ.

◮ So A² = A.

Finally

tr(A) = tr(PΛP^T) = tr(ΛP^T P) = tr(Λ)

Since all the diagonal entries in Λ are 0 or 1, this completes the proof.
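As a numerical illustration (my example, not the slides'): the hat matrix H = X(X^T X)^{−1}X^T of a regression design is symmetric and idempotent, so its eigenvalues are 0 or 1 and tr(H) counts the 1s.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.standard_normal((n, p))                  # arbitrary full-rank design
H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix

print(np.allclose(H @ H, H))                     # idempotent: H^2 = H
lam = np.linalg.eigvalsh(H)                      # ascending eigenvalues
print(np.allclose(lam, np.r_[np.zeros(n - p), np.ones(p)], atol=1e-8))
print(np.isclose(np.trace(H), p))                # tr(H) = number of unit eigenvalues
```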

Independence

Definition: If U1, U2, ..., Uk are random variables then we call U1, ..., Uk independent if

P(U1 ∈ A1, ..., Uk ∈ Ak) = P(U1 ∈ A1) × ⋯ × P(Uk ∈ Ak)

for any sets A1, ..., Ak. We usually either:

◮ Assume independence because there is no physical way for the value of any of the random variables to influence any of the others. OR

◮ We prove independence.

Joint Densities

◮ How do we prove independence?

◮ We use the notion of a joint density.

◮ U1, ..., Uk have joint density function f = f(u1, ..., uk) if

P((U1, ..., Uk) ∈ A) = ∫ ⋯ ∫_A f(u1, ..., uk) du1 ⋯ duk

◮ Independence of U1,..., Uk is equivalent to

f(u1, ..., uk) = f1(u1) × ⋯ × fk(uk)

for some densities f1, ..., fk.

◮ In this case fi is the density of Ui.

◮ ASIDE: notice that for an independent sample the joint density is the likelihood function!

Application to Normals: Standard Case

If

Z = [ Z1 ]
    [ ⋮  ]  ∼ MVN(0, I_{n×n})
    [ Zn ]

then the joint density of Z, denoted fZ(z1, ..., zn), is

fZ(z1, ..., zn) = φ(z1) × ⋯ × φ(zn)

where

φ(zi) = (1/√(2π)) e^{−zi²/2}

So

fZ = (2π)^{−n/2} exp( −(1/2) ∑_{i=1}^n zi² ) = (2π)^{−n/2} exp( −(1/2) z^T z )

where z = (z1, ..., zn)^T.

Application to Normals: General Case

If X = AZ + µ and A is invertible then for any set B ⊆ R^n we have

P(X ∈ B) = P(AZ + µ ∈ B)
         = P(Z ∈ A^{−1}(B − µ))
         = ∫ ⋯ ∫_{A^{−1}(B−µ)} (2π)^{−n/2} exp( −(1/2) z^T z ) dz1 ⋯ dzn

Make the change of variables x = Az + µ in this integral to get

P(X ∈ B) = ∫ ⋯ ∫_B (2π)^{−n/2} exp( −(1/2) (A^{−1}(x − µ))^T (A^{−1}(x − µ)) ) |J(x)| dx1 ⋯ dxn

Here J(x) denotes the Jacobian of the transformation

J(x) = J(x1, ..., xn) = det( ∂zi/∂xj ) = det( A^{−1} )

Algebraic manipulation of the integral then gives

P(X ∈ B) = ∫ ⋯ ∫_B (2π)^{−n/2} exp( −(1/2)(x − µ)^T Σ^{−1}(x − µ) ) |det A^{−1}| dx1 ⋯ dxn

where

Σ = AA^T
Σ^{−1} = (A^{−1})^T A^{−1}
det Σ^{−1} = (det A^{−1})² = 1/det Σ

Multivariate Normal Density

◮ Conclusion: the MVN(µ, Σ) density is

(2π)^{−n/2} (det Σ)^{−1/2} exp( −(1/2)(x − µ)^T Σ^{−1}(x − µ) )

◮ What if A is not invertible? Ans: there is no density.

◮ How do we apply this density?

◮ Suppose

X = [ X1 ]
    [ X2 ]

and

Σ = [ Σ11  Σ12 ]
    [ Σ21  Σ22 ]

◮ Now suppose Σ12 = 0
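Before using the density, a quick sketch (assuming scipy is available; the matrix A and the mean µ below are arbitrary) checking the MVN(µ, Σ) density formula above against scipy's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))                 # invertible with probability 1
mu = np.array([1.0, -2.0, 0.5])
Sigma = A @ A.T

x = rng.standard_normal(3)                      # arbitrary evaluation point
d = x - mu
by_hand = ((2 * np.pi) ** (-3 / 2) * np.linalg.det(Sigma) ** (-0.5)
           * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))
print(np.isclose(by_hand, multivariate_normal(mu, Sigma).pdf(x)))  # True
```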

Assuming Σ12 = 0

1. Σ21 = 0
2. In homework you checked that

Σ^{−1} = [ Σ11^{−1}  0        ]
         [ 0         Σ22^{−1} ]

3. Writing

x = [ x1 ]   and   µ = [ µ1 ]
    [ x2 ]             [ µ2 ]

we find

(x − µ)^T Σ^{−1}(x − µ) = (x1 − µ1)^T Σ11^{−1}(x1 − µ1) + (x2 − µ2)^T Σ22^{−1}(x2 − µ2)

4. So, if n1 = dim(X1) and n2 = dim(X2) we see that

fX(x1, x2) = (2π)^{−n1/2} (det Σ11)^{−1/2} exp( −(1/2)(x1 − µ1)^T Σ11^{−1}(x1 − µ1) )
           × (2π)^{−n2/2} (det Σ22)^{−1/2} exp( −(1/2)(x2 − µ2)^T Σ22^{−1}(x2 − µ2) )

(the determinant factors split because det Σ = det Σ11 · det Σ22 for block diagonal Σ)

5. So X1 and X2 are independent.

Summary

◮ If Cov(X1, X2) = E[(X1 − µ1)(X2 − µ2)^T] = 0 then X1 is independent of X2.

◮ Warning: This only works provided

X = [ X1 ]  ∼ MVN(µ, Σ)
    [ X2 ]

◮ Fact: However, it works even if Σ is singular, but you can’t prove it as easily using densities.
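A Monte Carlo sanity check (illustrative only): with Cov(X1, X2) = 0 in a bivariate normal, joint probabilities factor, e.g. P(X1 > 0, X2 > 0) = P(X1 > 0) P(X2 > 0).

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.0],
                  [0.0, 0.5]])                    # Cov(X1, X2) = 0
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)

joint = np.mean((X[:, 0] > 0) & (X[:, 1] > 0))    # P(X1 > 0, X2 > 0)
prod = np.mean(X[:, 0] > 0) * np.mean(X[:, 1] > 0)
print(joint, prod)                                # both close to 0.25
```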

Application: independence in linear models

µ̂ = Xβ̂ = X(X^T X)^{−1}X^T Y = Xβ + Hε
ε̂ = Y − Xβ̂ = ε − Hε = (I − H)ε

So

[ µ̂ ]     [   H   ]          [ µ ]
[ ε̂ ]  = σ [ I − H ] (ε/σ) +  [ 0 ]

which has the form A(ε/σ) + b with A = σ[H; I − H] (the two blocks stacked) and b = (µ; 0). Hence

[ µ̂ ]        ( [ µ ]        )
[ ε̂ ]  ∼ MVN( [ 0 ] ; AA^T )

Now

A = σ [   H   ]
      [ I − H ]

so

AA^T = σ² [   H   ] [ H^T  (I − H)^T ]
          [ I − H ]

     = σ² [ HH         H(I − H)       ]
          [ (I − H)H   (I − H)(I − H) ]

     = σ² [ H       H − H          ]
          [ H − H   I − H − H + HH ]

     = σ² [ H   0     ]
          [ 0   I − H ]

using H^T = H and HH = H. The 0s prove that ε̂ and µ̂ are independent. It follows that µ̂^T µ̂, the regression sum of squares (not adjusted), is independent of ε̂^T ε̂, the Error sum of squares.
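A numeric version of this block computation (my sketch, with an arbitrary design matrix): the off-diagonal block H(I − H) vanishes, so µ̂ and ε̂ are uncorrelated, and for jointly normal vectors uncorrelated means independent.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 4
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

print(np.allclose(H @ (I - H), 0))               # off-diagonal block of AA^T is 0
print(np.allclose((I - H) @ (I - H), I - H))     # I - H is idempotent too
```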

Joint Densities: some manipulations

◮ Suppose Z1 and Z2 are independent standard normals.

◮ Their joint density is

f(z1, z2) = (1/(2π)) exp( −(z1² + z2²)/2 ).

◮ Show the meaning of joint density by computing the density of a χ²_2 random variable.
◮ Let U = Z1² + Z2².
◮ By definition U has a χ² distribution with 2 degrees of freedom.

Computing the χ²_2 density

◮ Cumulative distribution function of U is

F(u) = P(U ≤ u).

◮ For u ≤ 0 this is 0 so take u ≥ 0.
◮ The event U ≤ u is the same as the event that the point (Z1, Z2) is in the circle centered at the origin and having radius u^{1/2}.

◮ That is, if A is the circle of this radius then

F(u) = P((Z1, Z2) ∈ A).

◮ By definition of density this is a double integral

∫∫_A f(z1, z2) dz1 dz2

◮ You do this integral in polar co-ordinates.

Integral in Polar co-ordinates

◮ Let z1 = r cos θ and z2 = r sin θ.
◮ We see that

f(r cos θ, r sin θ) = (1/(2π)) exp( −r²/2 ).

◮ The Jacobian of the transformation is r so that dz1 dz2 becomes r dr dθ.
◮ Finally the region of integration is simply 0 ≤ θ ≤ 2π and 0 ≤ r ≤ u^{1/2} so that

P(U ≤ u) = (1/(2π)) ∫_0^{u^{1/2}} ∫_0^{2π} exp( −r²/2 ) r dθ dr
         = ∫_0^{u^{1/2}} r exp( −r²/2 ) dr
         = [ −exp( −r²/2 ) ]_0^{u^{1/2}}
         = 1 − exp( −u/2 ).

◮ The density of U is found by differentiating to get

f(u) = (1/2) exp( −u/2 )

which is the exponential density with mean 2.
◮ This means that the χ²_2 density is really an exponential density.
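A short simulation sketch (illustrative) checking the CDF just derived: Z1² + Z2² matches 1 − exp(−u/2).

```python
import numpy as np

rng = np.random.default_rng(6)
Z = rng.standard_normal((100_000, 2))
U = (Z ** 2).sum(axis=1)                          # chi-squared with 2 df

for u in (0.5, 2.0, 5.0):
    print(np.mean(U <= u), 1 - np.exp(-u / 2))    # empirical vs exact CDF
```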

t tests

◮ We have shown that µ̂ and ε̂ are independent.
◮ So the Regression Sum of Squares (unadjusted) (= µ̂^T µ̂) and the Error Sum of Squares (= ε̂^T ε̂) are independent.

◮ Similarly

[ β̂ ]       ( [ β ]      [ (X^T X)^{−1}  0     ] )
[ ε̂ ]  ∼ MVN( [ 0 ] ; σ²  [ 0             I − H ] )

so that β̂ and ε̂ are independent.

Conclusions

◮ We see that

a^T β̂ − a^T β ∼ N( 0, σ² a^T (X^T X)^{−1} a )

is independent of

σ̂² = ε̂^T ε̂ / (n − p)

◮ If we knew that

ε̂^T ε̂ / σ² ∼ χ²_{n−p}

then it would follow that

[ (a^T β̂ − a^T β) / ( σ √( a^T (X^T X)^{−1} a ) ) ] / √( ε̂^T ε̂ / ((n − p)σ²) ) = a^T(β̂ − β) / √( MSE · a^T (X^T X)^{−1} a ) ∼ t_{n−p}

◮ This leaves only the question: how do I know that

ε̂^T ε̂ / σ² ∼ χ²_{n−p} ?
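A simulation sketch (my construction, with an arbitrary design and contrast a): the studentized a^T β̂ should match the t_{n−p} distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, reps = 20, 3, 50_000
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.5, 2.0])                 # arbitrary true coefficients
a = np.array([1.0, 1.0, 0.0])                     # arbitrary contrast
XtXinv = np.linalg.inv(X.T @ X)

tstats = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.standard_normal(n)         # normal errors, sigma = 1
    bhat = XtXinv @ X.T @ Y
    resid = Y - X @ bhat
    mse = resid @ resid / (n - p)                 # sigma-hat squared
    tstats[r] = (a @ bhat - a @ beta) / np.sqrt(mse * (a @ XtXinv @ a))

print(np.quantile(tstats, [0.05, 0.95]))
print(stats.t(df=n - p).ppf([0.05, 0.95]))        # should agree closely
```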

Distribution of the Error Sum of Squares

◮ Recall: if Z1,..., Zn are iid N(0, 1) then

U = Z1² + ⋯ + Zn² ∼ χ²_n

◮ So we rewrite ε̂^T ε̂ / σ² as Z1² + ⋯ + Z_{n−p}² for some Z1, ..., Z_{n−p} which are iid N(0, 1).
◮ Put

Z* = ε/σ ∼ MVN_n(0, I_{n×n})

◮ Then

ε̂^T ε̂ / σ² = (Z*)^T (I − H)(I − H) Z* = (Z*)^T (I − H) Z*.

◮ Now define a new vector Z from Z* so that
1. Z ∼ MVN(0, I_n)
2. (Z*)^T (I − H) Z* = ∑_{i=1}^{n−p} Zi²

Distribution of Quadratic Forms

Theorem
If Z has a standard n-dimensional multivariate normal distribution and A is a symmetric n × n matrix then the distribution of Z^T A Z is the same as that of

∑ λi Zi²

where the λi are the n eigenvalues of A.

Theorem
The distribution in the last theorem is χ²_ν if and only if all the λi are 0 or 1 and ν of them are 1.

Theorem
The distribution is chi-squared if and only if A is idempotent. In this case tr(A) = ν.
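A Monte Carlo sketch (illustrative, arbitrary symmetric A) of the first theorem: Z^T A Z and ∑ λi Zi² share the same distribution, so in particular the same mean tr(A) and variance 2 ∑ λi².

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                                    # arbitrary symmetric matrix
lam = np.linalg.eigvalsh(A)

Z = rng.standard_normal((200_000, 4))
q1 = np.einsum('ij,jk,ik->i', Z, A, Z)               # Z^T A Z, row by row
q2 = (rng.standard_normal((200_000, 4)) ** 2) @ lam  # sum_i lambda_i Z_i^2

print(np.mean(q1), np.mean(q2), lam.sum())           # all approx tr(A)
print(np.var(q1), np.var(q2), 2 * (lam ** 2).sum())  # all approx 2 sum(lam^2)
```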

Rewriting a Quadratic Form as a Sum of Squares

◮ Consider (Z*)^T A Z* where A is a symmetric matrix and Z* is standard multivariate normal.

◮ In the earlier application A = I − H.
◮ Replace A by PΛP^T in this formula.

◮ Get

(Z*)^T A Z* = (Z*)^T PΛP^T Z*
            = (P^T Z*)^T Λ (P^T Z*)
            = Z^T ΛZ

where Z = P^T Z*.

◮ Notice that Z has a multivariate normal distribution

◮ mean is 0 and variance is

Var(Z) = P^T P = I_{n×n}

◮ So Z is also standard multivariate normal!

◮ Now look at what happens when you multiply out

Z^T ΛZ

◮ Multiplying a diagonal matrix by Z simply multiplies the ith entry in Z by the ith diagonal element

◮ So

ΛZ = ( λ1 Z1, ..., λn Zn )^T

◮ Take the inner product of this with Z:

Z^T ΛZ = ∑ λi Zi².

◮ We have rewritten our original quadratic form as a linear combination of squared independent standard normals,
◮ that is, as a linear combination of independent χ²_1 variables.
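The identity holds realization by realization, not just in distribution. A small numpy check (my example):

```python
import numpy as np

rng = np.random.default_rng(9)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                                 # arbitrary symmetric matrix
lam, P = np.linalg.eigh(A)

Zstar = rng.standard_normal(5)
Z = P.T @ Zstar                                   # rotated vector, still N(0, I)
print(np.isclose(Zstar @ A @ Zstar, np.sum(lam * Z ** 2)))  # True
```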

Application to Error Sum of Squares

◮ Recall that

ESS/σ² = (Z*)^T (I − H) Z*

where Z* = ε/σ is multivariate standard normal.

◮ The matrix I − H is idempotent.
◮ So ESS/σ² has a χ² distribution with degrees of freedom ν equal to trace(I − H):

ν = trace(I − H)
  = trace(I) − trace(H)
  = n − trace( X(X^T X)^{−1}X^T )
  = n − trace( (X^T X)^{−1}X^T X )
  = n − trace( I_{p×p} )
  = n − p
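A one-line numeric confirmation of the trace computation (arbitrary design):

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 15, 4
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.isclose(np.trace(np.eye(n) - H), n - p))  # trace(I - H) = n - p
```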

Summary of Distribution theory conclusions

1. ε^T A ε / σ² has the same distribution as ∑ λi Zi² where the Zi are iid N(0, 1) random variables (so the Zi² are iid χ²_1) and the λi are the eigenvalues of A.
2. A² = A (A is idempotent) implies that all the eigenvalues of A are either 0 or 1.
3. Points 1 and 2 prove that A² = A implies that ε^T A ε / σ² ∼ χ²_{trace(A)}.
4. A special case is

ε̂^T ε̂ / σ² ∼ χ²_{n−p}

5. t statistics have t distributions.

6. If H_0: β = 0 is true then

F = ( µ̂^T µ̂ / p ) / ( ε̂^T ε̂ / (n − p) ) ∼ F_{p,n−p}
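A simulation sketch (my construction): with β = 0 and σ = 1 the statistic above should follow F_{p,n−p}.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, p, reps = 25, 3, 20_000
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T

F = np.empty(reps)
for r in range(reps):
    Y = rng.standard_normal(n)                    # H0: beta = 0, sigma = 1
    muhat = H @ Y
    ehat = Y - muhat
    F[r] = (muhat @ muhat / p) / (ehat @ ehat / (n - p))

print(np.quantile(F, [0.5, 0.95]))
print(stats.f(p, n - p).ppf([0.5, 0.95]))         # should agree closely
```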

Many Extensions are Possible

The most important of these are:
1. If a “reduced” model is obtained from a “full” model by imposing k linearly independent linear restrictions on β (like β1 = β2, β1 + β2 = 2β3) then

Extra SS / σ² = (ESS_R − ESS_F) / σ² ∼ χ²_k

assuming that the null hypothesis (the restricted model) is true.
2. So the Extra Sum of Squares F test has an F-distribution.
3. In ANOVA tables which add up, the various rows (not including the total) are independent.

4. When the null H_0 is not true, the distribution of the Regression SS is non-central χ².
5. This is used in power and sample size calculations.
