
Least-Squares Estimation: Recall that the projection of $y$ onto $C(X)$, the set of all vectors of the form $Xb$ for $b \in \mathbb{R}^{k+1}$, yields the closest point in $C(X)$ to $y$. That is, $p(y \mid C(X))$ yields the minimizer of

$Q(\beta) = \|y - X\beta\|^2$ (the least-squares criterion).

This leads to the estimator $\hat{\beta}$ given by the solution of

$X^T X \beta = X^T y$ (the normal equations),

or $\hat{\beta} = (X^T X)^{-1} X^T y$. All of this has already been established back when we studied projections (see pp. 30-31).

Alternatively, we could use calculus: to find a stationary point (maximum, minimum, or saddle point) of $Q(\beta)$, we set the partial derivative of $Q(\beta)$ equal to zero and solve:

$\frac{\partial}{\partial\beta} Q(\beta) = \frac{\partial}{\partial\beta}(y - X\beta)^T(y - X\beta) = \frac{\partial}{\partial\beta}\left(y^T y - 2y^T X\beta + \beta^T (X^T X)\beta\right) = 0 - 2X^T y + 2X^T X\beta.$

Here we've used the vector differentiation formulas $\frac{\partial}{\partial z} c^T z = c$ and $\frac{\partial}{\partial z} z^T A z = 2Az$ (for symmetric $A$; see §2.14 of our text). Setting this result equal to zero, we obtain the normal equations, which have solution $\hat{\beta} = (X^T X)^{-1} X^T y$. That this is a minimum rather than a maximum or saddle point can be verified by checking the second derivative matrix of $Q(\beta)$:

$\frac{\partial^2 Q(\beta)}{\partial\beta\,\partial\beta^T} = 2X^T X,$

which is positive definite (Result 7, p. 54); therefore $\hat{\beta}$ minimizes $Q(\beta)$.

Example: Simple Linear Regression

Consider the case $k = 1$:

$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \dots, n,$

where $e_1, \dots, e_n$ are i.i.d., each with mean 0 and variance $\sigma^2$. Then the model equation becomes

$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}}_{=X} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{=\beta} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$

It follows that

$X^T X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix},$

$(X^T X)^{-1} = \frac{1}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}.$

Therefore, $\hat{\beta} = (X^T X)^{-1} X^T y$ yields

$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \frac{1}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \left(\sum_i x_i^2\right)\left(\sum_i y_i\right) - \left(\sum_i x_i\right)\left(\sum_i x_i y_i\right) \\ -\left(\sum_i x_i\right)\left(\sum_i y_i\right) + n\sum_i x_i y_i \end{pmatrix}.$

After a bit of algebra, these estimators simplify to

$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$

In the case that $X$ is of full rank, $\hat{\beta}$ and $\hat{\mu}$ are given by

$\hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{\mu} = X\hat{\beta} = X(X^T X)^{-1} X^T y = P_{C(X)}\, y.$

• Notice that both $\hat{\beta}$ and $\hat{\mu}$ are linear functions of $y$. That is, in each case the estimator is given by some matrix times $y$.

Note also that

$\hat{\beta} = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + e) = \beta + (X^T X)^{-1} X^T e.$

From this representation several important properties of the least-squares estimator $\hat{\beta}$ follow easily:

1. (Unbiasedness)
$E(\hat{\beta}) = E\left(\beta + (X^T X)^{-1} X^T e\right) = \beta + (X^T X)^{-1} X^T \underbrace{E(e)}_{=0} = \beta.$

2. (Variance-covariance matrix)
$\operatorname{var}(\hat{\beta}) = \operatorname{var}\left(\beta + (X^T X)^{-1} X^T e\right) = (X^T X)^{-1} X^T \underbrace{\operatorname{var}(e)}_{=\sigma^2 I} X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}.$

3. (Normality) $\hat{\beta} \sim N_{k+1}\left(\beta, \sigma^2 (X^T X)^{-1}\right)$ (if $e$ is assumed normal).

• These three properties require increasingly strong assumptions. Property (1) holds under assumptions A1 and A2 (additive error and linearity).

• Property (2) requires, in addition, the assumption of sphericity.

• Property (3) requires assumption A5 (normality). However, later we will present a central limit theorem-like result that establishes the asymptotic normality of $\hat{\beta}$ under certain conditions even when $e$ is not normal.
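As a quick numerical check of the formulas above, the following Python sketch (the data, seed, and "true" parameter values are made up for illustration) solves the normal equations for a small simulated data set and confirms that the solution matches the closed-form estimates $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, and that the fitted mean $X\hat{\beta}$ equals the projection $P_{C(X)}\, y$.

```python
# A quick numerical check: solve the normal equations and compare with the
# closed-form simple linear regression estimates.  The data are simulated
# (made up) purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # "true" beta0 = 2, beta1 = 0.5

# Design matrix X = [1, x]; solve X'X beta = X'y (the normal equations).
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form estimates: beta1_hat = Sxy/Sxx, beta0_hat = ybar - beta1_hat * xbar.
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
print(np.allclose(beta_hat, [b0, b1]))              # True

# The fitted mean X beta_hat equals the projection of y onto C(X).
P = X @ np.linalg.inv(X.T @ X) @ X.T                # projection matrix onto C(X)
print(np.allclose(X @ beta_hat, P @ y))             # True
```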
Example: Simple Linear Regression (Continued)

Property (2) above says that for $\operatorname{var}(y) = \sigma^2 I$, $\operatorname{var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$. Therefore, in the simple linear regression case,

$\operatorname{var}\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix} = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} \begin{pmatrix} n^{-1}\sum_i x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}.$

Thus,

$\operatorname{var}(\hat{\beta}_0) = \frac{\sigma^2 \sum_i x_i^2 / n}{\sum_i (x_i - \bar{x})^2} = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}\right],$

$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2},$

and $\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = \dfrac{-\sigma^2 \bar{x}}{\sum_i (x_i - \bar{x})^2}.$

• Note that if $\bar{x} > 0$, then $\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1)$ is negative, meaning that the slope and intercept are inversely related. That is, over repeated samples from the same model, the intercept will tend to decrease when the slope increases.

Gauss-Markov Theorem:

We have seen that in the spherical-errors, full-rank linear model, the least-squares estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ is unbiased and is a linear estimator. The following theorem states that, in the class of linear and unbiased estimators, the least-squares estimator is optimal (or best) in the sense that it has minimum variance among all estimators in this class.

Gauss-Markov Theorem: Consider the linear model $y = X\beta + e$, where $X$ is $n \times (k+1)$ of rank $k+1$, $n > k+1$, $E(e) = 0$, and $\operatorname{var}(e) = \sigma^2 I$. The least-squares estimators $\hat{\beta}_j$, $j = 0, 1, \dots, k$ (the elements of $\hat{\beta} = (X^T X)^{-1} X^T y$) have minimum variance among all linear unbiased estimators.

Proof: Write $\hat{\beta}_j$ as $\hat{\beta}_j = c^T \hat{\beta}$, where $c$ is the indicator vector containing a 1 in the $(j+1)$st position and 0's elsewhere. Then $\hat{\beta}_j = c^T (X^T X)^{-1} X^T y = a^T y$, where $a = X(X^T X)^{-1} c$. The quantity being estimated is $\beta_j = c^T \beta = c^T (X^T X)^{-1} X^T \mu = a^T \mu$, where $\mu = X\beta$.

Consider an arbitrary linear estimator $\tilde{\beta}_j = d^T y$ of $\beta_j$. For such an estimator to be unbiased, it must satisfy $E(\tilde{\beta}_j) = E(d^T y) = d^T \mu = a^T \mu$ for any $\mu \in C(X)$. I.e.,

$d^T \mu - a^T \mu = 0 \;\Rightarrow\; (d - a)^T \mu = 0 \text{ for all } \mu \in C(X),$

or $(d - a) \perp C(X)$. Then

$\tilde{\beta}_j = d^T y = a^T y + (d - a)^T y = \hat{\beta}_j + (d - a)^T y.$

The random variables on the right-hand side, $\hat{\beta}_j$ and $(d - a)^T y$, have covariance

$\operatorname{cov}\left(a^T y, (d - a)^T y\right) = a^T \operatorname{var}(y)(d - a) = \sigma^2 a^T (d - a) = \sigma^2 (d^T a - a^T a).$

Since $d^T \mu = a^T \mu$ for any $\mu \in C(X)$ and $a = X(X^T X)^{-1} c \in C(X)$, it follows that $d^T a = a^T a$, so that

$\operatorname{cov}\left(a^T y, (d - a)^T y\right) = \sigma^2 (d^T a - a^T a) = \sigma^2 (a^T a - a^T a) = 0.$

It follows that

$\operatorname{var}(\tilde{\beta}_j) = \operatorname{var}(\hat{\beta}_j) + \operatorname{var}\left((d - a)^T y\right) = \operatorname{var}(\hat{\beta}_j) + \sigma^2 \|d - a\|^2.$

Therefore, $\operatorname{var}(\tilde{\beta}_j) \ge \operatorname{var}(\hat{\beta}_j)$, with equality if and only if $d = a$, or equivalently, if and only if $\tilde{\beta}_j = \hat{\beta}_j$.

Comments:

1. Notice that nowhere in this proof did we make use of the specific form of $c$ as an indicator for one of the elements of $\beta$. That is, we have proved a slightly more general result than that given in the statement of the theorem: $c^T \hat{\beta}$ is the minimum variance estimator in the class of linear unbiased estimators of $c^T \beta$ for any vector of constants $c$.

2. The least-squares estimator $c^T \hat{\beta}$, where $\hat{\beta} = (X^T X)^{-1} X^T y$, is often called the B.L.U.E. (best linear unbiased estimator) of $c^T \beta$. Sometimes it is called the Gauss-Markov estimator.

3. The variance of the BLUE is

$\operatorname{var}(c^T \hat{\beta}) = \sigma^2 \|a\|^2 = \sigma^2 \left[X(X^T X)^{-1} c\right]^T \left[X(X^T X)^{-1} c\right] = \sigma^2\, c^T (X^T X)^{-1} c.$

Note that this variance formula depends upon $X$ through $(X^T X)^{-1}$. Two implications of this observation are:
- If the columns of the $X$ matrix are mutually orthogonal, then $(X^T X)^{-1}$ will be diagonal, so that the elements of $\hat{\beta}$ are uncorrelated.
- Even for a given set of explanatory variables, the values at which the explanatory variables are observed will affect the variance (precision) of the resulting parameter estimators.

4. What is remarkable about the Gauss-Markov Theorem is its distributional generality. It does not require normality! It says that $\hat{\beta}$ is BLUE regardless of the distribution of $e$ (or $y$), as long as we have mean-zero, spherical errors.
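To make Comment 3's variance formula and Comment 4's point about distributional generality concrete, here is a small Python sketch. The design matrix, the competing estimator $d^T y$, and the (centered exponential) error distribution are all made up for illustration; the construction simply follows the proof, taking $d = a + v$ with $v \perp C(X)$.

```python
# Illustration of the Gauss-Markov theorem: the least-squares coefficient
# a'y has no larger variance than any other linear unbiased estimator d'y.
# The design matrix, competing estimator, and error distribution are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])   # n x (k+1), k = 2
beta = np.array([1.0, -2.0, 0.5])
sigma2 = 4.0

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T                        # projection onto C(X)

# Least-squares estimator of beta_1 = c'beta is a'y with a = X (X'X)^{-1} c.
c = np.array([0.0, 1.0, 0.0])
a = X @ XtX_inv @ c

# A competing linear estimator d'y with d = a + v, v in the orthogonal
# complement of C(X); it is still unbiased because v'X beta = 0.
v = (np.eye(n) - P) @ rng.normal(size=n)
d = a + v
print(np.isclose(a @ X @ beta, beta[1]), np.isclose(d @ X @ beta, beta[1]))  # both unbiased

# Theoretical variances: sigma^2 ||a||^2 = sigma^2 c'(X'X)^{-1} c  <=  sigma^2 ||d||^2.
print(sigma2 * a @ a, sigma2 * d @ d)

# Monte Carlo with centered exponential (non-normal) errors: the ordering of
# the variances does not rely on normality, only on mean-zero spherical errors.
reps = 20000
E = rng.exponential(scale=np.sqrt(sigma2), size=(reps, n)) - np.sqrt(sigma2)
Y = X @ beta + E                             # each row is one simulated sample
print(np.var(Y @ a), np.var(Y @ d))          # approx. the theoretical values above
```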
An additional property of least-squares estimation is that the estimated mean $\hat{\mu} = X(X^T X)^{-1} X^T y$ is invariant to (doesn't change as a result of) linear changes of scale in the explanatory variables. That is, consider the linear models

$y = \underbrace{\begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}}_{=X} \beta + e$

and

$y = \underbrace{\begin{pmatrix} 1 & c_1 x_{11} & c_2 x_{12} & \cdots & c_k x_{1k} \\ 1 & c_1 x_{21} & c_2 x_{22} & \cdots & c_k x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & c_1 x_{n1} & c_2 x_{n2} & \cdots & c_k x_{nk} \end{pmatrix}}_{=Z} \beta^* + e.$

Then $\hat{\mu}$, the least-squares estimator of $E(y)$, is the same in both of these two models. This follows from a more general theorem:

Theorem: In the linear model $y = X\beta + e$, where $E(e) = 0$ and $X$ is of full rank, $\hat{\mu}$, the least-squares estimator of $E(y)$, is invariant to a full-rank linear transformation of $X$.

Proof: A full-rank linear transformation of $X$ is given by $Z = XH$, where $H$ is square and of full rank. Since $H$ is nonsingular, $C(Z) = C(XH) = C(X)$, so the projection of $y$ onto $C(Z)$ coincides with the projection onto $C(X)$; hence $\hat{\mu} = P_{C(X)}\, y$ is the same under either parameterization.
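A brief numerical illustration of this invariance (with a made-up design matrix and arbitrary scale factors $c_1, \dots, c_k$): rescaling the explanatory variables leaves the fitted mean $\hat{\mu}$ unchanged.

```python
# Rescaling the explanatory variables (Z = X H with H nonsingular) leaves
# the least-squares fitted mean unchanged.  Data and scale factors are made up.
import numpy as np

rng = np.random.default_rng(2)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)

def fitted(M, y):
    """Least-squares fitted mean: M (M'M)^{-1} M' y."""
    return M @ np.linalg.solve(M.T @ M, M.T @ y)

H = np.diag([1.0, 10.0, 0.25, 3.0])             # H = diag(1, c1, ..., ck), full rank
Z = X @ H
print(np.allclose(fitted(X, y), fitted(Z, y)))  # True: mu_hat is invariant
```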