22s:152 Applied Linear Regression

Generalized Least Squares Estimation

• The classical models we have fit so far with a continuous response Y have been fit using ordinary least squares.

• The model

    Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \varepsilon_i,   with \varepsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2)

  is fit by minimizing the RSS,

    \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left[ Y_i - (\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki}) \right]^2

• In matrix notation, we can write this model:

    Y = X\beta + \varepsilon,   with \varepsilon \sim N_n(0, \Sigma)

  where X\beta is the mean structure and \varepsilon is the error, with

    \Sigma = \begin{pmatrix}
      \sigma^2 & 0        & \cdots & 0        \\
      0        & \sigma^2 & \cdots & 0        \\
      \vdots   &          & \ddots & \vdots   \\
      0        & 0        & \cdots & \sigma^2
    \end{pmatrix}_{n \times n}
    or  \Sigma = \sigma^2 I_{n \times n}

• The variance of the vector Y is \Sigma, or V(Y) = V(\varepsilon) = \Sigma. This shows the independence of the observations (off-diagonals are 0) and the constant variance (\sigma^2 down the diagonal).
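As a quick review in code, a simulated sketch (not part of the notes; the numbers here are made up) of fitting this classical model and computing the RSS that OLS minimizes:

    # Classical model: Y_i = beta0 + beta1*x_i + eps_i, eps_i ~iid N(0, sigma^2)
    set.seed(42)
    n <- 100
    x <- runif(n, 0, 10)
    y <- 1 + 2 * x + rnorm(n, sd = 1)    # sigma^2 = 1
    fit <- lm(y ~ x)                     # ordinary least squares
    rss <- sum((y - fitted(fit))^2)      # sum_i (Y_i - Yhat_i)^2
    c(coef(fit), RSS = rss)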

• In matrix notation, we can show the ordinary least squares estimates for the regression coefficients \beta as:

    \hat\beta = (X'X)^{-} X'Y

  where (X'X)^{-} represents the generalized inverse of X'X, useful when X is not of full rank due to the chosen parameterization.

• And the estimate for \sigma^2 is

    \hat\sigma^2 = \frac{RSS}{n - k - 1}

• But what if the observations are not independent, or there is not constant variance (the assumptions for OLS)? i.e. what if

    V(Y) \neq \sigma^2 I_{n \times n}

• Then, the appropriate estimation method for the regression coefficients may be through Generalized Least Squares Estimation (GLS).

Non-constant variance, but independence holds

• In this situation we have the same model

    Y = X\beta + \varepsilon,   with \varepsilon \sim N_n(0, \Sigma)

  except

    \Sigma = \begin{pmatrix}
      \sigma_1^2 & 0          & \cdots & 0              & 0          \\
      0          & \sigma_2^2 & \cdots & 0              & 0          \\
      \vdots     &            & \ddots &                & \vdots     \\
      0          & 0          & \cdots & \sigma_{n-1}^2 & 0          \\
      0          & 0          & \cdots & 0              & \sigma_n^2
    \end{pmatrix}_{n \times n}

  or \varepsilon_i \sim N(0, \sigma_i^2) [different observations have different variances]

• Suppose we can write the variance of Y_i as a multiplier of a common variance \sigma^2...

    V(Y_i) = V(\varepsilon_i) = \sigma_i^2 = \frac{1}{w_i}\,\sigma^2

  and we say observation i has weight w_i.

• The weights are inversely proportional to the variance of the errors (w_i = \frac{1}{\sigma_i^2} \cdot \sigma^2).

• An observation with a smaller variance has a larger weight.

• Then V(Y) = \Sigma and

    \Sigma = \sigma^2 \begin{pmatrix}
      1/w_1  & 0      & \cdots & 0         & 0      \\
      0      & 1/w_2  & \cdots & 0         & 0      \\
      \vdots &        & \ddots &           & \vdots \\
      0      & 0      & \cdots & 1/w_{n-1} & 0      \\
      0      & 0      & \cdots & 0         & 1/w_n
    \end{pmatrix}_{n \times n}
    = \sigma^2 W^{-1}

  where W is an n \times n diagonal matrix of weights.

• This special case where \Sigma is diagonal is very useful, and is known as weighted least squares.

Weighted Least Squares

• A special case of Generalized Least Squares

• Useful when errors have different variances but are all uncorrelated (independent)

• Assumes that we have some way of knowing the relative variances associated with each observation

• Associates a weight w_i with each observation

• Chooses \hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k) to minimize

    \sum_{i=1}^{n} w_i \left[ Y_i - (\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki}) \right]^2

• In matrix form the Generalized Least Squares estimates are*

    \hat\beta = (X'WX)^{-} X'WY

  and

    \hat\sigma^2 = \frac{\sum_{i=1}^{n} w_i (Y_i - \hat{Y}_i)^2}{n - k - 1}

  * Notice the similarity to the OLS form, but now with the W.
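To make the matrix form concrete, here is a minimal simulated sketch (an illustration added alongside the formulas above; the data and names are made up) that computes \hat\beta = (X'WX)^{-} X'WY and the weighted \hat\sigma^2 directly, then confirms that lm() with a weights argument agrees:

    # Simulate non-constant variance: Var(eps_i) = sigma^2 / w_i, with sigma^2 = 1
    set.seed(1)
    n <- 60
    x <- runif(n, 0, 10)
    w <- sample(1:5, n, replace = TRUE)      # known relative weights
    y <- 2 + 0.5 * x + rnorm(n, sd = 1 / sqrt(w))

    X <- cbind(1, x)                         # design matrix with intercept column
    W <- diag(w)                             # diagonal matrix of weights
    beta.hat   <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)
    sigma2.hat <- sum(w * (y - X %*% beta.hat)^2) / (n - 1 - 1)   # k = 1 predictor

    beta.hat
    coef(lm(y ~ x, weights = w))             # should match beta.hat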


Example situations:

1. If data has been summarized and the ith response is the average of n_i observations, each with constant variance \sigma^2, then

     Var(Y_i) = \sigma^2 / n_i   and   w_i = n_i.

2. If variance is proportional to some predictor x_i, then

     Var(Y_i) = x_i \sigma^2   and   w_i = 1 / x_i.
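A small simulated sketch of situation 1 (the group sizes and coefficients here are hypothetical): each reported response is the mean of n_i individual measurements with common variance, so Var(Y_i) = \sigma^2 / n_i and the natural weights are w_i = n_i.

    # Situation 1: responses are group means of n_i individual observations
    set.seed(7)
    ni   <- c(4, 8, 15, 25, 40)                    # hypothetical group sizes
    x    <- 1:5
    ybar <- sapply(seq_along(ni), function(i)
      mean(3 + 2 * x[i] + rnorm(ni[i], sd = 2)))   # mean of n_i obs, Var = 4 / n_i
    lm(ybar ~ x, weights = ni)                     # WLS with w_i = n_i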

Example: Apple Shoots data

• Using trees planted in 1933 and 1934, Bland took samples of shoots from apple trees every few days throughout the 1971 growing season (about 106 days).

• He counted the number of stem units per shoot. This measurement was thought to help understand the growth of the trees (fruiting and branching).

• We do not know the number of stem units for every shoot, but we know the average number of stem units per shoot for all samples on a given day.

• We are interested in modeling the relationship between day of collection (observed) and number of stem units on a sample (not directly observed).

VARIABLES

  day    days from dormancy (day of collection)
  n      number of shoots collected
  y      number of stem units per shoot
  ybar   average number of stem units per shoot for shoots
         collected on that day (i.e. \sum y / n)

Notice we do not have y, and the number of samples taken varies from day to day.

> apple.long
   day  n  ybar   sd long
1    0  5 10.20 0.83    1
2    3  5 10.40 0.54    1
3    7  5 10.60 0.54    1
4   13  6 12.50 0.83    1
5   18  5 12.00 1.41    1
6   24  4 15.00 0.82    1
7   25  6 15.17 0.76    1
8   32  5 17.00 0.72    1
9   38  7 18.71 0.74    1
10  42  9 19.22 0.84    1
11  44 10 20.00 1.26    1
12  49 19 20.32 1.00    1
13  52 14 22.07 1.20    1
14  55 11 22.64 1.76    1
15  58  9 22.78 0.84    1
16  61 14 23.93 1.16    1
17  69 10 25.50 0.98    1
18  73 12 25.08 1.94    1
19  76  9 26.67 1.23    1
20  88  7 28.00 1.01    1
21 100 10 31.67 1.42    1
22 106  7 32.14 2.28    1

Plot the relationship between ybar and day.

[Figure: scatterplot of ybar (about 10 to 32) against day (0 to 106), showing a roughly linear increasing trend]
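For reproducibility, a sketch that rebuilds the data frame from the listing above (the sd and long columns are omitted since they are not used in what follows) and draws the plot:

    apple.long <- data.frame(
      day  = c(0, 3, 7, 13, 18, 24, 25, 32, 38, 42, 44,
               49, 52, 55, 58, 61, 69, 73, 76, 88, 100, 106),
      n    = c(5, 5, 5, 6, 5, 4, 6, 5, 7, 9, 10,
               19, 14, 11, 9, 14, 10, 12, 9, 7, 10, 7),
      ybar = c(10.20, 10.40, 10.60, 12.50, 12.00, 15.00, 15.17, 17.00,
               18.71, 19.22, 20.00, 20.32, 22.07, 22.64, 22.78, 23.93,
               25.50, 25.08, 26.67, 28.00, 31.67, 32.14)
    )
    plot(ybar ~ day, data = apple.long)    # average stem units vs. day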


• If these were individual y observations, we could fit our usual linear model.

• But some of the observations provide more information on the conditional mean (n_i larger), and others provide less information on the conditional mean (n_i smaller).

• If we assume a constant variance \sigma^2 for the model of y regressed on day, then these ybar observations have a non-constant variance related to n_i, with

    Var(ybar_i) = \frac{\sigma^2}{n_i}

• We can fit this model using Weighted Least Squares estimation with w_i = n_i.

We'll use our usual lm() function, but include the weights option.

> lm.out=lm(ybar ~ day,weights=n)
> summary(lm.out)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.973754    0.314272   31.74   <2e-16 ***
day         0.217330    0.005339   40.71   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.929 on 20 degrees of freedom
Multiple R-Squared: 0.9881, Adjusted R-squared: 0.9875
F-statistic: 1657 on 1 and 20 DF, p-value: < 2.2e-16

These estimates \hat\beta_0 and \hat\beta_1 are interpreted just as in the simple linear regression model, but we've accounted for the non-constant variance in our observations. And \hat\sigma = 1.929.
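As a check (assuming the apple.long data frame built earlier and the lm.out fit above), the matrix form reproduces the printed estimates and residual standard error:

    X <- cbind(1, apple.long$day)                  # design matrix with intercept
    W <- diag(apple.long$n)                        # w_i = n_i on the diagonal
    Y <- apple.long$ybar
    solve(t(X) %*% W %*% X, t(X) %*% W %*% Y)      # should match 9.973754, 0.217330
    sqrt(sum(apple.long$n * residuals(lm.out)^2) / 20)   # should match 1.929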

We can see the non-constant variance in the residual plot:

[Figure: lm.out$residuals plotted against lm.out$fitted.values; the spread of the residuals is not constant]

And the observations with higher n_i tend to have lower variability:

[Figure: abs(residual) plotted against n; the absolute residuals tend to decrease as n increases]
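A sketch (using lm.out and apple.long from above) that reproduces both diagnostic plots:

    # Residuals vs. fitted values: the spread is not constant
    plot(lm.out$fitted.values, lm.out$residuals)
    # Absolute residuals vs. number of shoots: larger n_i, lower variability
    plot(apple.long$n, abs(lm.out$residuals))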
