
Lecture 14: Simple Linear Regression

Ordinary Least Squares (OLS)

Consider the following simple linear regression model
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
where, for each unit $i$,
• $Y_i$ is the dependent variable (response).
• $X_i$ is the independent variable (predictor).
• $\varepsilon_i$ is the error between the observed $Y_i$ and what the model predicts.

This model describes the relation between $X_i$ and $Y_i$ using an intercept and a slope parameter. The error $\varepsilon_i = Y_i - \alpha - \beta X_i$ is also known as the residual because it measures the amount of variation in $Y_i$ not explained by the model.

We saw last class that there exist $\hat\alpha$ and $\hat\beta$ that minimize the sum of the $\varepsilon_i^2$. Specifically, we wish to find $\hat\alpha$ and $\hat\beta$ such that
$$\sum_{i=1}^{n} \left( Y_i - (\hat\alpha + \hat\beta X_i) \right)^2$$
is minimized. Using calculus, it turns out that
$$\hat\alpha = \bar Y - \hat\beta \bar X, \qquad \hat\beta = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}.$$

OLS and Transformation

If we center the predictor, $\tilde X_i = X_i - \bar X$, then $\tilde X_i$ has mean zero. Therefore,
$$\hat\alpha^* = \bar Y, \qquad \hat\beta^* = \frac{\sum \tilde X_i (Y_i - \bar Y)}{\sum \tilde X_i^2}.$$
Horizontally shifting the values of $X_i$ leaves the slope unchanged ($\hat\beta^* = \hat\beta$), but the intercept becomes the overall average of $Y_i$.

Now consider the linear transformation $Z_i = a + b X_i$ with $\bar Z = a + b\bar X$, and the linear model
$$Y_i = \alpha^* + \beta^* Z_i + \varepsilon_i.$$
Let $\hat\alpha$ and $\hat\beta$ be the estimates using $X$ as the predictor. Then the slope and intercept using $Z$ as the predictor are
$$\hat\beta^* = \frac{\sum (Z_i - \bar Z)(Y_i - \bar Y)}{\sum (Z_i - \bar Z)^2}
= \frac{\sum (a + bX_i - a - b\bar X)(Y_i - \bar Y)}{\sum (a + bX_i - a - b\bar X)^2}
= \frac{\sum (bX_i - b\bar X)(Y_i - \bar Y)}{\sum (bX_i - b\bar X)^2}
= \frac{b \sum (X_i - \bar X)(Y_i - \bar Y)}{b^2 \sum (X_i - \bar X)^2}
= \frac{1}{b}\hat\beta$$
and
$$\hat\alpha^* = \bar Y - \hat\beta^* \bar Z = \bar Y - \frac{\hat\beta}{b}(a + b\bar X) = \bar Y - \hat\beta\bar X - \frac{a\hat\beta}{b} = \hat\alpha - \frac{a\hat\beta}{b}.$$

We can get the same result by substituting $X_i = \frac{Z_i - a}{b}$ into the linear model:
$$Y_i = \alpha + \beta X_i + \varepsilon_i
= \alpha + \beta\,\frac{Z_i - a}{b} + \varepsilon_i
= \left(\alpha - \frac{a\beta}{b}\right) + \frac{\beta}{b} Z_i + \varepsilon_i
= \alpha^* + \beta^* Z_i + \varepsilon_i.$$
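As a quick numerical illustration of the closed-form estimates and the transformation result above, here is a minimal sketch. It assumes NumPy and uses arbitrary simulated data and arbitrary constants a and b; none of the specific values come from the lecture.

```python
import numpy as np

# Illustrative data (assumed for this sketch; any paired sample would do).
rng = np.random.default_rng(0)
x = rng.normal(5, 2, size=50)
y = 3 + 0.5 * x + rng.normal(0, 1, size=50)

def ols(x, y):
    """Closed-form OLS: slope = S_xy / S_xx, intercept = ybar - slope * xbar."""
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

alpha_hat, beta_hat = ols(x, y)

# Linear transformation of the predictor: Z = a + b * X (a, b chosen arbitrarily).
a, b = 10.0, 2.0
alpha_star, beta_star = ols(a + b * x, y)

print(np.isclose(beta_star, beta_hat / b))                   # slope scales by 1/b
print(np.isclose(alpha_star, alpha_hat - a * beta_hat / b))  # intercept shifts by -a*beta/b
```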
Properties of OLS

Given the estimates $\hat\alpha$ and $\hat\beta$, we can define (1) the fitted (predicted) value $\hat Y_i$ and (2) the estimated residual $\hat\varepsilon_i$:
$$\hat Y_i = \hat\alpha + \hat\beta X_i, \qquad \hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \hat\alpha - \hat\beta X_i.$$
The least squares estimates have the following properties.

1. $\sum_i \hat\varepsilon_i = 0$:
$$\sum_{i=1}^n \hat\varepsilon_i = \sum_{i=1}^n (Y_i - \hat\alpha - \hat\beta X_i) = \sum_{i=1}^n Y_i - n\hat\alpha - \hat\beta \sum_{i=1}^n X_i
= n\bar Y - n\hat\alpha - n\hat\beta\bar X = n(\bar Y - \hat\alpha - \hat\beta\bar X) = n\left(\bar Y - (\bar Y - \hat\beta\bar X) - \hat\beta\bar X\right) = 0.$$

2. $(\bar X, \bar Y)$ is always on the fitted line:
$$\hat\alpha + \hat\beta\bar X = (\bar Y - \hat\beta\bar X) + \hat\beta\bar X = \bar Y.$$

3. $\hat\beta = r_{XY} \times \frac{s_Y}{s_X}$, where $s_X$ and $s_Y$ are the sample standard deviations of $X$ and $Y$, and $r_{XY}$ is the correlation between $X$ and $Y$. Note that the sample correlation is given by
$$r_{XY} = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{(n-1)\, s_X s_Y}.$$
Therefore,
$$\hat\beta = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} \times \frac{n-1}{n-1} \times \frac{s_Y}{s_Y} \times \frac{s_X}{s_X}
= \frac{\mathrm{Cov}(X, Y)}{s_X^2} \times \frac{s_Y}{s_Y} \times \frac{s_X}{s_X}
= \frac{\mathrm{Cov}(X, Y)}{s_X s_Y} \times \frac{s_Y}{s_X}
= r_{XY} \times \frac{s_Y}{s_X}.$$

4. $\hat\beta$ is a weighted sum of the $Y_i$:
$$\hat\beta = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}
= \frac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} - \frac{\sum (X_i - \bar X)\bar Y}{\sum (X_i - \bar X)^2}
= \frac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} - \frac{\bar Y \sum (X_i - \bar X)}{\sum (X_i - \bar X)^2}$$
$$= \frac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} - 0
= \sum_i \frac{X_i - \bar X}{\sum (X_i - \bar X)^2}\, Y_i
= \sum_i w_i Y_i, \qquad w_i = \frac{X_i - \bar X}{\sum (X_i - \bar X)^2}.$$

5. $\hat\alpha$ is a weighted sum of the $Y_i$:
$$\hat\alpha = \bar Y - \hat\beta\bar X = \sum_i \frac{1}{n} Y_i - \bar X \sum_i w_i Y_i
= \sum_i v_i Y_i, \qquad v_i = \frac{1}{n} - \frac{\bar X (X_i - \bar X)}{\sum (X_i - \bar X)^2}.$$

6. $\sum_i X_i \hat\varepsilon_i = 0$:
$$\sum X_i \hat\varepsilon_i = \sum (Y_i - \hat Y_i) X_i = \sum (Y_i - \hat\alpha - \hat\beta X_i) X_i
= \sum \left(Y_i - (\bar Y - \hat\beta\bar X) - \hat\beta X_i\right) X_i
= \sum \left((Y_i - \bar Y) - \hat\beta (X_i - \bar X)\right) X_i$$
$$= \sum X_i (Y_i - \bar Y) - \hat\beta \sum X_i (X_i - \bar X)
= \hat\beta \sum (X_i - \bar X)^2 - \hat\beta \sum X_i (X_i - \bar X)
= \hat\beta \left[ \sum (X_i - \bar X)^2 - \sum X_i (X_i - \bar X) \right]$$
$$= \hat\beta \sum \left( X_i^2 - 2 X_i\bar X + \bar X^2 - X_i^2 + X_i\bar X \right)
= \hat\beta \sum \left( \bar X^2 - X_i\bar X \right)
= \hat\beta\bar X \sum (\bar X - X_i) = 0.$$
Note that $\hat\beta = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} = \frac{\sum X_i (Y_i - \bar Y)}{\sum (X_i - \bar X)^2}$, which is what allows $\sum X_i (Y_i - \bar Y)$ to be replaced by $\hat\beta \sum (X_i - \bar X)^2$ above.

7. Partition of total variation.
$$(Y_i - \bar Y) = Y_i - \bar Y + \hat Y_i - \hat Y_i = (\hat Y_i - \bar Y) + (Y_i - \hat Y_i)$$
• $(Y_i - \bar Y)$: total deviation of the observed $Y_i$ from a null model ($\beta = 0$).
• $(\hat Y_i - \bar Y)$: deviation of the modeled value $\hat Y_i$ ($\beta \neq 0$) from a null model ($\beta = 0$).
• $(Y_i - \hat Y_i)$: deviation of the modeled value from the observed value.

Total variations (sums of squares):
• $\sum (Y_i - \bar Y)^2$: total variation, equal to $(n-1) \times V(Y)$.
• $\sum (\hat Y_i - \bar Y)^2$: variation explained by the model.
• $\sum (Y_i - \hat Y_i)^2$: variation not explained by the model; the residual sum of squares (SSE), $\sum \hat\varepsilon_i^2$.

Partition of total variation:
$$\sum (Y_i - \bar Y)^2 = \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2.$$
Proof:
$$\sum (Y_i - \bar Y)^2 = \sum \left[ (\hat Y_i - \bar Y) + (Y_i - \hat Y_i) \right]^2
= \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2 + 2 \sum (\hat Y_i - \bar Y)(Y_i - \hat Y_i)
= \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2,$$
where
$$\sum (\hat Y_i - \bar Y)(Y_i - \hat Y_i) = \sum \left[ (\hat\alpha + \hat\beta X_i) - (\hat\alpha + \hat\beta\bar X) \right] \hat\varepsilon_i
= \sum \left[ \hat\beta (X_i - \bar X) \right] \hat\varepsilon_i
= \hat\beta \sum X_i \hat\varepsilon_i - \hat\beta\bar X \sum \hat\varepsilon_i = 0 + 0.$$

8. Alternative formula for $\sum \hat\varepsilon_i^2$:
$$\sum (Y_i - \hat Y_i)^2 = \sum (Y_i - \bar Y)^2 - \sum (\hat Y_i - \bar Y)^2
= \sum (Y_i - \bar Y)^2 - \sum (\hat\alpha + \hat\beta X_i - \bar Y)^2
= \sum (Y_i - \bar Y)^2 - \sum (\bar Y - \hat\beta\bar X + \hat\beta X_i - \bar Y)^2$$
$$= \sum (Y_i - \bar Y)^2 - \sum \left[ \hat\beta (X_i - \bar X) \right]^2
= \sum (Y_i - \bar Y)^2 - \hat\beta^2 \sum (X_i - \bar X)^2,$$
or
$$\sum (Y_i - \hat Y_i)^2 = \sum (Y_i - \bar Y)^2 - \frac{\left[ \sum (X_i - \bar X)(Y_i - \bar Y) \right]^2}{\sum (X_i - \bar X)^2}.$$

9. Variation explained by the model.
An easy way to calculate $\sum (\hat Y_i - \bar Y)^2$ is
$$\sum (\hat Y_i - \bar Y)^2 = \sum (Y_i - \bar Y)^2 - \sum (Y_i - \hat Y_i)^2
= \sum (Y_i - \bar Y)^2 - \left[ \sum (Y_i - \bar Y)^2 - \frac{\left[ \sum (X_i - \bar X)(Y_i - \bar Y) \right]^2}{\sum (X_i - \bar X)^2} \right]
= \frac{\left[ \sum (X_i - \bar X)(Y_i - \bar Y) \right]^2}{\sum (X_i - \bar X)^2}.$$

10. Coefficient of determination $r^2$:
$$r^2 = \frac{\text{variation explained by the model}}{\text{total variation}} = \frac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2}.$$

(a) $r^2$ is related to the residual variance:
$$\sum (Y_i - \hat Y_i)^2 = \sum (Y_i - \bar Y)^2 - \sum (\hat Y_i - \bar Y)^2
= \left[ \sum (Y_i - \bar Y)^2 - \sum (\hat Y_i - \bar Y)^2 \right] \times \frac{\sum (Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2}
= (1 - r^2) \times \sum (Y_i - \bar Y)^2.$$
Dividing the left-hand side by $n-2$ and the right-hand side by $n-1$, for large $n$ the residual variance satisfies
$$s^2 \approx (1 - r^2)\,\mathrm{Var}(Y).$$

(b) $r^2$ is the square of the sample correlation between $X$ and $Y$:
$$r^2 = \frac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2}
= \frac{\left[ \sum (X_i - \bar X)(Y_i - \bar Y) \right]^2}{\sum (Y_i - \bar Y)^2 \sum (X_i - \bar X)^2}
= \left[ \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (Y_i - \bar Y)^2}\sqrt{\sum (X_i - \bar X)^2}} \right]^2 = r_{XY}^2.$$
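The properties above are easy to confirm numerically. The following sketch (again assuming NumPy and simulated data chosen only for illustration; none of this code is from the lecture) checks the zero-sum residual properties, the slope–correlation identity, and the partition of the total variation.

```python
import numpy as np

# Illustrative data (assumed for this sketch).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2 - 0.7 * x + rng.normal(0, 2, size=100)

# Closed-form OLS fit, fitted values, residuals.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x
resid = y - y_hat

# Properties 1 and 6: residuals sum to zero and are orthogonal to X.
print(np.isclose(resid.sum(), 0.0))
print(np.isclose(np.sum(x * resid), 0.0))

# Property 3: beta_hat = r_XY * sY / sX.
r_xy = np.corrcoef(x, y)[0, 1]
print(np.isclose(beta_hat, r_xy * y.std(ddof=1) / x.std(ddof=1)))

# Properties 7 and 10: SST = SSR + SSE and r^2 = SSR / SST = r_XY^2.
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum(resid ** 2)
print(np.isclose(sst, ssr + sse))
print(np.isclose(ssr / sst, r_xy ** 2))
```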
Statistical Inference for OLS Estimates

The estimates $\hat\alpha$ and $\hat\beta$ can be computed from any given sample of data. Therefore, we also need to consider their sampling distributions, because each sample of $(X_i, Y_i)$ pairs will result in different estimates of $\alpha$ and $\beta$.

Consider the following distributional assumption on the error:
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2).$$
The above is now a statistical model that describes the distribution of $Y_i$ given $X_i$. Specifically, we assume the observed $Y_i$ is error-prone but centered around the linear model for each value of $X_i$:
$$Y_i \sim N(\alpha + \beta X_i,\ \sigma^2), \quad \text{independently}.$$

The error distribution results in the following assumptions.
1. Homoscedasticity: the variance of $Y_i$ is the same for any $X_i$:
$$V(Y_i) = V(\alpha + \beta X_i + \varepsilon_i) = V(\varepsilon_i) = \sigma^2.$$
2. Linearity: the mean of $Y_i$ is linear with respect to $X_i$:
$$E(Y_i) = E(\alpha + \beta X_i + \varepsilon_i) = \alpha + \beta X_i + E(\varepsilon_i) = \alpha + \beta X_i.$$
3. Independence: the $Y_i$ are independent of each other:
$$\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(\alpha + \beta X_i + \varepsilon_i,\ \alpha + \beta X_j + \varepsilon_j) = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad (i \neq j).$$

We can now investigate the bias and variance of the OLS estimators $\hat\alpha$ and $\hat\beta$. We assume that the $X_i$ are known constants.

• $\hat\beta$ is unbiased:
$$E[\hat\beta] = E\left[ \sum w_i Y_i \right], \qquad w_i = \frac{X_i - \bar X}{\sum (X_i - \bar X)^2}$$
$$= \sum w_i E[Y_i] = \sum w_i (\alpha + \beta X_i) = \alpha \sum w_i + \beta \sum w_i X_i = \beta,$$
because
$$\sum w_i = \frac{1}{\sum (X_i - \bar X)^2} \sum (X_i - \bar X) = 0$$
and
$$\sum w_i X_i = \frac{\sum X_i (X_i - \bar X)}{\sum (X_i - \bar X)^2}
= \frac{\sum X_i (X_i - \bar X)}{\sum X_i (X_i - \bar X) - \bar X \sum (X_i - \bar X)}
= \frac{\sum X_i (X_i - \bar X)}{\sum X_i (X_i - \bar X) - 0} = 1.$$

• $\hat\alpha$ is unbiased:
$$E[\hat\alpha] = E[\bar Y - \hat\beta\bar X] = E[\bar Y] - \bar X E[\hat\beta]
= E\left[ \frac{1}{n} \sum Y_i \right] - \beta\bar X
= \frac{1}{n} \sum E[\alpha + \beta X_i] - \beta\bar X
= \frac{1}{n}(n\alpha + n\beta\bar X) - \beta\bar X
= \alpha + \beta\bar X - \beta\bar X = \alpha.$$

• $\hat\beta$ has variance
$$V(\hat\beta) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\sigma^2}{(n-1)s_x^2},$$
where $\sigma^2$ is the residual variance and $s_x^2$ is the sample variance of $x_1, \ldots, x_n$.
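To make the sampling-distribution idea concrete, here is a minimal simulation sketch. It assumes NumPy; the fixed design x, the true parameters, and the number of replications are all illustrative choices, not values from the lecture. It repeatedly draws new errors, refits the model, and compares the empirical mean and variance of $\hat\beta$ with $\beta$ and $\sigma^2 / \sum (X_i - \bar X)^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed design and true parameters (illustrative assumptions).
n, alpha, beta, sigma = 30, 1.0, 2.0, 1.5
x = np.linspace(0, 10, n)            # Xi treated as known constants
sxx = np.sum((x - x.mean()) ** 2)

# Repeatedly sample Yi = alpha + beta*Xi + eps and re-estimate beta.
beta_hats = []
for _ in range(10_000):
    y = alpha + beta * x + rng.normal(0, sigma, size=n)
    b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    beta_hats.append(b)
beta_hats = np.array(beta_hats)

print(beta_hats.mean())       # close to beta = 2.0 (unbiasedness)
print(beta_hats.var())        # close to the theoretical variance below
print(sigma ** 2 / sxx)       # sigma^2 / sum((Xi - Xbar)^2)
```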