
Simple Linear Regression

Example: Body density

Aim: Measure body density, i.e. weight per unit volume of the body. (Body density indicates the fat content of the human body.)

Problem:
◦ Body density is difficult to measure directly.
◦ Research suggests that skinfold thickness can accurately predict body density.
◦ Skinfold thickness is measured by pinching a fold of skin between calipers.

[Figure: scatter plot of Body Density ($10^3\,\mathrm{kg/m^3}$) against Skinfold Thickness (mm).]

Questions:
◦ Are body density and skinfold thickness related?
◦ How accurately can we predict body density from skinfold thickness?

Regression predicts the response variable for a fixed value of the explanatory variable:
◦ the linear relationship in the data is described by a regression line;
◦ the fitted regression line is affected by chance variation in the observed data.

Statistical inference accounts for this chance variation in the data.

Population Regression Line

Simple linear regression studies the relationship between
◦ a response variable $Y$ and
◦ a single explanatory variable $X$.

We expect that different values of $X$ will produce different mean responses of $Y$. For given $X = x$, we consider the subpopulation with $X = x$:
◦ this subpopulation has mean $\mu_{Y|X=x} = E(Y \mid X = x)$ (conditional mean of $Y$ given $X = x$),
◦ and variance $\sigma^2_{Y|X=x} = \operatorname{var}(Y \mid X = x)$ (conditional variance of $Y$ given $X = x$).

Linear regression model with constant variance:

    $E(Y \mid X = x) = \mu_{Y|X=x} = a + b\,x$    (population regression line)
    $\operatorname{var}(Y \mid X = x) = \sigma^2_{Y|X=x} = \sigma^2$

◦ The population regression line connects the conditional means of the response variable for fixed values of the explanatory variable.
◦ The population regression line tells how the mean response of $Y$ varies with $X$.
◦ The variance (and standard deviation) does not depend on $x$.

Conditional Mean

[Figure: four 3-D panels showing (1) a sample $(x_1, y_1), \ldots, (x_n, y_n)$, (2) the sampling probability $f(x, y)$, (3) the slice $f(x_0, y)$ obtained by fixing $x = x_0$, and (4) the conditional probability obtained by rescaling with $f_X(x_0)$.]

Conditional probability:

    $f(y \mid x_0) = \dfrac{f_{XY}(x_0, y)}{f_X(x_0)}$

Conditional mean:

    $E(Y \mid X = x_0) = \int y\, f_{Y|X}(y \mid x_0)\, dy$

The Linear Regression Model

Simple linear regression:

    $Y_i = a + b\,x_i + \varepsilon_i, \quad i = 1, \ldots, n$

where
    $Y_i$   response (also dependent variable)
    $x_i$   predictor (also independent variable)
    $\varepsilon_i$   error

Assumptions:
◦ The predictor $x_i$ is deterministic (fixed values, not random).
◦ The errors have zero mean, $E(\varepsilon_i) = 0$.
◦ The variation about the mean does not depend on $x_i$, i.e. $\operatorname{var}(\varepsilon_i) = \sigma^2$.
◦ The errors $\varepsilon_i$ are independent.

Often we additionally assume that the errors are normally distributed,

    $\varepsilon_i \overset{\mathrm{iid}}{\sim} N(0, \sigma^2)$.

For fixed $x$ the response $Y$ is then normally distributed with $Y \sim N(a + b\,x, \sigma^2)$, as the simulation sketch below illustrates.
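To make these assumptions concrete, here is a minimal Python sketch (not part of the original notes) that simulates data from the model $Y_i = a + b\,x_i + \varepsilon_i$; the parameter values $a = 13.7$, $b = -11.4$, $\sigma = 0.115$ are placeholders echoing the body-density fit computed later.

    import numpy as np

    rng = np.random.default_rng(1)

    # Placeholder parameters (roughly the body-density fit from later slides)
    a, b, sigma = 13.7, -11.4, 0.115

    # Deterministic predictor values: the model treats x_i as fixed, not random
    x = np.linspace(1.03, 1.09, 92)

    # Independent errors with zero mean and constant variance sigma^2
    eps = rng.normal(0.0, sigma, size=x.size)

    # Response: Y_i = a + b*x_i + eps_i, so that E(Y | X = x) = a + b*x
    y = a + b * x + eps

    # For fixed x the response is N(a + b*x, sigma^2): compare the model mean
    # at x = 1.06 with the simulated responses near that value
    print("model mean at x = 1.06:", a + b * 1.06)
    print("simulated mean near x = 1.06:", y[np.abs(x - 1.06) < 0.005].mean())

Rerunning the simulation gives a slightly different data cloud, and hence a different fitted line, each time; this is exactly the chance variation that statistical inference must account for.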
Least Squares Estimation

Data: $(Y_1, x_1), \ldots, (Y_n, x_n)$

Aim: Find the straight line which fits the data best,

    $\hat{Y}_i = a + b\,x_i$    (fitted values for coefficients $a$ and $b$)

where $a$ is the intercept and $b$ is the slope.

Least squares approach: Minimize the squared distance between the observed $Y_i$ and the fitted $\hat{Y}_i$:

    $L(a, b) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - b\,x_i)^2$

Set the partial derivatives to zero (normal equations):

    $\dfrac{\partial L}{\partial a} = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} (Y_i - a - b\,x_i) = 0$
    $\dfrac{\partial L}{\partial b} = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} (Y_i - a - b\,x_i)\,x_i = 0$

Solution: the least squares estimators

    $\hat{a} = \bar{Y} - \dfrac{S_{XY}}{S_{XX}}\,\bar{x}$
    $\hat{b} = \dfrac{S_{XY}}{S_{XX}}$

where

    $S_{XY} = \sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})$    (sum of cross products)
    $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$    (sum of squares)

Least squares predictor:

    $\hat{Y}_i = \hat{a} + \hat{b}\,x_i$

Residuals $\hat{\varepsilon}_i$:

    $\hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - \hat{a} - \hat{b}\,x_i$

Residual sum of squares (SS Residual):

    $\text{SS Residual} = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

Estimation of $\sigma^2$:

    $\hat{\sigma}^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \dfrac{1}{n-2}\,\text{SS Residual}$

Regression standard error:

    $se = \hat{\sigma} = \sqrt{\text{SS Residual}/(n-2)}$

Variation accounting:

    $\text{SS Total} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$    (total variation)
    $\text{SS Model} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$    (variation explained by the linear model)
    $\text{SS Residual} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$    (remaining variation)

Example: Body density

[Figure: scatter plot of Body Density ($10^3\,\mathrm{kg/m^3}$) against Skinfold Thickness (mm) with the least squares regression line.]

Calculation of the least squares estimates:

    $\bar{x}$    $\bar{y}$    $S_{XX}$    $S_{XY}$    $S_{YY}$    SS Residual
    1.064        1.568        0.0235      -0.2679     4.244       1.187

    $\hat{b} = \dfrac{S_{XY}}{S_{XX}} = \dfrac{-0.2679}{0.0235} = -11.40$
    $\hat{a} = \bar{y} - \hat{b}\,\bar{x} = 1.568 + 11.40 \cdot 1.064 = 13.70$
    $\hat{\sigma}^2 = \dfrac{\text{SS Residual}}{n-2} = \dfrac{1.187}{90} = 0.0132$
    $se = \sqrt{\hat{\sigma}^2} = \sqrt{0.0132} = 0.1149$

Example: Body density, using STATA

    . infile ID BODYD SKINT using bodydens.txt, clear
    (92 observations read)

    . regress BODYD SKINT

          Source |       SS       df       MS              Number of obs =      92
    -------------+------------------------------           F(  1,    90) =  231.89
           Model |  3.05747739     1  3.05747739           Prob > F      =  0.0000
        Residual |  1.18663025    90  .013184781           R-squared     =  0.7204
    -------------+------------------------------           Adj R-squared =  0.7173
           Total |  4.24410764    91  .046638546           Root MSE      =  .11482

    ------------------------------------------------------------------------------
           BODYD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           SKINT |  -11.41345   .7494999   -15.23   0.000    -12.90246   -9.924433
           _cons |   13.71221   .7975822    17.19   0.000     12.12768    15.29675
    ------------------------------------------------------------------------------

    . twoway (lfitci BODYD SKINT, range(1 1.1)) (scatter BODYD SKINT),
          xtitle(Skin thickness) ytitle(Body density) scheme(s1color) legend(off)

[Figure: scatter plot of Body density against Skin thickness with the fitted regression line and its confidence band.]
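As a cross-check of the formulas above, here is a short Python sketch (not from the original notes; the five data points are made up for illustration) that computes the least squares estimates and the sums-of-squares decomposition directly from their definitions.

    import numpy as np

    # Hypothetical toy data (x = skinfold thickness, y = body density);
    # the lecture example uses the n = 92 observations in bodydens.txt
    x = np.array([1.03, 1.05, 1.06, 1.07, 1.09])
    y = np.array([1.90, 1.60, 1.55, 1.40, 1.15])
    n = x.size

    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)            # S_XX = sum (x_i - xbar)^2
    Sxy = np.sum((x - xbar) * (y - ybar))    # S_XY = sum (Y_i - Ybar)(x_i - xbar)

    b_hat = Sxy / Sxx                        # slope estimate
    a_hat = ybar - b_hat * xbar              # intercept estimate

    y_fit = a_hat + b_hat * x                # fitted values
    resid = y - y_fit                        # residuals

    ss_total = np.sum((y - ybar) ** 2)       # total variation
    ss_model = np.sum((y_fit - ybar) ** 2)   # variation explained by the model
    ss_resid = np.sum(resid ** 2)            # remaining variation

    sigma2_hat = ss_resid / (n - 2)          # estimate of sigma^2
    se = np.sqrt(sigma2_hat)                 # regression standard error

    print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}, se = {se:.4f}")
    print(f"SS Total = {ss_total:.4f} = {ss_model:.4f} + {ss_resid:.4f}")

Applied to the full 92-observation body-density data, these computations should reproduce the STATA output above: $\hat{a} \approx 13.71$, $\hat{b} \approx -11.41$, Root MSE $\approx 0.115$, with SS Total = SS Model + SS Residual matching the ANOVA table.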