
Simple Linear Regression

Example: Body density

Aim: Measure body density (weight per unit volume of the body). Body density indicates the fat content of the human body.

Problem:
◦ Body density is difficult to measure directly.
◦ Research suggests that skinfold thickness can accurately predict body density.
◦ Skinfold thickness is measured by pinching a fold of skin between calipers.

[Figure: scatter plot of Body Density (10³ kg/m³) against Skinfold Thickness (mm).]

Questions: ◦ Are body density and skinfold thickness related? ◦ How accurately can we predict body density from skinfold thickness?

Regression: predict the response variable for a fixed value of the explanatory variable
◦ describe the linear relationship in the data by a regression line
◦ the fitted regression line is affected by chance variation in the observed data

Statistical inference: accounts for chance variation in data

Simple Linear Regression, Feb 27, 2004

Population Regression Line

Simple linear regression studies the relationship between ◦ a response variable Y and ◦ a single explanatory variable X. We expect that different values of X will produce different responses of Y . For given X = x, we consider the subpopulation with X = x: ◦ this subpopulation has mean

µ_{Y|X=x} = E(Y | X = x)   (conditional mean of Y given X = x)

◦ and variance

σ²_{Y|X=x} = var(Y | X = x)   (conditional variance of Y given X = x)

Linear regression model with constant variance:

E(Y | X = x) = µ_{Y|X=x} = a + b x   (population regression line)
var(Y | X = x) = σ²_{Y|X=x} = σ²

◦ The population regression line connects the conditional means of the response variable for fixed values of the explanatory variable.
◦ This population regression line tells how the mean response of Y varies with X.
◦ The variance (and standard deviation) does not depend on x.
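As a quick illustration of the constant-variance model, the following sketch (not from the slides; the population values a = 1, b = 2, σ = 0.5 are assumptions) simulates the subpopulation of Y for a few fixed values of x and checks that its mean follows a + b x while its variance stays at σ²:

```python
import random

random.seed(1)
a, b, sigma = 1.0, 2.0, 0.5   # assumed population values for illustration

def draw_subpopulation(x, n=100_000):
    """Draw n responses Y from the subpopulation with X = x."""
    return [a + b * x + random.gauss(0.0, sigma) for _ in range(n)]

for x in (0.0, 1.0, 2.0):
    ys = draw_subpopulation(x)
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
    print(f"x = {x}: mean ≈ {mean:.3f} (a + b·x = {a + b * x:.1f}), "
          f"var ≈ {var:.3f} (σ² = {sigma ** 2})")
```

The sample means track the population regression line a + b x, while the sample variances stay near σ² regardless of x.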

Conditional Mean

Sample (x1, y1),..., (xn, yn)

[Figure: sample points over a grid of (x, y) values.]

Sampling probability f(x, y):

[Figure: joint density surface f(x, y).]

Fix x = x0 and take the slice f(x0, y); rescaling by the marginal f_X(x0) gives the conditional probability density

f(y|x0) = f_{XY}(x0, y) / f_X(x0)

[Figure: the slice f(x0, y) and the rescaled conditional density f(y|x0).]

E(Y | X = x0) = ∫ y f_{Y|X}(y|x0) dy   (conditional mean)
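Numerically, the conditional mean can be approximated exactly as described above: slice the joint density at x0, rescale by the marginal, and integrate. A sketch under an assumed joint density (a standard bivariate normal with correlation ρ = 0.8, for which E(Y | X = x0) = ρ·x0 is known in closed form):

```python
import math

RHO = 0.8  # assumed correlation of the illustrative bivariate normal

def f_joint(x, y):
    """Standard bivariate normal density with correlation RHO."""
    z = (x * x - 2 * RHO * x * y + y * y) / (1 - RHO * RHO)
    return math.exp(-z / 2) / (2 * math.pi * math.sqrt(1 - RHO * RHO))

x0, dy = 1.0, 0.01
ys = [i * dy for i in range(-1000, 1001)]        # grid for y on [-10, 10]
slice_ = [f_joint(x0, y) for y in ys]            # the slice f(x0, y)
f_x0 = sum(slice_) * dy                          # marginal f_X(x0)
cond = [v / f_x0 for v in slice_]                # conditional density f(y|x0)
cond_mean = sum(y * c for y, c in zip(ys, cond)) * dy
print(cond_mean)   # close to ρ·x0 = 0.8
```

For the bivariate normal the conditional mean is linear in x0, which is exactly the situation the population regression line describes.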

The Linear Regression Model

Simple linear regression

Yi = a + b xi + εi, i = 1, . . . , n

where

Yi response (also dependent variable)

xi predictor (also independent variable)

εi error

Assumptions:

◦ Predictor xi is deterministic (fixed values, not random).

◦ Errors have zero mean, E(εi) = 0.
◦ Variation about the mean does not depend on xi, i.e. var(εi) = σ².
◦ Errors εi are independent.

Often we additionally assume:
◦ The errors are normally distributed,

εi ∼ N(0, σ²)   (iid).

For fixed x the response Y is normally distributed with

Y ∼ N(a + b x, σ²).

Estimation

Data: (Y1, x1),..., (Yn, xn)

Aim: Find the straight line which fits the data best:

Ŷi = a + b xi   (fitted values for coefficients a and b)

a: intercept, b: slope

Least Squares Approach:

Minimize the squared distance between observed Yi and fitted Ŷi:

L(a, b) = Σ_{i=1}^n (Yi − Ŷi)² = Σ_{i=1}^n (Yi − a − b xi)²
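The minimizing property can be checked directly: evaluate L(a, b) at the least squares solution and at perturbed coefficients. A sketch on a small made-up data set (the x and y values below, and the solution 0.11, 1.97 computed for them, are assumptions for illustration):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed toy data
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

def loss(a, b):
    """L(a, b) = sum of squared distances between observed and fitted values."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

a_hat, b_hat = 0.11, 1.97   # least squares solution for this toy data
base = loss(a_hat, b_hat)
for da, db in [(0.1, 0.0), (0.0, 0.05), (-0.1, 0.05)]:
    # any perturbation of the coefficients increases the loss
    assert loss(a_hat + da, b_hat + db) > base
print(base)   # ≈ 0.059
```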

Set partial derivatives to zero (normal equations):

∂L/∂a = 0  ⇔  Σ_{i=1}^n (Yi − a − b xi) = 0
∂L/∂b = 0  ⇔  Σ_{i=1}^n (Yi − a − b xi) · xi = 0

Solution: Least squares estimators

â = Ȳ − (S_XY / S_XX) · x̄
b̂ = S_XY / S_XX

where

S_XY = Σ_{i=1}^n (Yi − Ȳ)(xi − x̄)   (sum of cross products)

S_XX = Σ_{i=1}^n (xi − x̄)²   (sum of squares)
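These formulas translate directly to code. A sketch on a small made-up data set (values assumed for illustration), which also checks that the solution satisfies both normal equations:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed toy data
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

b_hat = s_xy / s_xx              # slope estimate
a_hat = y_bar - b_hat * x_bar    # intercept estimate
print(b_hat, a_hat)              # 1.97, ≈ 0.11

# Both normal equations hold at the solution (sums are zero up to rounding):
resid = [y - a_hat - b_hat * x for x, y in zip(xs, ys)]
print(sum(resid), sum(r * x for r, x in zip(resid, xs)))   # both ≈ 0
```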

Least Squares Estimation

Least squares predictor:

Ŷi = â + b̂ xi

Residuals ε̂i:

ε̂i = Yi − Ŷi = Yi − â − b̂ xi

Residual sum of squares (SS Residual)

SS Residual = Σ_{i=1}^n ε̂i² = Σ_{i=1}^n (Yi − Ŷi)²

Estimation of σ²:

σ̂² = (1 / (n − 2)) · Σ_{i=1}^n (Yi − Ŷi)² = SS Residual / (n − 2)

Regression standard error:

se = σ̂ = √(SS Residual / (n − 2))

Variation accounting:

SS Total = Σ_{i=1}^n (Yi − Ȳ)²   (total variation)
SS Model = Σ_{i=1}^n (Ŷi − Ȳ)²   (variation explained by the model)
SS Residual = Σ_{i=1}^n (Yi − Ŷi)²   (remaining variation)
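A sketch on a small made-up data set (values assumed for illustration) computing σ̂², the regression standard error, and checking the decomposition SS Total = SS Model + SS Residual:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed toy data
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b_hat = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar
fitted = [a_hat + b_hat * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_model = sum((f - y_bar) ** 2 for f in fitted)
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))

sigma2_hat = ss_residual / (n - 2)   # estimate of sigma^2
se = math.sqrt(sigma2_hat)           # regression standard error

# total variation splits exactly into explained + remaining variation
assert abs(ss_total - (ss_model + ss_residual)) < 1e-9
print(ss_total, ss_model, ss_residual, sigma2_hat, se)
```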

Least Squares Estimation

Example: Body density

Scatter plot with least squares regression line:

[Figure: scatter plot of Body Density (10³ kg/m³) against Skinfold Thickness (mm) with fitted line.]

Calculation of least squares estimates:

  x̄       ȳ       S_XX     S_XY      S_YY    SS Residual
 1.064   1.568   0.0235   −0.2679   4.244   1.187

b̂ = S_XY / S_XX = −0.2679 / 0.0235 = −11.40

â = ȳ − b̂ x̄ = 1.568 + 11.40 · 1.064 = 13.70

σ̂² = SS Residual / (n − 2) = 1.187 / 90 = 0.0132

se = √σ̂² = √0.0132 = 0.1149
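The slide's arithmetic can be checked by recomputing the estimates from the summary statistics above (a sketch; only the quoted summary values are used, not the raw data):

```python
import math

# Summary statistics quoted on the slide
x_bar, y_bar = 1.064, 1.568
s_xx, s_xy = 0.0235, -0.2679
ss_residual, n = 1.187, 92

b_hat = s_xy / s_xx                    # ≈ -11.40
a_hat = y_bar - b_hat * x_bar          # ≈ 13.70
sigma2_hat = ss_residual / (n - 2)     # ≈ 0.0132
se = math.sqrt(sigma2_hat)             # ≈ 0.115
print(b_hat, a_hat, sigma2_hat, se)
```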

Least Squares Estimation

Example: Body density

Using STATA:

. infile ID BODYD SKINT using bodydens.txt, clear
(92 observations read)

. regress BODYD SKINT

      Source |       SS       df       MS              Number of obs =      92
-------------+------------------------------           F(  1,    90) =  231.89
       Model |  3.05747739     1  3.05747739           Prob > F      =  0.0000
    Residual |  1.18663025    90  .013184781           R-squared     =  0.7204
-------------+------------------------------           Adj R-squared =  0.7173
       Total |  4.24410764    91  .046638546           Root MSE      =  .11482

------------------------------------------------------------------------------
       BODYD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       SKINT |  -11.41345   .7494999   -15.23   0.000    -12.90246   -9.924433
       _cons |   13.71221   .7975822    17.19   0.000     12.12768    15.29675
------------------------------------------------------------------------------

. twoway (lfitci BODYD SKINT, range(1 1.1)) (scatter BODYD SKINT),
>     xtitle(Skin thickness) ytitle(Body density) scheme(s1color) legend(off)

[Figure: scatter plot of Body density against Skin thickness with fitted line and confidence band.]
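Most numbers in the STATA table can be reproduced from summary statistics alone. A sketch (the data file is not reproduced here, so the rounded x̄ and S_XX from the earlier slide are used as assumed inputs; small discrepancies with STATA come from that rounding):

```python
import math

n = 92
x_bar, s_xx = 1.064, 0.0235            # rounded summary stats (assumed inputs)
ss_residual, ss_total = 1.18663025, 4.24410764   # from the STATA ANOVA table
b_hat = -11.41345                      # slope from the STATA output

sigma_hat = math.sqrt(ss_residual / (n - 2))             # Root MSE ≈ .11482
se_b = sigma_hat / math.sqrt(s_xx)                       # Std. Err. of slope ≈ .75
se_a = sigma_hat * math.sqrt(1 / n + x_bar ** 2 / s_xx)  # Std. Err. of intercept ≈ .80
t_b = b_hat / se_b                                       # t statistic ≈ -15.2
r2 = 1 - ss_residual / ss_total                          # R-squared ≈ .7204
print(sigma_hat, se_b, se_a, t_b, r2)
```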
