Chapter 02

Ch.2 The simple regression model

The Simple Regression Model: y = β0 + β1x + u

1. Definition of the simple regression model
2. Deriving the OLS estimates
3. Mechanics of OLS
4. Units of measurement & functional form
5. Expected values & variances of OLSE
6. Regression through the origin

Econometrics

2.1 Definition of the model

Equation (2.1), y = β0 + β1x + u, defines the simple regression model. In the model, we typically refer to

- y as the Dependent Variable,
- x as the Independent Variable,
- β0 and β1 as the parameters (the intercept and slope), and
- u as the error term.


The Concept of Error Term

u represents factors other than x that affect y. If the other factors in u are held fixed, so that Δu = 0, then Δy = β1Δx.
- Ex. 2.1: yield = β0 + β1fertilizer + u (2.3); u includes land quality, rainfall, etc.
- Ex. 2.2: wage = β0 + β1educ + u (2.4); u includes experience, ability, tenure, etc.
To draw ceteris paribus conclusions about how x affects y, we have to hold all other factors (in u) fixed.

A Simple Assumption for u

The average value of u, the error term, in the population is 0. That is, E(u) = 0.
- This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0.



Zero Conditional Mean

We need to make a crucial assumption about how u and x are related. We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is,
- E(u|x) = E(u) = 0 (2.5 & 2.6), which implies
- E(y|x) = β0 + β1x (PRF) (2.8)

[Figure: E(y|x) as a linear function of x, where for any x the distribution of y, f(y), is centered about E(y|x) = β0 + β1x; densities shown at x1 and x2.]

[Figure: population regression line E(y|x) = β0 + β1x, data points (x1, y1), …, (x4, y4), and the associated error terms u1, …, u4.]

2.2 Deriving the OLSE

The basic idea of regression is to estimate the population parameters from a sample.
Let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population. For each observation in this sample, it will be the case that
- yi = β0 + β1xi + ui (2.9)
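The sampling setup in (2.9) is easy to mimic numerically. Below is a minimal Python sketch (not from the text; the parameter values and error distribution are made-up assumptions) that draws one random sample {(xi, yi)} from a population satisfying y = β0 + β1x + u with E(u) = 0:

```python
import random

random.seed(42)
beta0, beta1, n = 1.0, 0.5, 1000   # hypothetical population parameters

# Draw a random sample {(x_i, y_i)} from the population model y = b0 + b1*x + u
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]    # error term with E(u) = 0
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
```

With a sample this size the realized average of the drawn errors is close to, but not exactly, zero, which is the distinction between the population assumption E(u) = 0 and any one sample.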

Deriving OLSE using MM

To derive the OLS estimates, we need to realize that our main assumption of E(u|x) = E(u) = 0 also implies that
- Cov(x,u) = E(xu) = 0
- Because Cov(X,Y) = E(XY) − E(X)E(Y) (B.27)
Now we prepare 2 restrictions to estimate the βs:
- E(u) = 0 (2.10)
- E(xu) = 0 (2.11)

Cont. Deriving OLSE using MM

Since u = y − β0 − β1x, we can rewrite:
- E(y − β0 − β1x) = 0 (2.12)
- E[x(y − β0 − β1x)] = 0 (2.13)
These are called moment restrictions.
- The method-of-moments approach to estimation imposes the population moment restrictions on the sample moments. The sample analogue of E(X), the mean of a population distribution, is simply the arithmetic mean of the sample.



More Derivation of OLS

We want to choose values of the parameters that will ensure that the sample versions of our moment restrictions are true. The sample versions are as follows:
- (1/n) Σi (yi − β̂0 − β̂1xi) = 0 (2.14)
- (1/n) Σi xi(yi − β̂0 − β̂1xi) = 0 (2.15)

Cont. More Derivation of OLS

Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as
- ȳ = β̂0 + β̂1x̄ (2.16), or β̂0 = ȳ − β̂1x̄ (2.17)
So the OLS estimated slope is
- β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² (2.19)
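Formulas (2.17) and (2.19) translate directly into code. A minimal Python sketch on a small made-up sample:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Slope (2.19): sum (xi - xbar)(yi - ybar) / sum (xi - xbar)^2
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
# Intercept (2.17): ybar - b1 * xbar
b0 = ybar - b1 * xbar
# Here b1 ≈ 1.99 and b0 ≈ 0.05
```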


Summary of OLS slope estimate

The slope estimate is the sample covariance between x and y divided by the sample variance of x.
- If x and y are positively (negatively) correlated, the slope will be positive (negative).
- x needs to vary in our sample.
- See (2.18) & Figure (2.3)

More OLS

Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term "least squares".
The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (sample regression function).


[Figure: sample regression line ŷ = β̂0 + β̂1x, sample data points (x1, y1), …, (x4, y4), and the associated estimated error terms (residuals) û1, …, û4.]

Alternate approach to derivation

Given the intuitive idea of fitting a line, we can set up a formal minimization problem:
- min Σi ûi² = Σi (yi − β̂0 − β̂1xi)² (2.22)
The first order conditions, which are the same as (2.14) & (2.15) apart from the factor 1/n, are
- Σi (yi − β̂0 − β̂1xi) = 0,  Σi xi(yi − β̂0 − β̂1xi) = 0


2.3 Properties of OLS

Algebraic Properties of OLS
1. The sum of the OLS residuals is zero. Thus, the sample average of the OLS residuals is zero as well:
- Σi ûi = 0, and thus (1/n) Σi ûi = 0 (2.30)

Cont. Algebraic Properties
2. The sample covariance between the regressors and the OLS residuals is zero:
- Σi xiûi = 0 (2.31)
3. The OLS regression line always goes through the mean of the sample:
- ȳ = β̂0 + β̂1x̄
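These three properties hold exactly in any sample, by construction of the estimates, and can be verified numerically. A Python check on made-up data (not from the text):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
uhat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # OLS residuals

prop1 = sum(uhat)                                    # property 1: equals 0 (2.30)
prop2 = sum(xi * ui for xi, ui in zip(x, uhat))      # property 2: equals 0 (2.31)
prop3 = b0 + b1 * xbar                               # property 3: equals ybar
```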


Cont. Algebraic Properties

We can think of each observation as being made up of an explained part and an unexplained part,
- yi = ŷi + ûi (2.32)
Then we define the following:
- Σi (yi − ȳ)² = SST (2.33)
- Σi (ŷi − ȳ)² = SSE (2.34)
- Σi ûi² = SSR (2.35)
Then, SST = SSE + SSR (2.36)

Goodness-of-Fit

It is useful to think about how well the sample regression line fits the sample data. From (2.36), we define
- R² = SSE/SST = 1 − SSR/SST (2.38)
R² indicates the fraction of the sample variation in yi that is explained by the model.
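The decomposition (2.36) and the R² definition (2.38) can be checked on a toy sample; a Python sketch with made-up data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]             # fitted values
uhat = [yi - yh for yi, yh in zip(y, yhat)]   # residuals
sst = sum((yi - ybar) ** 2 for yi in y)       # total sum of squares (2.33)
sse = sum((yh - ybar) ** 2 for yh in yhat)    # explained sum of squares (2.34)
ssr = sum(ui ** 2 for ui in uhat)             # residual sum of squares (2.35)
r2 = 1 - ssr / sst                            # R-squared (2.38)
```

Numerically, sst equals sse + ssr up to floating-point error, which is exactly identity (2.36).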


2.4 Measurement Units & Functional Form

If we use the model y* = β0* + β1*x* + u* instead of y = β0 + β1x + u, where y* = c·y and x* = d·x, we get
- β̂0* = c·β̂0 and β̂1* = (c/d)·β̂1
Similarly, where y* = ln y and x* = ln x, the slope is the elasticity of y with respect to x:
- β̂1* = (Δy/y)/(Δx/x) = (Δy/Δx)·(x/y)

2.5 Means & Variances of OLSE

Now, we view the β̂i as estimators for the parameters βi that appear in the population, which means we study the properties of the distributions of the β̂i over different random samples from the population.

Unbiasedness of OLS
- Unbiased estimator: an estimator whose expected value (the mean of its sampling distribution) equals the population value, regardless of the population value.
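The scaling rules for c and d can be confirmed by simulation. In this Python sketch the sample, c, and d are arbitrary made-up values:

```python
import random

random.seed(0)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

def ols(xs, ys):
    """Return (intercept, slope) from the simple-regression formulas."""
    m = len(xs)
    xb, yb = sum(xs) / m, sum(ys) / m
    s1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys)) / \
         sum((xi - xb) ** 2 for xi in xs)
    return yb - s1 * xb, s1

c, d = 100.0, 12.0                 # made-up rescalings of y and x
b0, b1 = ols(x, y)
b0s, b1s = ols([d * xi for xi in x], [c * yi for yi in y])
# b0s equals c*b0 and b1s equals (c/d)*b1, up to rounding
```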



Cont. Unbiasedness of OLS

Assumptions for unbiasedness:
1. Linear in parameters: y = β0 + β1x + u
2. Random sampling: {(xi, yi): i = 1, 2, …, n}
3. Sample variation in the xi: Σi (xi − x̄)² > 0
4. Zero conditional mean: E(u|x) = 0

Cont. Unbiasedness of OLS

In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter. Thus, with yi = β0 + β1xi + ui,
- β̂1 = Σi (xi − x̄)yi / Σi (xi − x̄)² = β1 + Σi (xi − x̄)ui / Σi (xi − x̄)² (2.49), (2.52)
then
- E(β̂1) = β1 + Σi [(xi − x̄) / Σj (xj − x̄)²] E(ui|x) = β1 (2.53)
*We can also get E(β̂0) = β0 in the same way.
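Unbiasedness of β̂1 under assumptions 1-4 can be illustrated by Monte Carlo: holding the xi fixed and redrawing the errors many times, the average slope estimate settles near the true β1. A Python sketch with made-up parameters:

```python
import random

random.seed(1)
beta0, beta1, n, reps = 1.0, 2.0, 50, 2000
x = [random.uniform(0, 5) for _ in range(n)]   # regressors held fixed across samples
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

draws = []
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    draws.append(b1)

mean_b1 = sum(draws) / reps   # should be close to the true beta1 = 2.0
```

Any single draw can be noticeably above or below 2.0; it is the average over repeated samples that centers on the true parameter, which is what unbiasedness says.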


Unbiasedness Summary

The OLS estimates of β1 and β0 are unbiased.
- Proof of unbiasedness depends on our 4 assumptions; if any assumption fails, then OLS is not necessarily unbiased.
- Remember, unbiasedness is a description of the estimator; in a given sample our estimate may be "near" or "far" from the true parameter.

Variances of the OLS Estimators

Now we know that the sampling distribution of our estimate is centered around the true parameter.
- We want to think about how spread out this distribution is.
- It is much easier to think about this variance under an additional assumption, so assume
5. Var(u|x) = σ² (Homoskedasticity)


Cont. Variance of OLSE

σ² is also the unconditional variance, called the error variance, since
- Var(u|x) = E(u²|x) − [E(u|x)]², and E(u|x) = 0, so σ² = E(u²|x) = E(u²) = Var(u).
- σ, the square root of the error variance, is called the standard deviation of the error.
Then we can say E(y|x) = β0 + β1x and Var(y|x) = σ².

Homoskedastic Case

[Figure: conditional densities f(y|x) at x1 and x2, each with the same spread, centered on the line E(y|x) = β0 + β1x.]


Heteroskedastic Case

[Figure: conditional densities f(y|x) at x1, x2, and x3, with spread increasing in x, centered on the line E(y|x) = β0 + β1x.]

Cont. Variance of OLSE

- Var(β̂1) = σ² / Σi (xi − x̄)² (2.57)
The larger the error variance, σ², the larger the variance of the slope estimate.
The larger the variability in the xi, the smaller the variance of the slope estimate.
As a result, a larger sample size should decrease the variance of the slope estimate.
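Formula (2.57) can likewise be checked by Monte Carlo: the empirical variance of β̂1 across repeated samples should be close to σ² / Σ(xi − x̄)². A Python sketch with made-up parameters:

```python
import random

random.seed(2)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.5, 60, 4000
x = [random.uniform(0, 5) for _ in range(n)]   # regressors held fixed across samples
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

draws = []
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    draws.append(sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx)

theory = sigma ** 2 / sxx                                  # Var(b1) from (2.57)
m = sum(draws) / reps
empirical = sum((b - m) ** 2 for b in draws) / reps        # simulated variance
```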


Estimating the Error Variance

We don't know the error variance, σ², because we don't observe the errors, ui. What we observe are only the residuals, ûi, not the errors. So we can use the residuals to form an estimate of the error variance.

Cont. Error Variance Estimate

- ûi = yi − β̂0 − β̂1xi
    = (β0 + β1xi + ui) − β̂0 − β̂1xi
    = ui − (β̂0 − β0) − (β̂1 − β1)xi
Then, an unbiased estimator of σ² is
- σ̂² = [1/(n − 2)] Σi ûi² (2.61)
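Note the divisor n − 2 in (2.61), which corrects for the two estimated parameters β̂0 and β̂1. A small Python computation on made-up data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
uhat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Unbiased error-variance estimator (2.61): divide SSR by n - 2, not n
sigma2_hat = sum(ui ** 2 for ui in uhat) / (n - 2)
```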


Cont. Error Variance Estimate

σ̂ is called the standard error of the regression.
Recall that sd(β̂1) = √Var(β̂1). If we substitute σ̂ for σ, then we have the standard error of β̂1:
- se(β̂1) = σ̂ / √(Σi (xi − x̄)²)

2.6 Regression through the Origin

Now, consider the model without an intercept:
- ỹ = β̃1x (2.63)
Solving the FOC to the minimization problem, the OLS estimated slope is
- β̃1 = Σi xiyi / Σi xi² (2.66)
*Recall that an intercept can always normalize E(u) to 0 in the model with β0.
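Both the standard error of β̂1 above and the through-the-origin slope (2.66) are one-liners; a Python sketch on made-up data:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# se(b1_hat) = sigma_hat / sqrt(sum (xi - xbar)^2), with sigma_hat^2 = SSR/(n-2)
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)

# Regression through the origin, slope formula (2.66)
b1_tilde = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
```

On this sample b1_tilde is close to, but not equal to, the slope with an intercept; the two estimators coincide only in special cases.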

