Econometrics
Chapter 2: The Simple Regression Model

The model: y = β₀ + β₁x + u

Outline
1. Definition of the simple regression model
2. Deriving the OLS estimates
3. Mechanics of OLS
4. Units of measurement and functional form
5. Expected values and variances of the OLS estimators
6. Regression through the origin
2.1 Definition of the Model

Equation (2.1), y = β₀ + β₁x + u, defines the simple regression model. In the model, we typically refer to
y as the dependent variable,
x as the independent variable,
β₀ and β₁ as the parameters, and
u as the error term.
The Concept of the Error Term

u represents factors other than x that affect y. If the other factors in u are held fixed, so that Δu = 0, then Δy = β₁Δx.

Ex. 2.1: yield = β₀ + β₁fertilizer + u (2.3); here u includes land quality, rainfall, etc.
Ex. 2.2: wage = β₀ + β₁educ + u (2.4); here u includes experience, ability, tenure, etc.

To draw ceteris paribus conclusions about how x affects y, we have to hold all other factors (in u) fixed.

A Simple Assumption for u

The average value of the error term u in the population is zero; that is, E(u) = 0 (2.5). This is not a restrictive assumption, since we can always use β₀ to normalize E(u) to zero.
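As a quick illustration of a model like (2.3), the sketch below simulates data from a hypothetical yield equation. All numeric values (the βs, the error spread, the fertilizer range) are made up for illustration; in practice the population parameters are unknown.

```python
import numpy as np

# Hypothetical population: yield = 120 + 3.5*fertilizer + u,
# where u collects land quality, rainfall, etc. (numbers invented).
rng = np.random.default_rng(0)
beta0, beta1 = 120.0, 3.5
fertilizer = rng.uniform(0, 10, size=1000)
u = rng.normal(0, 5, size=1000)              # E(u) = 0 by construction
crop_yield = beta0 + beta1 * fertilizer + u

# Holding the factors in u fixed, a one-unit increase in fertilizer
# changes yield by beta1 = 3.5 units.
mean_u = u.mean()                            # should be close to zero
```

With u held fixed, the ceteris paribus effect of fertilizer on yield is exactly beta1 in this simulated population.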
E(y|x) as a Linear Function of x

[Figure: at each value of x (x₁, x₂, …), the distribution of y, f(y), is centered about E(y|x) = β₀ + β₁x.]

Zero Conditional Mean

We need to make a crucial assumption about how u and x are related. We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is,

E(u|x) = E(u) = 0, (2.5 & 2.6)

which implies

E(y|x) = β₀ + β₁x (the population regression function, PRF). (2.8)
Population Regression Line, Sample Data Points, and the Associated Error Terms

[Figure: sample points (x₁, y₁), …, (x₄, y₄) scattered around the population regression line E(y|x) = β₀ + β₁x, with the vertical deviations u₁, …, u₄ marked.]

2.2 Deriving the OLS Estimates

The basic idea of regression is to estimate the population parameters from a sample. Let {(xᵢ, yᵢ): i = 1, …, n} denote a random sample of size n from the population. For each observation in this sample, it will be the case that

yᵢ = β₀ + β₁xᵢ + uᵢ. (2.9)
Deriving the OLS Estimates Using the Method of Moments

To derive the OLS estimates, we need to realize that our main assumption, E(u|x) = E(u) = 0, also implies

Cov(x, u) = E(xu) = 0,

because Cov(X, Y) = E(XY) − E(X)E(Y) (B.27). Now we have two restrictions with which to estimate the βs:

E(u) = 0 (2.10)
E(xu) = 0 (2.11)

Since u = y − β₀ − β₁x, we can rewrite these as

E(y − β₀ − β₁x) = 0 (2.12)
E[x(y − β₀ − β₁x)] = 0 (2.13)

These are called moment restrictions. The method-of-moments approach to estimation imposes the population moment restrictions on the sample moments: a sample estimator of E(X), the mean of a population distribution, is simply the arithmetic mean of the sample.
More Derivation of OLS

We want to choose values of the parameters that make the sample versions of our moment restrictions true. The sample versions are:

n⁻¹ Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0 (2.14)
n⁻¹ Σᵢ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0 (2.15)

Given the definition of a sample mean, and the properties of summation, we can rewrite the first condition as

ȳ = β̂₀ + β̂₁x̄ (2.16), or β̂₀ = ȳ − β̂₁x̄. (2.17)

Substituting into (2.15), the OLS estimated slope is

β̂₁ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)². (2.19)
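Formulas (2.17) and (2.19) translate directly into code. A minimal sketch on simulated data (the data-generating values are invented for illustration), cross-checked against numpy's built-in least-squares fit:

```python
import numpy as np

# Simulated sample (true intercept 2.0, slope 0.5 -- made up).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 200)

# Slope (2.19): sample covariance of x and y over sample variance of x.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept (2.17).
b0 = y.mean() - b1 * x.mean()

# Cross-check: polyfit returns (slope, intercept) for degree 1.
b1_np, b0_np = np.polyfit(x, y, 1)
```

Both routes solve the same least-squares problem, so the estimates agree up to floating-point error.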
Summary of the OLS Slope Estimate

The slope estimate is the sample covariance between x and y divided by the sample variance of x. If x and y are positively (negatively) correlated, the slope will be positive (negative). x needs to vary in our sample; see (2.18) and Figure 2.3.

More OLS

Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible; hence the name least squares. The residual, û, is an estimate of the error term, u, and is the difference between the fitted line (the sample regression function) and the sample point.
Sample Regression Line, Sample Data Points, and the Associated Residuals

[Figure: sample points (x₁, y₁), …, (x₄, y₄) scattered around the fitted line ŷ = β̂₀ + β̂₁x, with the vertical deviations û₁, …, û₄ marked.]

An Alternative Approach to the Derivation

Given the intuitive idea of fitting a line, we can set up a formal minimization problem: choose β̂₀ and β̂₁ to minimize

Σᵢ ûᵢ² = Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ)². (2.22)

The first-order conditions, which are the same as (2.14) and (2.15) up to the factor n⁻¹, are

Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0,   Σᵢ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0.
Simple Regression Model 3 Chapter 02
2.3 Properties of OLS: Algebraic Properties

1. The sum of the OLS residuals is zero; thus, the sample average of the OLS residuals is zero as well:
   Σᵢ ûᵢ = 0, and thus n⁻¹ Σᵢ ûᵢ = 0. (2.30)
2. The sample covariance between the regressors and the OLS residuals is zero:
   Σᵢ xᵢûᵢ = 0. (2.31)
3. The OLS regression line always goes through the mean of the sample:
   ȳ = β̂₀ + β̂₁x̄.
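The three algebraic properties hold exactly (up to floating-point error) for any OLS fit with an intercept. A sketch verifying them on simulated data (parameter values made up):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5, 2, 300)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 300)

# OLS fit via (2.19) and (2.17).
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

sum_resid = resid.sum()           # property 1: zero, as in (2.30)
sum_x_resid = (x * resid).sum()   # property 2: zero, as in (2.31)
on_line = b0 + b1 * x.mean()      # property 3: equals the mean of y
```

These identities follow directly from the first-order conditions, so they hold in every sample regardless of whether the model is correctly specified.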
Cont. Algebraic Properties

We can think of each observation as being made up of an explained part and an unexplained part:

yᵢ = ŷᵢ + ûᵢ. (2.32)

We then define the following:

SST = Σᵢ (yᵢ − ȳ)² (total sum of squares) (2.33)
SSE = Σᵢ (ŷᵢ − ȳ)² (explained sum of squares) (2.34)
SSR = Σᵢ ûᵢ² (residual sum of squares) (2.35)

Then SST = SSE + SSR. (2.36)

Goodness-of-Fit

It is useful to think about how well the sample regression line fits the sample data. From (2.36),

R² = SSE/SST = 1 − SSR/SST. (2.38)

R² indicates the fraction of the sample variation in yᵢ that is explained by the model.
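The decomposition (2.36) and the R² in (2.38) can be checked numerically. A sketch on simulated data (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 500)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 500)

# OLS fit.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
resid = y - y_hat

SST = np.sum((y - y.mean()) ** 2)       # (2.33) total variation
SSE = np.sum((y_hat - y.mean()) ** 2)   # (2.34) explained variation
SSR = np.sum(resid ** 2)                # (2.35) residual variation
R2 = 1 - SSR / SST                      # (2.38)
```

The decomposition SST = SSE + SSR relies on the residuals being uncorrelated with the fitted values, which is guaranteed by the OLS first-order conditions.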
2.4 Units of Measurement and Functional Form

If we use the model y* = β₀* + β₁*x* + u* instead of y = β₀ + β₁x + u, where y* = c·y and x* = d·x, we get

β̂₀* = c·β̂₀ and β̂₁* = (c/d)·β̂₁.

Similarly, if y* = ln y and x* = ln x, then the slope is an elasticity:

β̂₁* = Δy*/Δx* ≈ (Δy/y) / (Δx/x).

2.5 Expected Values and Variances of the OLS Estimators

Now we view the β̂ᵢ as estimators of the parameters βᵢ that appear in the population; that is, we study the properties of the distributions of the β̂ᵢ over different random samples from the population.

Unbiasedness of OLS. An unbiased estimator is an estimator whose expected value (the mean of its sampling distribution) equals the population value, regardless of what that population value is.
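The rescaling rules for β̂₀ and β̂₁ can be demonstrated directly: refit the model after multiplying y by c and x by d, with c and d arbitrary constants chosen here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, 200)

def ols(x, y):
    """Return (intercept, slope) from (2.17) and (2.19)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(x, y)

# Rescale: e.g. y measured in cents instead of dollars (c = 100),
# x measured in double-units (d = 0.5). Values are arbitrary.
c, d = 100.0, 0.5
b0_star, b1_star = ols(d * x, c * y)   # expect c*b0 and (c/d)*b1
```

The change of units is a pure relabeling: nothing about the fit changes except the scale of the reported coefficients.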
Cont. Unbiasedness of OLS

Assumptions for unbiasedness:
1. Linear in parameters: y = β₀ + β₁x + u.
2. Random sampling: {(xᵢ, yᵢ): i = 1, 2, …, n}; thus yᵢ = β₀ + β₁xᵢ + uᵢ.
3. Sample variation in the xᵢ: Σᵢ (xᵢ − x̄)² > 0.
4. Zero conditional mean: E(u|x) = 0.

To think about unbiasedness, we need to rewrite our estimator in terms of the population parameter:

β̂₁ = Σᵢ (xᵢ − x̄)yᵢ / Σᵢ (xᵢ − x̄)² = β₁ + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)². (2.49), (2.52)

Then

E(β̂₁) = β₁ + [Σᵢ (xᵢ − x̄)²]⁻¹ Σᵢ (xᵢ − x̄)E(uᵢ|x) = β₁. (2.53)

We can also show E(β̂₀) = β₀ in the same way.
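Unbiasedness is a statement about repeated sampling, which a small Monte Carlo makes concrete: under assumptions 1 through 4, the average of β̂₁ across many random samples should be close to the true β₁. The true values below (β₀ = 1, β₁ = 2, n = 50) are made up for the simulation.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, n = 1.0, 2.0, 50

estimates = []
for _ in range(2000):
    # Draw a fresh random sample satisfying assumptions 1-4.
    x = rng.uniform(0, 5, n)
    y = beta0 + beta1 * x + rng.normal(0, 1, n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

mean_b1 = np.mean(estimates)   # should be close to the true beta1 = 2.0
```

Any single estimate may miss β₁ by a fair margin; it is the center of the sampling distribution that sits on the true value.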
Unbiasedness Summary

The OLS estimators of β₁ and β₀ are unbiased. The proof of unbiasedness depends on our four assumptions; if any assumption fails, then OLS is not necessarily unbiased. Remember that unbiasedness is a description of the estimator: in a given sample, our estimate may be "near" or "far" from the true parameter.

Variances of the OLS Estimators

Now we know that the sampling distribution of our estimator is centered around the true parameter. Next, we want to think about how spread out this distribution is. It is much easier to think about this variance under an additional assumption, so assume:

5. Var(u|x) = σ² (homoskedasticity).
Cont. Variance of the OLS Estimators

σ² is also the unconditional variance of u, called the error variance, since

Var(u|x) = E(u²|x) − [E(u|x)]².

Because E(u|x) = 0, we have σ² = E(u²|x) = E(u²) = Var(u). And σ, the square root of the error variance, is called the standard deviation of the error. We can then say

E(y|x) = β₀ + β₁x and Var(y|x) = σ².

The Homoskedastic Case

[Figure: the conditional distribution f(y|x) has the same spread σ² at every x (x₁, x₂, …), centered on E(y|x) = β₀ + β₁x.]
The Heteroskedastic Case

[Figure: the spread of the conditional distribution f(y|x) changes across x₁, x₂, x₃, while its center remains E(y|x) = β₀ + β₁x.]

Cont. Variance of the OLS Estimators

Var(β̂₁) = σ² / Σᵢ (xᵢ − x̄)². (2.57)

The larger the error variance σ², the larger the variance of the slope estimate. The larger the variability in the xᵢ, the smaller the variance of the slope estimate. As a result, a larger sample size should decrease the variance of the slope estimate.
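Formula (2.57) can be checked by simulation: hold the xᵢ fixed, redraw the errors many times, and compare the empirical variance of β̂₁ with σ²/Σᵢ(xᵢ − x̄)². The values σ = 2 and n = 100 are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma, n = 2.0, 100
x = rng.uniform(0, 10, n)                  # keep x fixed across samples
sst_x = np.sum((x - x.mean()) ** 2)
theory = sigma ** 2 / sst_x                # Var(b1-hat) from (2.57)

draws = []
for _ in range(5000):
    y = 1.0 + 0.5 * x + rng.normal(0, sigma, n)
    draws.append(np.sum((x - x.mean()) * (y - y.mean())) / sst_x)

empirical = np.var(draws)                  # sampling variance of b1-hat
```

Doubling σ quadruples the theoretical variance, while spreading the xᵢ out (or adding observations) shrinks it, exactly as the slide states.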
Estimating the Error Variance

We don't know the error variance, σ², because we don't observe the errors, uᵢ; what we observe are only the residuals, ûᵢ. So we use the residuals to form an estimate of the error variance:

ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ
   = (β₀ + β₁xᵢ + uᵢ) − β̂₀ − β̂₁xᵢ
   = uᵢ − (β̂₀ − β₀) − (β̂₁ − β₁)xᵢ.

Then an unbiased estimator of σ² is

σ̂² = [1/(n − 2)] Σᵢ ûᵢ². (2.61)
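A sketch of (2.61) on simulated data: with a true error variance of σ² = 4 (a value made up here), the residual-based estimate divided by n − 2 should land near 4.

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma2 = 400, 4.0
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, np.sqrt(sigma2), n)

# OLS fit and residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

# (2.61): divide by n - 2, the degrees of freedom after estimating
# two parameters, so the estimator is unbiased.
sigma2_hat = np.sum(resid ** 2) / (n - 2)
```

Dividing by n instead of n − 2 would bias the estimate downward, because the residuals satisfy the two first-order conditions and so are slightly "too small" on average.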
Cont. Error Variance Estimate

σ̂ = √σ̂² is called the standard error of the regression. Recall that sd(β̂₁) = √Var(β̂₁) = σ / √(Σᵢ (xᵢ − x̄)²). If we substitute σ̂ for σ, then we have the standard error of β̂₁:

se(β̂₁) = σ̂ / √(Σᵢ (xᵢ − x̄)²).

2.6 Regression through the Origin

Now consider the model without an intercept:

ỹ = β̃₁x. (2.63)

Solving the first-order condition of the minimization problem, the OLS estimated slope is

β̃₁ = Σᵢ xᵢyᵢ / Σᵢ xᵢ². (2.66)

Recall that an intercept can always normalize E(u) to 0 in the model with β₀.
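The two formulas above are one-liners in code. The sketch below computes se(β̂₁) and the through-the-origin slope (2.66) on simulated data whose true slope is 2 (a value invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(1, 10, n)
y = 2.0 * x + rng.normal(0, 1, n)      # population model with no intercept

# Regression through the origin, (2.66).
b1_tilde = np.sum(x * y) / np.sum(x ** 2)

# Usual fit with intercept, plus the standard error of the slope.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x
sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))    # (2.61)
se_b1 = sigma_hat / np.sqrt(np.sum((x - x.mean()) ** 2))
```

Because the true intercept is zero here, both slope estimates target the same parameter; when the true intercept is nonzero, forcing the line through the origin biases β̃₁.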