Definition of the Simple Regression Model

Deriving the The Simple Regression Model Ordinary Least Squares Estimates

Properties of OLS on any Caio Vigo Sample of Data

Units of Measurement The University of Kansas and Functional Department of Economics Form Using the Natural Logarithm in Simple Regression Fall 2019 Expected Value of OLS

These slides were based on Introductory by Jeffrey M. Wooldridge (2015)

1 / 86 Topics

Definition of the Simple Regression Model 1 Deriving the Definition of the Simple Regression Model Ordinary Least Squares Estimates 2 Deriving the Ordinary Least Squares Estimates Properties of OLS on any Sample of 3 Data Properties of OLS on any Sample of Data

Units of Measurement 4 and Functional Units of Measurement and Functional Form Form Using the Natural Logarithm in Simple Regression Using the Natural Logarithm in Simple Regression Expected 5 Expected Value of OLS Value of OLS

2 / 86 Definition of the Simple Regression Model

Definition of the Simple Regression • What type of analysis will we do? Cross-sectional analysis Model

Deriving the Ordinary Least • First step: Clearly define what is your population (in what you are interested Squares Estimates to study). Properties of OLS on any Sample of • x y Data Second Step: There are two variables, and , and we would like to “study

Units of how y varies with changes in x.” Measurement and Functional Form • Using the Natural Third Step: We assume we can collect a random sample from the population Logarithm in Simple Regression of interest. Expected Value of OLS Now we will learn to write our first econometric model, derive an estimator (what’s an estimator again?) and use this estimator in our sample.

3 / 86 Introduction

Definition of the Simple Regression Model

Deriving the Ordinary Least We must confront three issues: Squares Estimates 1 How do we allow factors other than x to affect y? There is never an exact Properties of relationship between two variables. OLS on any Sample of Data 2 Units of What is the functional relationship between y and x? Measurement and Functional Form 3 ceteris paribus y Using the Natural How can we be sure we are capturing a relationship between Logarithm in Simple Regression and x? Expected Value of OLS

4 / 86 Introduction

Definition of the Simple Regression Model

Deriving the Ordinary Least Consider the following equation relating y to x: Squares Estimates Properties of y = β0 + β1x + u, OLS on any Sample of Data which is assumed to hold in the population of interest. Units of Measurement and Functional • This equation defines the simple model (or two-variable Form Using the Natural regression model, or bivariate linear regression model). Logarithm in Simple Regression

Expected Value of OLS

5 / 86 Introduction

Definition of the Simple Regression Model • y and x are not treated symmetrically. We want to explain y in terms of x.

Deriving the Ordinary Least Squares Estimates x explains y Properties of OLS on any Sample of Data

Units of Measurement x −→ y and Functional Form Using the Natural Logarithm in Simple Regression • Example: Expected Value of OLS size of the city x, explains number of crimes (y) (not the other way around).

6 / 86 Terminology for Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares y x Estimates

Properties of Dependent Variable Independent Variable OLS on any Sample of Explained Variable Explanatory Variable Data Response Variable Control Variable Units of Measurement Predicted Variable Predictor Variable and Functional Form Regressand Regressor Using the Natural Logarithm in Simple Regression

Expected Value of OLS

7 / 86 The error term

Definition of the Simple Regression Model

Deriving the Ordinary Least y = β0 + β1x + u Squares Estimates Properties of u y OLS on any This equation explicitly allows for other factors, contained in , to affect . Sample of Data This equation also addresses the functional form issue (in a simple way). Units of Measurement and Functional Form Namely, y is assumed to be linearly related to x. Using the Natural Logarithm in Simple Regression β β Expected We call 0 the intercept and 1 the slope parameter. These describe Value of OLS a population, and our ultimate goal is to estimate them.

8 / 86 The model equation

Definition of the Simple • The equation also addresses the ceteris paribus issue. In Regression Model Deriving the y = β0 + β1x + u, Ordinary Least Squares Estimates all other factors that affect y are in u. We want to know how y changes when x Properties of changes, holding u fixed. OLS on any Sample of • Let ∆ denote “change.”Then holding u fixed means ∆u = 0. So Data

Units of Measurement and Functional Form ∆y = β1∆x + ∆u Using the Natural Logarithm in Simple = β ∆x ∆u = 0 Regression 1 when .

Expected Value of OLS

• This equation effectively defines β1 as a slope, with the only difference being the restriction ∆u = 0. 9 / 86 The simple linear regression model equation

Definition of the Simple Regression Model

Deriving the Ordinary Least Example: Yield and Fertilizer Squares Estimates • A model to explain crop yield to fertilizer use is Properties of OLS on any Sample of yield = β0 + β1fertilizer + u, Data

Units of Measurement and Functional where u contains land quality, rainfall on a plot of land, and so on. The slope Form Using the Natural parameter, β1, is of primary interest: it tells us how yield changes when the amount Logarithm in Simple Regression of fertilizer changes, holding all else fixed. Expected Value of OLS

10 / 86 The simple linear regression model equation

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Example: Wage and Education Estimates Properties of wage = β0 + β1educ + u OLS on any Sample of Data where u contains somewhat nebulous factors (“ability”) but also past workforce Units of experience and tenure on the current job. Measurement and Functional Form Using the Natural ∆wage = β1∆educ when ∆u = 0 Logarithm in Simple Regression

Expected Value of OLS

11 / 86 The simple linear regression model equation

Definition of the Simple We said we must confront three issues: Regression Model 1. How do we allow factors other than x to affect y? Deriving the Answer: u Ordinary Least Squares Estimates 2. What is the functional relationship between y and x? Properties of OLS on any Answer: Linear (x has a linear effect on y) Sample of Data 3. How can we be sure we a capturing a ceteris paribus relationship between y and Units of Measurement x? and Functional Form Answer: Related with ∆u = 0 Using the Natural Logarithm in Simple Regression • We have argued that the simple regression model Expected Value of OLS y = β0 + β1x + u addresses each of them. 12 / 86 Relation between u and x

Definition of the Simple Regression Model To estimate β1 and β0 from a random sample we also need to restrict how u and Deriving the x are related to each other. Ordinary Least Squares Estimates Properties of • Recall that x and u are properly viewed as having distributions in the population. OLS on any Sample of Data • What we must do is restrict the way in when u and x relate to each other in the Units of Measurement population. and Functional Form Using the Natural • Logarithm in Simple First, we make a simplifying assumption that is without loss of generality: the Regression average, or expected, value of u is zero in the population: Expected Value of OLS E(u) = 0

13 / 86 Relation between u and x

Definition of the Simple Regression • Normalizing u should cause no impact in the most important parameter: β1 Model

Deriving the Ordinary Least • The presence of β0 in Squares Estimates Properties of y = β0 + β1x + u OLS on any Sample of Data allows us to assume E(u) = 0. Units of Measurement and Functional • If the average of u is different from zero, we just adjust the intercept, leaving the Form Using the Natural slope the same. If α0 = E(u) then we can write Logarithm in Simple Regression Expected y = (β + α ) + β x + (u − α ), Value of OLS 0 0 1 0

where the new error, u − α0, has a zero mean.

14 / 86 Relation between u and x

Definition of the Simple Regression Model Deriving the We need to restrict the dependence between u and x Ordinary Least Squares Estimates Properties of • OLS on any Option 1: Uncorrelated Sample of Data We could assume u and x uncorrelated in the population: Units of Measurement and Functional Form Corr(x, u) = 0 Using the Natural Logarithm in Simple Regression

Expected Value of OLS It implies only that u and x are not linearly related. Not good enough.

15 / 86 Relation between u and x

Definition of the Simple Regression Model • Option 2: Mean independence Deriving the Ordinary Least Squares The mean of the error (i.e., the mean of the unobservables) is the same across all Estimates x Properties of slices of the population determined by values of . OLS on any Sample of Data We represent it by: Units of Measurement and Functional E(u|x) = E(u), all values x, Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS And we say that u is mean independent of x

16 / 86 Relation between u and x

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares • Suppose u is “ability” and x is years of education. We need, for example, Estimates

Properties of OLS on any E(ability|x = 8) = E(ability|x = 12) = E(ability|x = 16) Sample of Data

Units of Measurement and Functional so that the average ability is the same in the different portions of the population with Form an 8th grade education, a 12th grade education, and a four-year college education. Using the Natural Logarithm in Simple Regression

Expected Value of OLS

17 / 86 Relation between u and x

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares • Combining E(u|x) = E(u) (the substantive assumption) with E(u) = 0 (a Estimates normalization) gives Properties of OLS on any Sample of Data E(u|x) = 0, all values x

Units of Measurement and Functional Form Using the Natural • Logarithm in Simple Called the zero conditional mean assumption. Regression

Expected Value of OLS

18 / 86 The PRF

Definition of the Simple Regression Model

Deriving the • First, recall the properties of conditional expectation. (see slides with a review of Ordinary Least Squares Probability) Estimates

Properties of OLS on any • Now, take the conditional expectation of our Simple Linear Regression Function. Sample of Data Then, we get:

Units of Measurement E(y|x) = β + β x + E(u|x) and Functional 0 1 Form = β0 + β1x Using the Natural Logarithm in Simple Regression

Expected which shows the population regression function is a linear function of x. Value of OLS

19 / 86 The PRF

Definition of the Simple Regression Model Figure: Example: The goal is to explain weekly consumption expenditure in terms of Deriving the Ordinary Least weekly income Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics.

20 / 86 The PRF

Definition of the Simple Regression Model Figure: Conditional Probabilities of the data Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics.

21 / 86 The PRF

Definition of the Simple Regression Model Figure: Conditional distribution of expenditure for various levels of income

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics. 22 / 86 The PRF

Definition of the Simple Regression Model Figure: The Population Regression Function (PRF) Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics.

23 / 86 The PRF

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares • The straight line in the previous graph is the PRF, E(y|x) = β0 + β1x. The Estimates conditional distribution of y at three different values of x are superimposed. Properties of OLS on any Sample of Data • For a given value of x, we see a range of y values: remember, y = β0 + β1x + u, Units of and u has a distribution in the population. Measurement and Functional Form • Using the Natural In practice, we never know the population intercept and slope. Logarithm in Simple Regression

Expected Value of OLS

24 / 86 The PRF

Definition of the Simple Regression Model • Assuming we know the PRF, consider this example: Deriving the Ordinary Least Squares Estimates Example Properties of OLS on any • Suppose for the population of students attending a university, we know the PRF: Sample of Data

Units of E(colGPA|hsGP A) = 1.5 + 0.5 hsGP A, Measurement and Functional Form Using the Natural Logarithm in Simple • y x Regression For this example, what is ? what is ? What is the slope? What’s the intercept?

Expected Value of OLS • If hsGP A = 3.6 what’s the expected college GPA? 1.5 + 0.5(3.6) = 3.3

25 / 86 Topics

Definition of the Simple Regression Model 1 Deriving the Definition of the Simple Regression Model Ordinary Least Squares Estimates 2 Deriving the Ordinary Least Squares Estimates Properties of OLS on any Sample of 3 Data Properties of OLS on any Sample of Data

Units of Measurement 4 and Functional Units of Measurement and Functional Form Form Using the Natural Logarithm in Simple Regression Using the Natural Logarithm in Simple Regression Expected 5 Expected Value of OLS Value of OLS

26 / 86 Deriving the OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates • Given data on x and y, how can we estimate the population , β0 and Properties of β1? OLS on any Sample of Data • Let {(xi, yi): i = 1, 2, ..., n} be a random sample of size n (the number of Units of Measurement observations) from the population. Think of this as a random sample. and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

27 / 86 Deriving the OLS

Definition of the Simple Regression Derivation: (On white board) Model

Deriving the Estimator for β0 Ordinary Least Squares ˆ ˆ Estimates β0 =y ¯ − β1x¯

Properties of OLS on any Sample of Data Estimator for β1 Units of Measurement Pn (xi − x¯)(yi − y¯) Sample Covariance(x, y) and Functional βˆ = i=1 = Form 1 Pn (x − x¯)2 (x) Using the Natural i=1 i Sample Variance Logarithm in Simple S Regression x,y = 2 Expected Sx Value of OLS σˆy =ρ ˆx,y σˆx

28 / 86 Deriving the OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

29 / 86 Deriving the OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

30 / 86 Deriving the OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

31 / 86 Deriving the OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics.

32 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Example: Effects of Education on Hourly Wage Model

Deriving the • Data: random sample from the US workforce population in 1976. Ordinary Least Squares wage: dollars per hour, Estimates educ: highest grade completed (years of education). Properties of OLS on any Sample of • Data The estimated equation is

Units of Measurement and Functional Form wage[ = −0.90 + 0.54 educ Using the Natural Logarithm in Simple Regression n = 526

Expected Value of OLS • Each additional year of schooling is estimated to be worth $0.54.

33 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates Properties of The function OLS on any Sample of wage[ = −0.90 + 0.54 educ Data Units of is the OLS (or sample) regression line. Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

34 / 86 Interpreting the OLS Estimates - R Output

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

35 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model Deriving the • When we write the simple linear regression model, Ordinary Least Squares Estimates wage = β0 + β1educ + u, Properties of OLS on any Sample of it applies to the population, so we do not know β0 and β1. Data Units of ˆ ˆ Measurement • β0 = −0.90 and β1 = 0.54 are our estimates from this particular sample. and Functional Form Using the Natural Logarithm in Simple • These estimates may or may not be close to the population values. If we obtain Regression another sample, the estimates would almost certainly change. Expected Value of OLS

36 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model Deriving the • If educ = 0, Ordinary Least Squares the predicted wage is: Estimates

Properties of OLS on any wage = −0.90 + 0.54(0) = −0.90 Sample of [ Data

Units of Measurement The predicted value does not fit in reality. and Functional Form Using the Natural Logarithm in Simple Mainly because when we extrapolate outside the majority range of our data can Regression

Expected produce strange predictions. Value of OLS

37 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares • When educ = 8, Estimates the predicted wage is: Properties of OLS on any Sample of Data wage[ = −0.90 + 0.54(8) = 3.42 Units of Measurement and Functional Form which we can think of as our estimate of the average wage in the population when Using the Natural Logarithm in Simple educ = 8. Regression

Expected Value of OLS

38 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model Deriving the Sample Regression Line (SRF) Ordinary Least Squares Estimates yˆi = βˆ0 + βˆ1xi i = 1, . . . , n Properties of OLS on any Sample of Also known as: Data • OLS Regression Line Units of Measurement • and Functional Sample Regression Function Form • Using the Natural OLS Regression Function Logarithm in Simple Regression • Estimated Equation Expected Value of OLS

39 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model

Deriving the Ordinary Least Population Regression Function (PRF) Squares Estimates Since the simple linear regression model (or just econometric model) is: Properties of OLS on any Sample of Data yi = β0 + β1xi + u

Units of Measurement Then, the PRF is: and Functional Form Using the Natural Logarithm in Simple ⇒ E(yi|x) = β0 + β1xi i = 1, 2, . . . , n Regression

Expected Value of OLS

40 / 86 Interpreting the OLS Estimates

Definition of the Simple Regression Model

Deriving the Ordinary Least Residuals Squares Estimates uˆi = yi − yˆi i = 1, 2, . . . , n Properties of OLS on any Sample of Data Units of Error Term Measurement and Functional Form ui = yi − E(y|x) Using the Natural Logarithm in Simple = yi − β0 − β1xi i = 1, 2, . . . , n Regression

Expected Value of OLS

41 / 86 Topics

Definition of the Simple Regression Model 1 Deriving the Definition of the Simple Regression Model Ordinary Least Squares Estimates 2 Deriving the Ordinary Least Squares Estimates Properties of OLS on any Sample of 3 Data Properties of OLS on any Sample of Data

Units of Measurement 4 and Functional Units of Measurement and Functional Form Form Using the Natural Logarithm in Simple Regression Using the Natural Logarithm in Simple Regression Expected 5 Expected Value of OLS Value of OLS

42 / 86 Properties of OLS on any Sample of Data

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of • Recall that the OLS residuals are OLS on any Sample of Data ˆ ˆ uˆi = yi − yˆi = yi − β0 − β1xi , i = 1, 2, ..., n Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

43 / 86 Properties of OLS on any Sample of Data

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics.

44 / 86 Properties of OLS on any Sample of Data

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates • Some residuals are positive, others are negative. Properties of OLS on any Sample of • uˆ ⇒ y Data If i is positive the line underpredicts i

Units of Measurement • If uˆi is negative ⇒ the line overpredicts yi and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

45 / 86 Algebraic Properties of OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of (1) The sum of the OLS residuals is 0 OLS on any n Sample of X Data uˆi = 0 Units of i=1 Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

46 / 86 Algebraic Properties of OLS Statistics

Definition of the Simple Regression Model (2) The sample covariance between the explanatory variables and the residuals is

Deriving the always zero Ordinary Least n Squares X Estimates xiuˆi = 0 Properties of i=1 OLS on any Sample of Data • Therefore the sample correlation between the x and uˆi is also equal to zero. Units of Measurement and Functional • Because the yˆi are linear functions of the xi, the fitted values and residuals are Form Using the Natural uncorrelated, too: Logarithm in Simple Regression n Expected X Value of OLS yˆiuˆi = 0 i=1

47 / 86 Algebraic Properties of OLS Statistics

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates (3) The point (¯x, y¯) is always on the OLS regression line. Properties of OLS on any ˆ ˆ Sample of y¯ = β0 + β1x¯ Data

Units of Measurement and Functional • That is, if we plug in the average for x, we predict the sample average for y. Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

48 / 86 Goodness-of-Fit

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics. 49 / 86 Goodness-of-Fit

Definition of the Simple Regression Model Goodness-of-Fit Deriving the Ordinary Least Squares Estimates • For each observation, write

Properties of OLS on any Sample of yi =y ˆi +u ˆi Data

Units of • Define: Measurement and Functional Pn 2 Form Total Sum of Squares = SST = i=1(yi − y¯) Using the Natural Pn 2 Logarithm in Simple = SSE = (ˆy − y¯) Regression Explained Sum of Squares i=1 i Pn 2 Expected Residual Sum of Squares = SSR = i=1 uˆi Value of OLS

50 / 86 Goodness-of-Fit

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

Source: Gujarati, Damodar (2002). Basic Econometrics. 51 / 86 Goodness-of-Fit

Definition of the Simple Regression Model

Deriving the Ordinary Least (Other names) Squares Estimates

Properties of • SSR is also know as Sum of Squared Residuals or Model Sum of Residuals OLS on any Sample of Data • SST = TSS Units of Measurement and Functional • SSE = ESS Form Using the Natural Logarithm in Simple Regression • SSR = RSS Expected Value of OLS

52 / 86 Goodness-of-Fit

Definition of the Simple Regression Model n Deriving the X 2 Ordinary Least SST = (yi − y¯) Squares Estimates i=1 n Properties of X 2 OLS on any = [(yi − yˆi) + (ˆyi − y¯)] Sample of Data i=1 n Units of X 2 Measurement = [ˆui − (ˆyi − y¯)] and Functional Form i=1 Using the Natural Logarithm in Simple Regression Using the fact that the fitted values and residuals are uncorrelated: Expected Value of OLS SST = SSE + SSR

53 / 86 Goodness-of-Fit

Definition of the Simple Regression Model The R-Squared Deriving the Ordinary Least Goal: We want to evaluate how well the independet variable x explains the Squares Estimates dependent variable y.

Properties of OLS on any • y x Sample of We want to obtain the fraction of the sample variation in that is explained by . Data 2 Units of • We will summarize it in one number: R (or coefficient of determination.) Measurement and Functional Form • SST > 0 Using the Natural Assuming , Logarithm in Simple Regression SSE SSR Expected R2 = = 1 − Value of OLS SST SST

54 / 86 Goodness-of-Fit

Definition of the Simple Regression • SSE SST Model Since cannot be greater than the , then:

Deriving the Ordinary Least 0 ≤ R2 ≤ 1 Squares Estimates

Properties of OLS on any • R2 = 0 ⇒ (between y and x ) Sample of No linear relationship i i . Data 2 Units of • R = 1 ⇒ Perfect linear relationship yi xi Measurement (between and ). and Functional Form 2 Using the Natural • As R increases ⇒ yi gets closer and closer to the OLS regression line. Logarithm in Simple Regression

Expected Value of OLS We should not focus only on R2 to analyze our regression.

55 / 86 Goodness-of-Fit

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Example (Wage) Estimates

Properties of OLS on any Sample of wage[ = −0.90 + 0.54 educ Data n = 526,R2 = .16 Units of Measurement and Functional Form • Therefore, years of education explains only about 16% of the variation in hourly Using the Natural Logarithm in Simple wage. Regression

Expected Value of OLS

56 / 86 Goodness-of-Fit - R output

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

57 / 86 Goodness-of-Fit - R output (using stargazer)

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

58 / 86 Suggested Exercise

Definition of the Simple Regression Model Suggested Exercise Deriving the Below you have a random sample with 10 data points. Your observations are (xi, yi). Ordinary Least ˆ ˆ 2 Squares Find the β0, β1 and the R . Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

59 / 86 Topics

Definition of the Simple Regression Model 1 Deriving the Definition of the Simple Regression Model Ordinary Least Squares Estimates 2 Deriving the Ordinary Least Squares Estimates Properties of OLS on any Sample of 3 Data Properties of OLS on any Sample of Data

Units of Measurement 4 and Functional Units of Measurement and Functional Form Form Using the Natural Logarithm in Simple Regression Using the Natural Logarithm in Simple Regression Expected 5 Expected Value of OLS Value of OLS

60 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model Example Deriving the Ordinary Least salary: Annual CEO’s salary in thousands of dollars Squares Estimates roe: Average return on equity (measured in percentage) Properties of OLS on any Sample of Data Units of salary\ = 963.19 + 18.50 roe Measurement and Functional 2 Form n = 209,R = .01 Using the Natural Logarithm in Simple Regression

Expected Value of OLS • A one unit increase in the independent variable (i.e. roe increases one percent) ⇒ increases the predicted salary by 18.501, or $18,501.

61 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model

Deriving the • If we measure roe as a decimal (rather than a percent), what will happen to the Ordinary Least 2 Squares intercept, slope, and R ? Estimates We want: Properties of OLS on any roedec = roe/100 Sample of Data Units of • Measurement What if we measure salary in dollars (rather than thousands of dollars)? what will and Functional R2 Form happen to the intercept, slope, and ? Using the Natural Logarithm in Simple We want: Regression salarydol = 1, 000 · salary Expected Value of OLS

62 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model Deriving the Changing Units of Measurement Ordinary Least Squares • If the dependent variable y Estimates c ⇒ c · βˆ c · βˆ Properties of is multiplied by a constant 0 and 1 OLS on any Sample of Data • If the independent variable x 1 Units of ˆ Measurement is multiplied by a constant c ⇒ · β1 and Functional c Form Using the Natural Logarithm in Simple In general, changing the units of measurement of only the independent variable does Regression not affect the intercept Expected Value of OLS

63 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model Example: CEO’s salary - Original Regression Deriving the Ordinary Least Squares Estimates salary\ = 963.19 + 18.50 roe Properties of 2 OLS on any n = 209,R = .01 Sample of Data Units of Example: CEO’s salary - roe as a decimal Measurement and Functional Form The new regression is: Using the Natural Logarithm in Simple Regression salary\ = 963.191 + 1, 850.1 roedec Expected 2 Value of OLS n = 209,R = .01

64 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

65 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model Example: CEO’s salary - Original Regression Deriving the Ordinary Least Squares Estimates salary\ = 963.19 + 18.50 roe Properties of 2 OLS on any n = 209,R = .01 Sample of Data Units of Example: CEO’s salary - salary in dollars Measurement and Functional Form The new regression is Using the Natural Logarithm in Simple Regression salarydol\ = 963, 191 + 18, 501 roe Expected 2 Value of OLS n = 209,R = .01

66 / 86 The effects of Changing Units of Measurement on OLS Statistics

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

67 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple • Regression Recall the wage example: Model

Deriving the Example (Wage) Ordinary Least Squares Estimates Properties of wage[ = −0.90 + 0.54 educ OLS on any 2 Sample of n = 526,R = .16 Data

Units of Measurement • Now, think about the econometric model and how this OLS Regression Function is and Functional Form interpreted. Using the Natural Logarithm in Simple Regression • What the OLS Regression Line says may not fit how economically we see the Expected Value of OLS problem.

Possible issue: the dollar value of another year of schooling is constant.

68 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the th Ordinary Least • So the 16 year of education is worth the same as the second. Squares Estimates • Properties of We expect additional years of schooling to be worth more, in dollar terms, than OLS on any Sample of previous years. Data Units of • How can we incorporate an increasing effect? One way is to postulate a constant Measurement and Functional percentage effect. Form Using the Natural Logarithm in Simple Regression •We can approximate percentage changes using the natural log. Expected Value of OLS

69 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least Constant Percent Model Squares Estimates

Properties of • Let the dependent variable be log(wage) and write a (new) simple linear OLS on any Sample of regression model: Data

Units of Measurement log(wage) = β0 + β1educ + u and Functional Form Using the Natural Logarithm in Simple Regression • Let’s define log(wage) (write it as lwage) and run a new regression. Expected Value of OLS

70 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

71 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least lwage\ = 0.58 + .08 educ Squares Estimates n = 526, R2 = .19 Properties of OLS on any Sample of Data

Units of Measurement • The estimated return to each year of education is about 8%. and Functional Form Using the Natural Logarithm in Simple • Attention: Regression This R-squared is not directly comparable to the R-squared when wage is the Expected Value of OLS dependent variable. The total variation (SSTs) in wagei and lwagei that we must explain are completely different.

72 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Constant Elasticity Model Regression Model Deriving the • We can use the log on both sides of the equation to get constant elasticity Ordinary Least Squares models. For example, if Estimates

Properties of OLS on any log(salary) = β0 + β1 log(sales) + u Sample of Data then Units of Measurement and Functional %∆salary Form β1 ≈ Using the Natural Logarithm in Simple %∆sales Regression

Expected Value of OLS • The elasticity is free of units of salary and sales. • A constant elasticity model for salary and sales makes more sense than a constant dollar effect. 73 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates Model Dependent Variable Independent Variable Interpretation of β1 Properties of Level-Level y x ∆y = β1∆x OLS on any Sample of Level-Log y log(x) ∆y = (β1/100)%∆x Data log(y) x %∆y = (100β1)∆x Units of Log-Level Measurement log(y) log(x) %∆y = β1%∆x and Functional Log-Log Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

74 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model • Recall the CEO salary example, but now the independent variable is sales.

Deriving the Ordinary Least Squares salary = βo + β1sales + u Estimates

Properties of OLS on any Sample of Data • Applying log on both variables (dependent and independent) we get: Units of Measurement and Functional Example (CEO salary) Form Using the Natural Logarithm in Simple Regression log\(salary) = 4.82 + 0.26 log(sales) Expected Value of OLS n = 209,R2 = .21

75 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Sample of Data

Units of Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

76 / 86 Using the Natural Logarithm in Simple Regression

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares • Estimates The estimated elasticity of CEO salary with respect to firms sales is about .26.

Properties of OLS on any • A 10 percent increase in sales is associated with a Sample of Data Units of .26(10) = 2.6 Measurement and Functional Form percent increase in salary. Using the Natural Logarithm in Simple Regression

Expected Value of OLS

77 / 86 Topics

Definition of the Simple Regression Model 1 Deriving the Definition of the Simple Regression Model Ordinary Least Squares Estimates 2 Deriving the Ordinary Least Squares Estimates Properties of OLS on any Sample of 3 Data Properties of OLS on any Sample of Data

Units of Measurement 4 and Functional Units of Measurement and Functional Form Form Using the Natural Logarithm in Simple Regression Using the Natural Logarithm in Simple Regression Expected 5 Expected Value of OLS Value of OLS

78 / 86 The 4 Assumptions for Unbiasedness

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates

Properties of OLS on any Goal: We want to study statistical properties of the OLS estimator Sample of Data

Units of Measurement • In order to that, we will need to impose 4 assumptions. and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

79 / 86 The 4 Assumptions for Unbiasedness

Definition of the Simple Regression Model

Deriving the Assumption SLR.1 (Linear in Parameters) Ordinary Least Squares The population model can be written as Estimates

Properties of OLS on any y = β0 + β1x + u Sample of Data β β Units of where 0 and 1 are the (unknown) population parameters. Measurement and Functional Form Using the Natural Logarithm in Simple • What linear in parameters mean? Regression

Expected Value of OLS • Example of non linear in parameters on white board

80 / 86 The 4 Assumptions for Unbiasedness

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates Properties of Assumption SLR.2 (Random Sampling) OLS on any Sample of n {(x , y ): i = 1, ..., n} Data We have a random sample of size , i i , following the

Units of population model. Measurement and Functional Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

81 / 86 The 4 Assumptions for Unbiasedness

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Assumption SLR.3 (Sample Variation in the Explanatory Variable) Estimates x Properties of The sample outcomes on i are not all the same value. OLS on any Sample of Data

Units of • This is the same as saying the sample variance of {xi : i = 1, ..., n} is not zero. Measurement and Functional Form • If in the population x does not change then we are not asking an interesting Using the Natural Logarithm in Simple Regression question.

Expected Value of OLS

82 / 86 The 4 Assumptions for Unbiasedness

Definition of the Simple Regression Model Deriving the Assumption SLR.4 (Zero Conditional Mean) Ordinary Least Squares Estimates In the population, the error term has zero mean given any value of the explanatory

Properties of variable: OLS on any Sample of Data E(u|x) = 0 for all x. Units of Measurement and Functional Form Using the Natural • Key assumption. Logarithm in Simple Regression Expected • We can compute the OLS estimates whether or not this assumption holds. Value of OLS

83 / 86 The 4 Assumptions for Unbiasedness

Definition of the Simple Regression Model ˆ ˆ Deriving the Goal: We want to know if β1 is unbiased for β1, and β0 is unbiased for β0 Ordinary Least Squares Estimates Properties of • If, OLS on any Sample of Data E(βˆ1) = β1 Units of Measurement and Functional ˆ Form E(β0) = β0 Using the Natural Logarithm in Simple Regression Then, the OLS estimator is unbiased. Expected Value of OLS • Demonstration: On the white board.

84 / 86 Unbiasedness of OLS

Definition of the Simple Regression Model

Deriving the Ordinary Least Squares Estimates Theorem: Unbiasedness of OLS Properties of OLS on any Under Assumptions SLR.1 through SLR.4 Sample of Data E(βˆ ) = β E(βˆ ) = β Units of 0 0 and 1 1 , Measurement ˆ ˆ and Functional for any values of β0 and β1, i.e., β0 is unbiased for β0, and β1 is unbiased for β1 Form Using the Natural Logarithm in Simple Regression

Expected Value of OLS

85 / 86 Unbiasedness of OLS

Definition of the Simple Regression Model • Therefore, the four assumptions for the OLS estimator to be unbiased are: Deriving the Ordinary Least Squares SLR.1: (Linear in Parameters) y = β0 + β1x + u Estimates SLR.2: (Random Sampling) Properties of OLS on any SLR.3: (Sample Variation in xi) Sample of Data SLR.4: (Zero Conditional Mean) E(u|x) = 0 Units of Measurement and Functional Form • If any of these assumptions fails, the OLS estimator will (generally) be biased. Using the Natural Logarithm in Simple Regression

Expected • To be discussed in the next chapter: What are the omitted factors? Are they Value of OLS likely to be correlated with x? If so, SLR.4 fails and OLS will be biased.

86 / 86