

Ordinary Least Squares: the univariate case

Clément de Chaisemartin

Majeure Economie

September 2011


1 Introduction

2 The OLS method
  Objective and principles of OLS
  Deriving the OLS estimates
  Do OLS keep their promises?

3 The linear causal model
  Assumptions
  Identification and estimation
  Limits

4 A simulation & applications
  OLS do not always yield good estimates...
  But things can be improved...
  Empirical applications

5 Conclusion and exercises


Objectives

Objective 1: to make the best possible guess about a variable Y based on X, i.e. to find a function of X which yields good predictions for Y. Given cigarette prices, what will cigarette sales be in September 2010 in France?
Objective 2: to determine the causal mechanism by which X influences Y. Ceteris paribus type of analysis: everything else being equal, how does a change in X affect Y? By how much does one more year of education increase an individual's wage? By how much would hiring 1 000 more policemen decrease the crime rate in Paris?
The tool we use = a data set, in which we have the wages and numbers of years of education of N individuals.


Objective and principles of OLS What we have and what we want

For each individual in our data set we observe his wage and his number of years of education. Assume we have a graph such as the one below. The relationship between the two variables seems to be linear. We want to find the line which best describes the relationship between these variables.

[Figure: scatter plot of wage (0–4000) against years of schooling (8–20).]


Objective and principles of OLS The principle of OLS

A line is characterized by an intercept and a slope, which we denote $\hat{\alpha}$ and $\hat{\beta}$. Idea = choose for $\hat{\alpha}$ and $\hat{\beta}$ the values which minimize $\sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$: the Ordinary Least Squares estimates. Let us denote $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$: it represents the wage of individual i as predicted by our model. We also denote $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$. The $\hat{\varepsilon}_i$ are called the estimated residuals and represent the mistake made by our model when predicting individual i's wage based on his number of years of schooling. => the principle of OLS is merely to minimize the sum of the squared mistakes we make when we use an affine function of $X_i$ to predict $Y_i$. Why do we take the square of $\hat{\varepsilon}_i$? Could we have used another function?
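As an illustration (not part of the original slides), here is a minimal Python sketch that minimizes the OLS objective numerically; the wage and schooling numbers are made up for the example.

```python
# Minimal sketch: minimize the OLS objective sum_i (Y_i - a - b*X_i)^2 numerically.
# The data below are hypothetical illustration values, not taken from the slides.
import numpy as np
from scipy.optimize import minimize

X = np.array([8, 10, 12, 14, 16, 18, 20], dtype=float)                # years of schooling (hypothetical)
Y = np.array([900, 1100, 1500, 1800, 2300, 2900, 3400], dtype=float)  # wages (hypothetical)

def ssr(params):
    a, b = params
    return np.sum((Y - a - b * X) ** 2)    # sum of squared residuals

res = minimize(ssr, x0=[0.0, 0.0])         # numerical minimization of the objective
alpha_hat, beta_hat = res.x
print(alpha_hat, beta_hat)
```

The closed-form solution derived on the next slide gives the same numbers without any numerical optimization.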


Objective and principles of OLS A graphical example

[Figure: scatter plot of wage (0–4000) against years of schooling (8–20).]


Deriving the OLS estimates Finding $\hat{\alpha}$ and $\hat{\beta}$ (Theorem 1.1)

We denote $\bar{Y} = \frac{1}{N}\sum_i Y_i$ the empirical mean of $(Y_i)$, $\bar{X}$ the empirical mean of $(X_i)$, $V_e(X) = \frac{1}{N}\sum_i X_i^2 - \left(\frac{1}{N}\sum_i X_i\right)^2$ the empirical variance of $(X_i)$, and finally $\mathrm{cov}_e(X, Y) = \frac{1}{N}\sum_i X_i Y_i - \bar{X}\bar{Y}$ the empirical covariance of $(X_i)$ and $(Y_i)$. We want to minimize $f(\hat{\alpha}, \hat{\beta}) = \sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$. Solution: $\hat{\beta} = \frac{\mathrm{cov}_e(X, Y)}{V_e(X)}$ and $\hat{\alpha} = \bar{Y} - \frac{\mathrm{cov}_e(X, Y)}{V_e(X)} \times \bar{X}$. Can we compute $\hat{\beta}$ from the data? Any problem with the computations? Any idea to interpret this result?
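A sketch of the closed-form solution, reusing the hypothetical arrays from the previous example: $\hat{\beta}$ is simply the empirical covariance divided by the empirical variance.

```python
# Closed-form OLS estimates from the empirical (1/N) variance and covariance.
import numpy as np

X = np.array([8, 10, 12, 14, 16, 18, 20], dtype=float)                # hypothetical schooling data
Y = np.array([900, 1100, 1500, 1800, 2300, 2900, 3400], dtype=float)  # hypothetical wage data

cov_e = np.mean(X * Y) - np.mean(X) * np.mean(Y)   # empirical covariance of X and Y
var_e = np.mean(X ** 2) - np.mean(X) ** 2          # empirical variance of X
beta_hat = cov_e / var_e
alpha_hat = np.mean(Y) - beta_hat * np.mean(X)
print(alpha_hat, beta_hat)                         # same values as the numerical minimization above
```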


Deriving the OLS estimates An example

Compute $\hat{\beta}$ in this simple example:

Individual   Years of Schooling   Wage
1            5                    1000
2            5                    1500
3            10                   1000
4            15                   2000
5            15                   2500
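If you want to check your hand computation, a quick sketch with the five observations from the table:

```python
# Check of the hand computation for the five individuals in the table.
import numpy as np

X = np.array([5, 5, 10, 15, 15], dtype=float)               # years of schooling
Y = np.array([1000, 1500, 1000, 2000, 2500], dtype=float)   # wage

cov_e = np.mean(X * Y) - np.mean(X) * np.mean(Y)   # empirical covariance: 18000 - 10*1600 = 2000
var_e = np.mean(X ** 2) - np.mean(X) ** 2          # empirical variance: 120 - 100 = 20
print(cov_e / var_e)                               # beta_hat = 100
```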


Do OLS keep their promises ? Do OLS attain objectives 1 and 2 ?

Objective 1: find the best prediction for Y based on X, i.e. find a function $P(X_i)$ of $X_i$ which yields good predictions for $Y_i$. Objective 2: determine the causal mechanism by which X influences Y.


Do OLS keep their promises ? OLS partially reach objective 1.

Once we agree that a good prediction is one which minimizes the squared errors, OLS yield by construction the best prediction function for Y among all affine functions of X. But:
the criterion can be challenged: why not minimize $\sum_i |\hat{\varepsilon}_i|$ instead of $\sum_i \hat{\varepsilon}_i^2$? This is not so big an issue: models which minimize $\sum_i |\hat{\varepsilon}_i|$ (least absolute deviations) exist, and their results are usually close to OLS.
even if the criterion is accepted, OLS yield the best prediction function among all affine functions of X, not among all functions of X. There might for instance exist a polynomial function of X, $\hat{\alpha}' + \hat{\beta}' X + \hat{\gamma}' X^2$, which yields errors $\hat{\varepsilon}_i'$ such that $\sum_i (\hat{\varepsilon}_i')^2 < \sum_i \hat{\varepsilon}_i^2$. Not so big an issue either, see next chapter.
How to measure the extent to which Objective 1 is reached?


Do OLS keep their promises ? The R2: a measure of the quality of our predictions

$SST = \sum_i (Y_i - \bar{Y})^2$: the dispersion of wages. $SSE = \sum_i (\hat{Y}_i - \bar{Y})^2$: the dispersion of predicted wages. $SSR = \sum_i (Y_i - \hat{Y}_i)^2$: the sum of the squared errors.
$SST = \sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2 + 2\sum_i \hat{\varepsilon}_i (\hat{Y}_i - \bar{Y}) = SSR + SSE + 2\hat{\alpha}\sum_i \hat{\varepsilon}_i + 2\hat{\beta}\sum_i \hat{\varepsilon}_i X_i - 2\bar{Y}\sum_i \hat{\varepsilon}_i$.
According to FOC1, $\sum_i \hat{\varepsilon}_i = 0$; according to FOC2, $\sum_i \hat{\varepsilon}_i X_i = 0$. Therefore, $SST = SSE + SSR$.
$R^2 = \frac{SSE}{SST}$. The $R^2$ is always between 0 and 1 (why?). It is a measure of the share of the variance observed in the sample that our model is able to account for, i.e. of the quality of our predictions for Y based on X. However, a model with a low $R^2$ can still be helpful, and a model with a high $R^2$ can still be unhelpful.
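A short sketch of the decomposition, again on the hypothetical arrays used above; since SST = SSE + SSR, the two ways of computing R² agree.

```python
# R^2 = SSE/SST = 1 - SSR/SST, computed on the hypothetical data used above.
import numpy as np

X = np.array([8, 10, 12, 14, 16, 18, 20], dtype=float)
Y = np.array([900, 1100, 1500, 1800, 2300, 2900, 3400], dtype=float)

beta_hat = (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X ** 2) - np.mean(X) ** 2)
alpha_hat = np.mean(Y) - beta_hat * np.mean(X)
Y_hat = alpha_hat + beta_hat * X                    # predicted wages

SST = np.sum((Y - np.mean(Y)) ** 2)                 # dispersion of wages
SSE = np.sum((Y_hat - np.mean(Y)) ** 2)             # dispersion of predicted wages
SSR = np.sum((Y - Y_hat) ** 2)                      # sum of squared errors
print(SSE / SST, 1 - SSR / SST)                     # both equal R^2
```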


Do OLS keep their promises ? But OLS do not necessarily reach objective 2.

[Figure: scatter plot of wage (0–4000) against years of schooling (8–20).]

Individuals with more schooling have higher wages. Does it imply that schooling has a causal impact on wages ?


Do OLS keep their promises ? But OLS do not necessarily reach objective 2.

The link could run the other way ⇒ causality goes in the other direction: reverse causality. Here, this is not an issue: higher wages cannot cause longer education, because schooling takes place before labor market participation. Individuals with many years of schooling make more money than those with few years of schooling. But do those two groups only differ in their number of years of schooling? Probably not. For instance, those with more years of schooling might have richer parents, or might also be more clever. ⇒ is this correlation between wages and education only due to the effect of education on wages, or to the fact that those with more education are also more clever and have richer parents? Omitted variable bias.

Do OLS keep their promises ? A causal framework

[Causal diagram:]
Parents’ wage → Children’s education: well-paid parents can afford sending their children to school, then to college and finally to university.
Parents’ wage → Children’s wage: well-paid parents have good networking skills, know how to get good positions => they can help their children.
Children’s education → Children’s wage (the green cell): education increases children’s productivity + ability to find a well-paid job (signalling theory).

True causal impact of education on wages = the green cell. If this framework is true, does $\hat{\beta}$, i.e. the correlation between children's education and wage, measure the green cell only? Does it overestimate or underestimate the green cell?


Assumptions Positing a linear causal model

We assume that for every individual, his income is generated according to the following model: Income = α + β × Number of Years of Education + ε

More formally: $Y_i = \alpha + \beta X_i + \varepsilon_i$. $Y_i$ is the dependent variable, $X_i$ the explanatory variable, and $\varepsilon_i$ the error term: all the other determinants of income (cleverness, gender...). Assumption 1: β measures by how much wage changes when the education of an individual increases by one year while all the other determinants of income (ε) remain unchanged (ceteris paribus impact of education), i.e. the causal impact of education on income. Assuming that education has an influence on income does not seem to be too big an assumption. However, we assume that this influence is linear: when the number of years of education increases by 1, wage increases by β. Realistic? Moreover, we assume that this influence is the same for everyone: β does not depend on i. Realistic?


Assumptions Why is linearity not so stupid an assumption...

If the relationship in the data does not look linear at all, you can try to estimate a different equation: $Y_i = \alpha + \beta X_i^2 + \varepsilon_i$ for instance if the relationship is quadratic (see the sketch after the graph). If the data look as in the graph below, which relationship do you want to estimate?

[Figure: scatter plot (x from 0 to 20, y from 0 to 3500) suggesting a non-linear relationship.]
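As a hedged illustration (the numbers below are simulated, not from the slides), estimating a quadratic specification only requires running the same OLS formulas on X² instead of X:

```python
# Estimating Y_i = alpha + beta * X_i^2 + eps_i: same OLS machinery, with X^2 as the regressor.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 20, 50)
Y = 500 + 7 * X ** 2 + rng.normal(0, 100, size=X.size)   # simulated quadratic relationship

Z = X ** 2                                                # transformed regressor
beta_hat = (np.mean(Z * Y) - np.mean(Z) * np.mean(Y)) / (np.mean(Z ** 2) - np.mean(Z) ** 2)
alpha_hat = np.mean(Y) - beta_hat * np.mean(Z)
print(alpha_hat, beta_hat)                                # should be close to 500 and 7
```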


Assumptions Other assumptions

Assumption 2: random sampling. $(X_i, \varepsilon_i)$ is independent from $(X_j, \varepsilon_j)$. This amounts to saying that the number of years of education completed by Mr Dupont, or his marital status, is not related to Mr Duchamp's, who lives fifty kilometers from him and whom he does not know. This seems fairly credible.

Assumption 3: sample variation. In our sample, not all the $X_i$ are equal. Trivial assumption: if it is not verified, that is to say if all the individuals in our sample have the same number of years of education, it is impossible to determine the impact of education on wage from our data. This implies that $V_e(X_i) > 0$. Assumption 4: $\varepsilon_i \perp X_i$. Question: in our example of wage and education, do you believe that $\varepsilon_i \perp X_i$?


Identification and estimation What is identification ?

Identification amounts to finding a formula relating an unknown parameter (here this unknown parameter will be β, the causal impact of education on wages) to quantities that we can estimate from the data.


Identification and estimation Identification of β

Theorem: under assumptions 1 to 4, β is identified.
Proof: $\mathrm{cov}(Y_i, X_i) = \mathrm{cov}(\alpha + \beta X_i + \varepsilon_i, X_i)$ according to assumption 1
$= \mathrm{cov}(\alpha, X_i) + \beta\,\mathrm{cov}(X_i, X_i) + \mathrm{cov}(\varepsilon_i, X_i)$ according to the properties of covariance
$= \beta V(X_i)$ since $\mathrm{cov}(\alpha, X_i) = 0$ and $\mathrm{cov}(\varepsilon_i, X_i) = 0$ according to assumption 4.
Therefore, $\beta = \frac{\mathrm{cov}(Y_i, X_i)}{V(X_i)}$.


Identification and estimation How to estimate β ?

As shown above, $\beta = \frac{\mathrm{cov}(Y_i, X_i)}{V(X_i)}$. Any idea of a good estimator $\hat{\beta}$?


Identification and estimation Consistency of $\hat{\beta}$

$\hat{\beta} = \frac{\mathrm{cov}_e(Y_i, X_i)}{V_e(X_i)}$. By the law of large numbers: $\mathrm{cov}_e(Y_i, X_i) \to \mathrm{cov}(Y_i, X_i)$ and $V_e(X_i) \to V(X_i)$. Therefore $\hat{\beta} \to \beta = \frac{\mathrm{cov}(Y_i, X_i)}{V(X_i)}$ when the number of observations in the sample goes to infinity.


Identification and estimation Asymptotic normality of $\hat{\beta}$

The OLS estimators are asymptotically normal, in the sense that $\sqrt{N}(\hat{\beta} - \beta) \rightsquigarrow \mathcal{N}\!\left(0, \frac{\sigma^2}{V(X)}\right)$ (central limit theorem). The meaning of this is that when the size of the sample is large, we can state that $\sqrt{N}(\hat{\beta} - \beta)$ is approximately normally distributed. Proof at page 177 of your textbook. This result is important to build confidence intervals for β.


Identification and estimation Variance of $\hat{\beta}$

Let us denote $\sigma^2 = V(\varepsilon_i)$. The variance of $\hat{\beta}$ is equal to $\frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}$ (you can find a proof at page 55 of the textbook):
It is increasing in $\sigma^2$. The more the error term is spread out, the harder it is to estimate β precisely. For instance, assume that unobserved determinants of wage (ambition, ability, age...) play an important role in wage setting. For some individuals, $\varepsilon_i$ will take very high positive values, and for others it will take very low negative values. We will therefore be likely to be faced with individuals with low levels of education and high wages, and conversely, which will make the estimation of β difficult.
The more $X_i$ is volatile in our sample, the more precisely we estimate β.
Finally, $\sum_i (X_i - \bar{X})^2$ is increasing in N, the number of people in our sample.


Identification and estimation Estimating σ2

In the next session we will need an estimator of the variance of the error term. Usually, to estimate for instance a "theoretical" mean, we use the "empirical" one. Here, we use the same idea: to estimate the variance of the error term, a natural idea would be to use the empirical variance of the estimated residuals, $\frac{1}{N}\sum_i \hat{\varepsilon}_i^2$. This estimator indeed converges to $\sigma^2$ (LLN). However it is biased: one can show that $E\!\left(\frac{1}{N}\sum_i \hat{\varepsilon}_i^2\right) = \frac{N-2}{N}\sigma^2$. Thus, we prefer to use the following unbiased estimator: $\hat{\sigma}^2 = \frac{1}{N-2}\sum_i \hat{\varepsilon}_i^2$. It is easy to show that this estimator also converges to $\sigma^2$.
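Putting the last three slides together, a minimal sketch (on the hypothetical arrays used earlier): estimate σ² with the 1/(N−2) formula, plug it into the variance formula for $\hat{\beta}$, and use the asymptotic normality result to form an approximate 95% confidence interval.

```python
# sigma^2_hat = (1/(N-2)) * sum of squared residuals, standard error of beta_hat, and a 95% CI.
import numpy as np

X = np.array([8, 10, 12, 14, 16, 18, 20], dtype=float)
Y = np.array([900, 1100, 1500, 1800, 2300, 2900, 3400], dtype=float)
N = X.size

beta_hat = (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X ** 2) - np.mean(X) ** 2)
alpha_hat = np.mean(Y) - beta_hat * np.mean(X)
resid = Y - alpha_hat - beta_hat * X                           # estimated residuals

sigma2_hat = np.sum(resid ** 2) / (N - 2)                      # unbiased estimator of sigma^2
se_beta = np.sqrt(sigma2_hat / np.sum((X - np.mean(X)) ** 2))  # estimated std. dev. of beta_hat
ci = (beta_hat - 1.96 * se_beta, beta_hat + 1.96 * se_beta)    # approximate 95% confidence interval
print(beta_hat, se_beta, ci)
```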


Limits Link with OLS

In the linear model, β represents the causal impact of X on Y. Under various (very strong) assumptions, one can show that $\beta = \frac{\mathrm{cov}(Y_i, X_i)}{V(X_i)}$, which can be estimated from the sample by the quantity $\hat{\beta} = \frac{\mathrm{cov}_e(X, Y)}{V_e(X)}$. As you may have noticed, this estimator $\hat{\beta}$ is the same as the quantity we derived in section 2 with the OLS method. => if the linear model assumptions are verified, then predictions based on OLS are not only the best predictions for Y based on X, but $\hat{\beta}$ also describes the causal impact of X on Y. But are the linear model assumptions credible?


Limits Review of the assumptions of the linear model

Assumption 1: fairly credible, up to the linear approximation (the impact of education on wage might not be linear) and the constant effect assumption. Assumptions 2 and 3: credible. Assumption 4: extremely strong assumption. It amounts to stating that X is not correlated with any of the other determinants of Y. Credible in the wage / education example?


Limits What happens if assumption 4 is not verified ?

Theorem: if assumption 4 is not verified, then the OLS estimator $\hat{\beta}$ is not a consistent estimator of β, the causal impact of X on Y.
Proof: $\mathrm{cov}(Y_i, X_i) = \mathrm{cov}(\alpha + \beta X_i + \varepsilon_i, X_i) = \beta\,\mathrm{cov}(X_i, X_i) + \mathrm{cov}(\varepsilon_i, X_i)$. Therefore, $\beta = \frac{\mathrm{cov}(Y_i, X_i)}{V(X_i)} - \frac{\mathrm{cov}(\varepsilon_i, X_i)}{V(X_i)}$. Since $\hat{\beta} \to \frac{\mathrm{cov}(Y_i, X_i)}{V(X_i)}$, $\hat{\beta}$ is not consistent.
The asymptotic bias, that is to say the difference between the limit of $\hat{\beta}$ and β, is equal to $\frac{\mathrm{cov}(\varepsilon_i, X_i)}{V(X_i)}$: the stronger the correlation between ε and X, the larger the bias. If X and ε are positively (resp. negatively) related, $\hat{\beta}$ overestimates (resp. underestimates) β.
In the wage / education example, do you think $\hat{\beta}$ over- or underestimates β?
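A simulation sketch of this result (all numbers are invented): the error term is built to contain an omitted "ability" variable that also drives schooling, so cov(ε, X) > 0, and $\hat{\beta}$ settles around β + cov(ε, X)/V(X) rather than around β.

```python
# Omitted variable bias: with cov(eps, X) > 0, beta_hat converges to beta + cov(eps, X)/V(X).
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
ability = rng.normal(0, 1, N)                    # omitted determinant of wages
X = 12 + 2 * ability + rng.normal(0, 1, N)       # schooling, correlated with ability
eps = 300 * ability + rng.normal(0, 200, N)      # error term contains ability => cov(eps, X) > 0
Y = 1500 + 100 * X + eps                         # true causal effect: beta = 100

beta_hat = (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X ** 2) - np.mean(X) ** 2)
bias = (np.mean(eps * X) - np.mean(eps) * np.mean(X)) / (np.mean(X ** 2) - np.mean(X) ** 2)
print(beta_hat, 100 + bias)                      # beta_hat overestimates 100 by about cov(eps,X)/V(X)
```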


OLS do not always yield good estimates... Generating 18 random pairs for wage and education (1/2)

Open an Excel file and write in cells A1 to A18 "=2000*(alea()-0,5)" if you have the French version of Excel. The 18 random numbers you have generated stand for the ε in our model. They are supposed to be independent. Do they verify the other assumptions we made on the ε? What kind of distribution do they follow? What are their expectation and their variance? Then, write in cells B1 to B18 "=ent(10+alea()*10)". These 18 random numbers stand for the number of schooling years. Do they verify the assumptions we imposed on the $X_i$? Finally, write in cell C1 "=1500+100*B1+A1", and extend this formula down to C18. What do these 18 numbers stand for? Do the $X_i$ truly have a causal impact on the $Y_i$ here? In this simulation, what are the true values of α and β?
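For readers without Excel, a Python sketch of the same simulation (same three columns: ε uniform on [−1000, 1000], X an integer number of schooling years between 10 and 19, Y = 1500 + 100·X + ε), followed directly by the OLS fit:

```python
# Python version of the Excel simulation: 18 draws, true alpha = 1500, true beta = 100.
import numpy as np

rng = np.random.default_rng()
N = 18
eps = 2000 * (rng.random(N) - 0.5)               # column A: uniform on [-1000, 1000]
X = np.floor(10 + 10 * rng.random(N))            # column B: integer schooling years in {10, ..., 19}
Y = 1500 + 100 * X + eps                         # column C

beta_hat = (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X ** 2) - np.mean(X) ** 2)
alpha_hat = np.mean(Y) - beta_hat * np.mean(X)
print(alpha_hat, beta_hat)                       # often far from (1500, 100) with only 18 observations
```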


OLS do not always yield good estimates... Generating 18 random pairs for wage and education (2/2)

Select cells B1 to C18, go to the "assistant graphique" and make a graph, choosing the "nuage de points" option. Once this is done, select your graph, go to the graph menu and select the "Ajouter une courbe de tendance" option. Choose the linear type of curve and go to the options. Select "Afficher l'équation sur le graphique" and "Afficher le coefficient de détermination sur le graphique". Once this is done, write down on a sheet of paper the value of $\hat{\beta}$ that appears on the graph. Is it close to the true β? Any idea why this is the case?


OLS do not always yield good estimates... What I get...

[Figure: scatter plot of wage against years of schooling with fitted line y = 32.612x + 2636.6, R² = 0.0369.]


But things can be improved... Illustrating some points of the course

In the first column, write "=200*(alea()-0,5)" instead of "=2000*(alea()-0,5)". Is your new estimate $\hat{\beta}$ closer to the true β? What is your intuition for this result? Now write "=4000*(alea()-0,5)" in cell A1 and extend the formulas in cells A1, B1 and C1 down to A200, B200 and C200. Draw a new graph similar to the previous one, but selecting cells B1 to C200. Is your new estimate $\hat{\beta}$ closer to the true β? What is your intuition for this result?
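The same two modifications in the Python sketch above: first shrink the spread of the errors, then enlarge both the error spread and the sample size. The helper function simulate is a hypothetical name introduced here for convenience.

```python
# Effect of the error variance and of the sample size on beta_hat (true beta = 100).
import numpy as np

def simulate(n, eps_scale, seed=None):
    """Draw one sample of size n and return the OLS slope estimate."""
    rng = np.random.default_rng(seed)
    eps = eps_scale * (rng.random(n) - 0.5)
    X = np.floor(10 + 10 * rng.random(n))
    Y = 1500 + 100 * X + eps
    return (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X ** 2) - np.mean(X) ** 2)

print(simulate(18, 200))      # smaller error spread: beta_hat usually close to 100
print(simulate(200, 4000))    # larger errors but N = 200: beta_hat closer to 100 than with N = 18
```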


But things can be improved... What I get...

[Figure: scatter plot of wage against years of schooling with fitted line y = 101.08x + 1490.8, R² = 0.9661.]


But things can be improved... What I get...

[Figure: scatter plot of wage against years of schooling with fitted line y = 97.233x + 1560.2, R² = 0.0507.]


Empirical applications Consequences of smoking when pregnant

In a sample of 1 388 American mothers who gave birth to a child in 1988, we estimate the following relationship: weight of the child in grams = α + β × daily cigarettes smoked by the mother during pregnancy + ε. Results: $\hat{\alpha}$ = 3395, $\hat{\beta}$ = −14.57. How should we interpret $\hat{\beta}$? Are the various assumptions needed for OLS to be unbiased, etc., verified here according to you?


Empirical applications Consequences of attending a class on exam grade

Assume we want to estimate the following model among students attending an econometrics course: final grade = α + β × number of classes attended + ε. Do you think that the estimated value $\hat{\beta}$ would properly estimate the true causal impact of attendance on the final grade?


Conclusion

Today, we have seen the OLS technique to make a prediction for Y based on X. We have seen that, up to two "small" limits, this prediction is the best we can make => our first goal was reached. However, we have seen that OLS estimators also describe the causal impact of X on Y if and only if a very restrictive assumption is made, namely that X is uncorrelated with all the other determinants of Y. In many situations this is unlikely to hold => in most cases we will not be able to achieve our second goal with OLS. Finally, we have seen with some simulations that even in situations where all the OLS assumptions are verified (which we can be sure of because we used data generated by the computer), OLS estimators can be far from the true values when the sample size is small. => do not do statistics with small samples!
References for this chapter: chapters 2 and 5 of your textbook.