Chapter 6: The Simple Regression Model and Introduction to

M. Angeles Carnero

Departamento de Fundamentos del Análisis Económico

Year 2014-15

M. Angeles Carnero (UA) Chapter 6: SRM Year 2014-15 1 / 81

Introduction

Econometrics is a branch of Economics that uses and develops statistical methods in order to estimate relationships between economic variables, to test economic theories and to evaluate government and firm policies.

Examples of econometric applications:
- Effects on employment of a training programme for unemployed people.
- Advice on different investment strategies.
- Effects on sales of an advertising campaign.

Econometrics interacts with many economic disciplines:
- Macroeconomics: prediction of variables such as GNP and inflation, or quantifying the relationship between interest rates and inflation.
- Microeconomics: quantifying the relationship between education and wages, production and inputs, R&D investment and firms' profits.
- Finance: volatility analysis of assets, asset pricing models.

Stages of the empirical economic analysis

The first stage of an econometric analysis is to formulate clearly and precisely the question to be studied (the test of an economic theory, the analysis of the effect of a public policy, etc.). In many cases a formal economic model is built.

Example: In order to describe the consumption decisions of individuals subject to budget constraints, we assume that individuals make their choices in order to maximise their utility. This model implies a set of demand equations in which the demanded quantity of each good depends on its own price, the prices of substitute and complementary goods, consumer income and the individual characteristics affecting preferences. These equations model individual consumption decisions and are the basis for the econometric analysis of consumer demand.

Crime Economic Model (Gary Becker (1968))

This model describes individual participation in crime and is based on utility maximisation. Crimes imply economic rewards and costs. The decision to participate in criminal activities is a problem of allocating resources in order to maximise utility, where the costs and benefits of the alternative decisions must be taken into account.

Costs:
- Costs linked to the possibility of being arrested and convicted.
- Opportunity cost of not participating in other activities such as legal jobs.

Crime Economic Model (cont.)

Equation describing the time invested in criminal activities

y = f (x1, x2, x3, x4, x5, x6, x7)

y: Hours devoted to criminal activities.
x1: Hourly "wage" of criminal activities.
x2: Hourly wage of legal work.
x3: Other income that does not arise from crime or paid work.
x4: Probability of being arrested.
x5: Probability of being convicted in case of being arrested.
x6: Expected sentence in case of being convicted.
x7: Age.

The function f depends on the underlying utility function, which is barely known. However, we can use economic theory, and sometimes common sense, to predict the effect of each variable on criminal activity.

Crime Economic Model (cont.)

Once the economic model has been established, we must transform it into an econometric model. Following the previous example, in order to construct the econometric model we should:
- Specify the functional form of the function f.
- Analyse which variables can be observed, which can be approximated, which are not observed, and how to take into account the many other factors affecting criminal behaviour.

Crime Economic Model (cont.)

Consider the following particular econometric model for the economic model of criminal behaviour

crime = β0 + β1w + β2othinc + β3farr + β4fconv + β5avgsen + β6age + u

crime: Frequency of criminal activity.
w: Wage that could be obtained in a legal job.
othinc: Other income.
farr: Frequency of arrests due to previous infractions.
fconv: Frequency of convictions.
avgsen: Average duration of sentences.
age: Age.
u: Error term reflecting all the unobserved factors affecting criminal activity, such as the wage of criminal activities or the family environment of the individual. It also captures measurement errors in the variables included in the model.

β0, β1, ..., β6 are the parameters of the econometric model describing the relationship between crime (crime) and the factors used to determine crime in the model.

Once the econometric model is specified, hypotheses of interest can be formulated in terms of its unknown parameters. For example, we can ask whether the wage obtained in a legal job (w) has any effect on criminal activity. This hypothesis is equivalent to β1 = 0. Once the econometric model has been established, we have to collect data on the variables appearing in it. Finally, we use appropriate statistical techniques to estimate the unknown parameters and test the hypotheses of interest about these parameters.

The structure of economic data

Cross-Section Data
- They arise from surveys of families, individuals or firms at a given point in time.
- In many cases we can assume that the data form a random sample, that is, that the observations are independent and identically distributed (iid).
- Examples: Encuesta de Presupuestos Familiares (EPF), Encuesta de Población Activa (EPA).

Time Series
- We observe one or more variables over time.
- The observations are usually dependent over time.
- Annual, quarterly, monthly or daily frequency, etc.
- Examples: monthly series of price indices, annual GNP series, daily IBEX-35 series.

Panel Data
- A time series is observed for each member of a cross-section (panel data ≠ repeated cross-sections).
- Examples: Encuesta Continua de Presupuestos Familiares, Survey of Income and Living Conditions (EU-SILC).

Causality and the concept of ceteris paribus in econometric analysis

In most applications, we are interested in analysing whether one variable has a causal effect on another. Examples:
- Would an increase in the price of a good cause a decrease in its demand?
- If sentences become tougher, would this have a causal effect on crime?
- Does education have a causal effect on the productivity of workers?
- Does participation in a certain training programme cause an increase in the wages of the workers attending it?

The fact that two variables are correlated does not imply that a causal relationship can be inferred. For example, observing that workers who participated in a certain training programme have higher wages than those who did not is not enough to establish a causal relationship. Inferring causality is difficult because in Economics we usually do not have experimental data.

In causal analysis, the concept of "ceteris paribus" (all other relevant factors held fixed) is very important. For example, in analysing consumer demand we are interested in quantifying the effect of a change in the price of a good on the quantity demanded, holding fixed the rest of the factors such as income, the prices of other goods, the preferences of consumers, etc. Econometric methods are used to estimate ceteris paribus effects and therefore to infer causality between variables.

Definition of the simple regression model

The simple regression model is used to analyse the relationship between two variables. Although it has many limitations, it is useful to learn to estimate and interpret this model before moving to the multiple regression model. In the simple regression model we consider two random variables y and x representing a population, and we are interested in explaining y in terms of x. For example, y can be the hourly wage and x the years of education.

We need to establish an equation relating y and x, and the simplest model is to assume a linear relationship

y = β0 + β1x + u (1)

This equation defines the simple regression model, and it is assumed to be valid for the population of interest.
y: dependent variable, explained variable or response variable.
x: independent variable, explanatory variable, control variable or regressor.
u: error term or random shock capturing the effect of all other factors affecting y. In the simple regression model all these other factors are treated as unobserved.

β1 is the slope parameter and β0 is the intercept. β1 and β0 are unknown parameters that we want to estimate using a random sample of (x, y).

β1 reflects the change in y given a one-unit increase in x, holding fixed the rest of the factors affecting y, which are included in u. Note that the linearity assumption implies that a one-unit increase in x has the same effect on y regardless of the initial value of x. This assumption is not very realistic in some cases and we will relax it later.

Example 1
Let us consider a simple regression model relating the wage of an individual to his level of education

wage = β0 + β1educ + u

If wage is measured in dollars per hour and educ in years of education, β1 reflects the change in hourly wage given one more year of education, holding the rest of the factors fixed. The error term u contains all the other factors affecting wage, such as work experience, innate ability, tenure in the current job, etc.

Example 2
Assume that soya production is determined by the model

yield = β0 + β1fertilizer + u

where yield is the soya production and fertilizer is the quantity of fertilizer. The error term u contains other factors affecting the soya production such as the quality of land, the quantity of rain, etc.

Obtaining a good estimate of the parameter β1 in model (1) depends on the relationship between the error term u and the variable x. Formally, the assumption we need to impose on the relationship between x and u in order to obtain a credible estimate of β1 is that the mean of u conditional on x is zero for any value of x

E(u|x) = E(u) = 0    (2)

Recall that the mean of u conditional on x is just the mean of the distribution of u conditional on x. Note that, as long as the model has an intercept, the assumption E(u) = 0 is not very restrictive, since it is just a normalisation obtained by defining

β0 = E(y) − β1E(x)

The real assumption is that the mean of the distribution of u conditional on x is constant.

How should assumption (2) be interpreted in the context of the previous examples?

Example 1 (cont.)
To simplify, assume that the error term u only represents innate ability. Assumption (2) implies that mean ability does not depend on the years of education: under this assumption, the mean ability of individuals with 10 years of education is the same as that of individuals with 16 years of education. However, if individuals with higher innate ability tend to acquire more education, the average innate ability of individuals with 16 years of education will be higher than that of individuals with 10 years of education, and assumption (2) is not satisfied. Since innate ability is unobserved, it is very difficult to know whether its mean depends on the level of education; but this is a question we should think about before starting the empirical analysis.

Example 2 (cont.)
To simplify, assume that the error term u only represents the quality of the land. In this case, if the quantity of fertilizer applied to different plots is random and does not depend on the quality of the land, then assumption (2) holds: the average quality of the land does not depend on the quantity of fertilizer. On the other hand, if the best plots of land receive a larger quantity of fertilizer, the mean value of u depends on the quantity of fertilizer and assumption (2) does not hold.

We now obtain the expression for the mean of y conditional on x under assumption (2). Taking the expected value conditional on x in equation (1), we have that

E(y|x) = E(β0 + β1x + u|x) = β0 + β1x + E(u|x)

and under assumption (2)

E(y|x) = β0 + β1x    (3)

This equation shows that, under assumption (2), the population regression function, E(y|x), is a linear function of x. From equation (3) it can be deduced that:

β0 is the mean of y when x is equal to zero.
β1 is the change in the mean of y given a one-unit increase in x.
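The role of assumption (2) in equation (3) can be illustrated with a small simulation. The values β0 = 1 and β1 = 0.5 below are illustrative, not from the text; the error is drawn independently of x, so E(u|x) = 0 holds by construction, and the sample average of y at each value of x approximates β0 + β1x.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 0.5                  # illustrative population parameters

# Discrete x so we can average y within each x-value
x = rng.integers(0, 5, size=200_000).astype(float)
u = rng.normal(0.0, 1.0, size=x.size)    # independent of x => E(u|x) = 0
y = beta0 + beta1 * x + u

# Sample counterpart of E(y|x): average y for each observed value of x
cond_means = {v: y[x == v].mean() for v in np.unique(x)}
for v, m in cond_means.items():
    print(f"x = {v:.0f}:  mean y = {m:.3f}   beta0 + beta1*x = {beta0 + beta1 * v:.3f}")
```

If instead u were drawn with a mean that varies with x, the conditional averages would no longer line up with β0 + β1x, which is exactly the failure discussed in the examples above.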

The Ordinary Least Squares (OLS) estimator. Interpretation.

In this section we first review how to estimate the parameters β0 and β1 of the simple regression model using a random sample from the population. Later on, we will see how to interpret the estimation results for a given sample.

Let {(xi, yi) : i = 1, 2, ..., n} be a random sample from the population. Given that these data arise from a population defined by the simple regression model, for each observation i we can write

yi = β0 + β1xi + ui (4)

where ui is the error term of observation i containing all the factors affecting yi different from xi.

We use assumption (2) to obtain estimators of the parameters β0 and β1. Since E(u) = 0, using equation (1) and substituting u as a function of the observed variables, we have that

E(y − β0 − β1x) = 0    (5)

On the other hand, it can be shown that

E(u|x) = 0 ⟹ E(xu) = 0

and using equation (1) and substituting u as a function of the observed variables, we have that

E(x(y − β0 − β1x)) = 0    (6)

The equations (5) and (6) allow us to obtain good estimators of the parameters β0 and β1. Replacing in equations (5) and (6) the population expectations by sample means, the estimators of β0 and β1 are obtained as the solutions to the equations

(1/n) ∑_{i=1}^n (yi − β̂0 − β̂1xi) = 0    (7)

(1/n) ∑_{i=1}^n xi(yi − β̂0 − β̂1xi) = 0    (8)

Note that equations (7) and (8) are the sample counterparts of equations (5) and (6). Estimates obtained as the sample counterparts of population moments are called method of moments estimates.

M. Angeles Carnero (UA) Chapter 6: SRM Year 2014-15 23 / 81 After some algebra, we can isolate β0 and β1 in equations (7) and (8) obtaining: β = y bβ x b (9) 0 1 n (bxi x)(ybi y) ∑ S = i=1 = xy β1 n 2 (10) 2 Sx ∑ (xi x) b i=1 n 1 where Sxy = n 1 ∑ (xi x)(yi y) is the sample covariance i=1 n 2 1 2 between x and y,and Sx = n 1 ∑ (xi x) is the sample variance i=1 of x. Note that in order for the OLS estimators to be defined we need n 2 that ∑ (xi x) > 0. i=1

The estimates defined by equations (9) and (10) are known as the Ordinary Least Squares (OLS) estimates of the intercept and slope of the simple regression model.

The OLS estimates are computed for a given particular sample, and therefore, for a given sample, β̂0 and β̂1 are two real numbers. If the OLS estimates are computed with a different sample, one would obtain different values for β̂0 and β̂1. Therefore, since β̂0 and β̂1 are functions of the sample, we can also think of β̂0 and β̂1 as random variables, that is, as estimators of the population parameters β0 and β1.

Both in this section and in sections 4 and 5 we analyse the properties of the OLS estimates for a given sample. In section 6 we study the statistical properties of the random variables β̂0 and β̂1, that is, their properties as estimators of the population parameters.

Although we have derived the expressions for the OLS estimates from assumption (2), this assumption is not required in order to compute them. The only condition needed to compute the OLS estimates for a given sample is that ∑_{i=1}^n (xi − x̄)² > 0. In fact, this is hardly an assumption, since the only requirement is that the xi in the sample are not all equal.

We now give a graphical interpretation of the OLS estimates of the simple regression model, which justifies the name "least squares". To do so, we draw the cloud of points associated with a given sample of size n and any line

y = b0 + b1x

We show that the OLS estimates defined in equations (9) and (10) are the "best" choice of the values b0 and b1 if the objective is that the line be as "close" as possible to this cloud of points for a given proximity criterion. In particular, the proximity criterion that delivers the OLS estimates is to minimise the sum of the squared vertical distances from the points to the line.

[Figure: scatter plot of the sample points (xi, yi) with a candidate line y = b0 + b1x; the vertical distance from each point to the line is marked.]

Graphically, we can see that the vertical distance from the point (xi, yi) to the line y = b0 + b1x is given by

yi − b0 − b1xi

and therefore the objective function to be minimised is

s(b0, b1) = ∑_{i=1}^n (yi − b0 − b1xi)²    (11)

The partial derivatives are:

∂s(b0, b1)/∂b0 = −2 ∑_{i=1}^n (yi − b0 − b1xi)

∂s(b0, b1)/∂b1 = −2 ∑_{i=1}^n xi(yi − b0 − b1xi)

The estimated coefficients are obtained by setting the partial derivatives of the objective function equal to zero

∑_{i=1}^n (yi − β̂0 − β̂1xi) = 0

∑_{i=1}^n xi(yi − β̂0 − β̂1xi) = 0

These two equations are known as the first order conditions of the OLS estimates, and they are identical to equations (7) and (8). Therefore, the estimates obtained by minimising the objective function (11) are the OLS estimates defined in equations (9) and (10).
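One can check numerically that the closed-form estimates from equations (9) and (10) satisfy the two first order conditions, i.e. they make both sums vanish, and that any other line gives a larger sum of squared residuals. A sketch with simulated data (population values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 500)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 500)   # illustrative population values

# Closed-form OLS estimates, equations (9)-(10)
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

resid = y - b0 - b1 * x
foc1 = resid.sum()          # first order condition (7): ~0
foc2 = (x * resid).sum()    # first order condition (8): ~0

# Any perturbed line has a larger sum of squared residuals
ssr_ols = (resid ** 2).sum()
ssr_other = ((y - (b0 + 0.1) - b1 * x) ** 2).sum()
print(foc1, foc2, ssr_ols < ssr_other)
```

This mirrors the derivation above: the first order conditions of (11) pin down exactly the method-of-moments solution.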

We define the fitted value for y when x = xi as

ŷi = β̂0 + β̂1xi

This is the predicted value for y when x = xi. Note that there is a fitted value for each observation in the sample. We define the residual for each observation as the difference between the observed value yi and the fitted value ŷi:

ûi = yi − ŷi

and there is a residual for each observation in the sample. Note that the residual for each observation is the vertical distance (with its corresponding sign) from the point to the regression line ŷ = β̂0 + β̂1x, and therefore the OLS criterion is to minimise the sum of squared residuals. If a point is above the regression line its residual is positive, and if it is below the regression line its residual is negative.
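Fitted values and residuals follow immediately from the estimates; the sign of each residual says whether the point lies above or below the line. A sketch on simulated data (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 200)
y = 0.5 + 2.0 * x + rng.normal(0, 1, 200)   # illustrative

# OLS estimates via equations (9)-(10)
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x       # fitted values, one per observation
u_hat = y - y_hat         # residuals, one per observation

# Positive residual: point above the line; negative: below
above = u_hat > 0
print(f"{above.sum()} points above the line, {(~above).sum()} below")
```

Note that the residuals ûi (computed from estimates) are not the population errors ui of model (4), although they play the analogous role in the sample.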

Why is the criterion of minimising the sum of squared residuals used? Because it is a simple criterion and it delivers estimators with good properties under certain assumptions. Note that a criterion consisting of minimising the sum of the residuals would not be appropriate, since the residuals can be positive or negative. We could consider an alternative criterion such as minimising the sum of the absolute values of the residuals

min_{b0, b1} ∑_{i=1}^n |yi − b0 − b1xi|

The problem with this criterion is that the objective function is not differentiable, and therefore it is more complicated to compute the minimum.

Interpretation of the results of the regression

The regression line or sample regression function is defined as

ŷ = β̂0 + β̂1x

and it is the estimated version of the population regression function E(y|x) = β0 + β1x.

The constant term or intercept, β̂0, is the predicted value for y when x = 0. In many cases it does not make sense to consider x = 0, and in these cases β̂0 is not of interest in itself. However, it is important not to forget to include β̂0 when predicting y for any value of x. β̂0 is also the estimated value of the mean of y when x = 0.

The slope, β̂1, measures the variation in ŷ when x increases by one unit. In fact, if x changes by ∆x units, the predicted change in y is ∆ŷ = β̂1∆x units. β̂1 also measures the estimated variation in the mean of y when x increases by one unit.

Example 1 (cont.)
Given a sample of n = 526 individuals (file WAGE1 from Wooldridge) for which the hourly wage in dollars, wage, and the years of education, educ, are observed, the following OLS regression line has been obtained

wage^ = −0.90 + 0.54 educ

The estimated value −0.90 for the intercept literally means that the predicted wage for individuals with 0 years of education is minus 90 cents (−0.90 dollars) per hour, which does not make sense. The reason why the prediction is poor at such low levels of education is that there are very few individuals with few years of education in the sample. The estimated value of the slope indicates that one more year of education increases the predicted hourly wage by 54 cents (0.54 dollars). If the number of years of education increases by 3 years, the predicted wage increases by 3 × 0.54 = 1.62 dollars. Regarding prediction for different values of educ, the predicted hourly wage for individuals with 10 years of education is wage^ = −0.90 + 0.54 × 10 = 4.5 dollars per hour.

Fitted values and residuals. Goodness of fit.

Algebraic properties of the OLS regression

1. The sum of the residuals is zero:

∑_{i=1}^n ûi = 0    (12)

and therefore the sample mean of the residuals is zero.

2. The sum of the products of the observed values of x and the residuals is zero:

∑_{i=1}^n xi ûi = 0    (13)

and therefore, since the mean of the residuals is zero by property 1, the sample covariance between the observed values of x and the residuals is zero.

M. Angeles Carnero (UA) Chapter 6: SRM Year 2014-15 36 / 81 3. The point (x, y) lies on the sample regression line. 4. The mean of the fitted values coincides with the mean of the observed values y = y 5. The sample covariance between the fitted values and the residuals is zero. b

Goodness of fit

We now present a measure of the capacity of the explanatory variable to explain the variability of the dependent variable. This measure reflects the quality of the fit, that is, whether the OLS regression line fits the data well. Definitions:

Total Sum of Squares (SST):

SST = ∑_{i=1}^n (yi − ȳ)²

Explained Sum of Squares (SSE):

SSE = ∑_{i=1}^n (ŷi − ȳ)²  (using that the mean of the fitted values equals ȳ)

Sum of Squared Residuals (SSR):

SSR = ∑_{i=1}^n ûi²

These three values are nonnegative, since they are sums of squares. SST, SSE and SSR measure the degree of variability of the dependent variable, of the fitted values and of the residuals, respectively, since they are the numerators of the sample variances of these variables. The three measures are related, since it can be shown that

SST = SSE + SSR

Assuming that SST is not zero, which is equivalent to saying that the observations of the dependent variable are not all the same, dividing both sides by SST we have:

1 = SSE/SST + SSR/SST

We define the coefficient of determination of the model as

R² = SSE/SST = 1 − SSR/SST

R² represents the proportion of the variability of the dependent variable that is explained by the model. R² satisfies

0 ≤ R² ≤ 1

It is nonnegative because SSE and SST are nonnegative, and it is smaller than or equal to 1 because SSR is nonnegative. Sometimes R² is also expressed as a percentage, multiplying its value by 100.

In order to better understand the role of the coefficient of determination, it is useful to consider the two extreme cases:

The coefficient of determination is 1 if and only if SSR = 0; in this case all the residuals are exactly 0, so ŷi = yi for every observation and all the points lie on the OLS regression line: the fit is perfect.

The coefficient of determination is 0 if and only if SSE = 0; in this case all the fitted values are exactly equal to ȳ, that is, the fitted values do not depend on the value of the independent variable, and the OLS regression line is the horizontal line ŷ = ȳ. In this case, knowing the value of the independent variable provides no information on the dependent variable.

In practice, we always obtain intermediate values of R². The closer R² is to 1, the better the goodness of fit.
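The decomposition SST = SSE + SSR and the resulting R² can be computed directly; a sketch on simulated data (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 400)
y = 1.0 + 1.2 * x + rng.normal(0, 1, 400)   # illustrative

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
sse = ((y_hat - y.mean()) ** 2).sum()   # explained sum of squares
ssr = (u_hat ** 2).sum()                # sum of squared residuals

r2 = sse / sst
print(f"SST = SSE + SSR holds: {np.isclose(sst, sse + ssr)};  R^2 = {r2:.3f}")
```

Both expressions for R², SSE/SST and 1 − SSR/SST, give the same value because of the decomposition.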

It is important to point out that in the social sciences a low R² is often found, especially when, as in this course, we work with cross-sections. The fact that R² is low does not mean that the OLS estimate is not useful: it can still provide a good estimate of the effect of x on y.

Example 1 (cont.)
In the regression of wage on years of education we have

wage^ = −0.90 + 0.54 educ,  n = 526, R² = 0.165

The years of education explain 16.5% of the variation in wages.

Measurement units and functional form

Measurement units

It is very important to take into account the measurement units when interpreting the results of a regression. The estimated values of the parameters of a regression model depend on the measurement units of the dependent and explanatory variables. If we have already estimated the parameters of the model using certain units for the variables, the estimates under different measurement units can be obtained easily.

If we change the measurement units of the dependent variable, measuring it as ỹ = cy, then substituting in the estimated model we have

ỹ^ = cβ̂0 + cβ̂1x = β̃0 + β̃1x

where β̃0 = cβ̂0 and β̃1 = cβ̂1; the new estimated coefficients are equal to the previous ones multiplied by c.

If we change the measurement units of the explanatory variable, measuring it as x̃ = cx, then substituting x = x̃/c in the estimated model we have

ŷ = β̂0 + (β̂1/c)x̃ = β̂0 + β̃1x̃

where β̃1 = β̂1/c; the estimated constant does not change and the new estimated slope is equal to the previous one divided by c.
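The effect of a change of units can be checked directly: rescaling y by c multiplies both coefficients by c, while rescaling x by c divides the slope by c and leaves the intercept unchanged. A sketch with c = 100 and simulated data (parameters illustrative):

```python
import numpy as np

def ols(x, y):
    """OLS slope and intercept via equations (9)-(10)."""
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(6)
x = rng.uniform(0, 20, 500)
y = 3.0 + 0.5 * x + rng.normal(0, 1, 500)   # illustrative

b0, b1 = ols(x, y)
c = 100.0

# Rescale the dependent variable: both coefficients scale by c
b0_y, b1_y = ols(x, c * y)

# Rescale the explanatory variable: intercept unchanged, slope divided by c
b0_x, b1_x = ols(c * x, y)
print(b0, b1, b0_y, b1_y, b0_x, b1_x)
```

This is exactly the dollars-to-cents and percentage-to-decimal logic used in the examples that follow.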

Example 1 (cont.)
In the regression of wage on years of education, with wage measured in dollars per hour and educ in years, we obtained the regression line:

wage^ = −0.90 + 0.54 educ,  n = 526, R² = 0.165

Which values would be obtained for the constant and the slope of the regression line if wage were measured in cents per hour? Let wagec be the wage in cents. Obviously, the relationship between the two is wagec = 100 × wage, so the estimated model with the wage in cents per hour is obtained by multiplying by 100 the coefficients estimated with the wage in dollars per hour:

wagec^ = −90 + 54 educ

Example 1 (cont.)
In this way we see that the interpretation of the regression results does not change when the measurement units change: one more year of education still implies an increase of 54 cents per hour in the predicted wage. Regarding R², intuition tells us that, since it measures the goodness of fit, it should not depend on the measurement units of the variables. In fact, it can be shown from the definition that R² does not depend on the measurement units. In this example, R² with wage measured in cents per hour is also 0.165.

Example 3
Using a sample (file CEOSAL1 from Wooldridge) of n = 209 executive directors, for whom the annual salary in thousands of dollars, salary, and the average return (in percentage) of the shares of their company, roe, are observed, the following OLS regression line has been obtained

salary^ = 963.19 + 18.50 roe,  n = 209, R² = 0.013

From this model, an increase of one percentage point in the share return increases the predicted salary of the executive director by 18,500 dollars (18.5 thousand dollars). If we change the measurement units of the explanatory variable, for example expressing the return as a decimal instead of as a percentage, what would the new estimated coefficients be?

Example 3 (cont.)
Let roe1 be the share return expressed as a decimal. Clearly, the relationship between the two is roe1 = roe/100, so the estimated model using the return in decimals is obtained by multiplying by 100 the slope estimated with the return in percentage:

salary^ = 963.19 + 1850 roe1,  n = 209, R² = 0.013

In this way we obtain again that the interpretation of the regression results does not change when the measurement units change: as before, an increase of one percentage point in the company share return implies an increase in the predicted salary of the executive director of 1850 × 0.01 = 18.5 thousand dollars. R² does not change when we change the measurement units of the independent variable; in this example it is still 0.013.

Example 3 (cont.)
If we now change the measurement units of both the dependent and the explanatory variable, for example expressing the return as a decimal and the salary in hundreds of dollars, what would the new estimated coefficients be? On the one hand, we have just seen that the change of units in the share return requires multiplying the estimated slope by 100. On the other hand, if salary100 denotes the salary in hundreds of dollars, then salary100 = 10 × salary, and this change of units requires multiplying both the constant and the slope of the regression line by 10. Making both changes, the regression line is

salary100^ = 9631.9 + 18500 roe1,  n = 209, R² = 0.013

Functional form

So far we have considered linear relationships between two variables. As seen above, when we establish a linear relationship between y and x we are assuming that the effect on y of a one-unit change in x does not depend on the initial level of x. This assumption is not very realistic in some applications. For example, in Example 1, where wage is a function of the years of education, the estimated model predicts that an additional year of education increases the wage by 54 cents whether it is the first year of education, the fifth or the sixteenth, which is not very reasonable.

Assume that each additional year of education implies a constant percentage increase in wage. Can this effect be taken into account in the context of the simple regression model? The answer is yes: it is enough to take the logarithm of wage as the dependent variable of the model. Assume that the regression model relating wage and years of education is:

log(wage) = β0 + β1 educ + u     (14)

In this model, if we hold fixed all the other factors affecting wage and captured by the error term u, an additional year of education implies an increase of β1 in the logarithm of wage.

Therefore, since a percentage increase is approximately equal to the difference of logs multiplied by 100, this model implies that, holding fixed all the factors affecting wage and captured in the error term u, an additional year of education implies an increase in wage of 100·β1 %. Note that equation (14) implies a nonlinear relationship between wage and years of education: an additional year of education implies a larger increase in wage (in absolute terms) the larger the initial number of years of education is. The model where the dependent variable is in logarithms and the explanatory variable is in levels is called the log-level model.

The model (14) can be estimated by OLS using the logarithm of wage as the dependent variable. Using the data in example 1, the following results have been obtained

$\widehat{\log(wage)} = 0.584 + 0.083\,educ, \quad n = 526, \quad R^2 = 0.186$

Therefore, this estimated model implies that each additional year of education increases the hourly wage by 8.3%. Economists call this effect the return to an additional year of education. There is another important nonlinearity not captured in this application, which would reflect a "certification" effect. It could be the case that year 12, that is, finishing secondary education, has a much larger impact on wage than finishing year 11, since the latter does not confer a degree. In chapter 5 we will see how to take this type of nonlinearity into account.
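The log-level interpretation above can be sketched numerically. This uses simulated data built around the estimated coefficients (not the actual wage sample from example 1): the ratio of predicted wages at consecutive education levels is exp(b1), so each extra year raises the predicted wage by roughly 100·b1 percent.

```python
import numpy as np

# Sketch of the log-level model on simulated data: the percentage
# effect of one more year of education is approximately 100*b1.
rng = np.random.default_rng(1)
n = 526
educ = rng.integers(0, 19, n).astype(float)
logwage = 0.584 + 0.083 * educ + rng.normal(0, 0.4, n)

b1 = np.cov(educ, logwage, ddof=1)[0, 1] / np.var(educ, ddof=1)
b0 = logwage.mean() - b1 * educ.mean()

# Predicted wage at 12 vs 13 years: the ratio is exp(b1), i.e. an
# increase of about 100*b1 percent for one additional year.
w12 = np.exp(b0 + b1 * 12)
w13 = np.exp(b0 + b1 * 13)
print(100 * (w13 / w12 - 1))   # close to 100*b1
```

The exact percentage change, 100·(exp(b1) − 1), is slightly larger than the approximation 100·b1; the gap grows with the size of b1.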

We analyse here how to use the logarithmic transformation in order to obtain a model with constant elasticity.
Example 4
Using the same data as in example 3, we can estimate a constant-elasticity model relating the wage of executive directors to the sales of the firm. The population model to estimate is

log(salary) = β0 + β1 log(sales) + u

where sales are the annual sales of the firm in millions of dollars and salary is the annual wage of the executive director of the firm in thousands of dollars. In this model, β1 is the elasticity of the wage of executive directors with respect to the sales of the firm. This model can be estimated by OLS using the log of wage as the dependent variable and the log of sales as the explanatory variable.

Example 4 (cont.)
The estimated regression is

$\widehat{\log(salary)} = 4.822 + 0.257\,\log(sales), \quad n = 209, \quad R^2 = 0.211$

The estimated elasticity is 0.257, which implies that an increase of 1% in sales implies an increase of 0.257% in the wage of the executive director (this is the usual interpretation of an elasticity).
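A numerical sketch of the constant-elasticity interpretation, again with simulated data rather than the actual CEOSAL1 sample: in a log-log model, raising sales by 1% multiplies the predicted salary by 1.01 raised to the power b1, an increase of roughly b1 percent.

```python
import numpy as np

# Sketch of the log-log (constant elasticity) model on simulated data:
# a 1% rise in sales predicts approximately a b1% rise in salary.
rng = np.random.default_rng(2)
n = 209
sales = np.exp(rng.normal(6, 1.5, n))    # sales in millions of dollars
logsalary = 4.822 + 0.257 * np.log(sales) + rng.normal(0, 0.5, n)

x, y = np.log(sales), logsalary
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Predicted salary at sales level s and at 1.01*s: the ratio is
# 1.01**b1, i.e. an increase of roughly b1 percent.
s = 5000.0
ratio = np.exp(b0 + b1 * np.log(1.01 * s)) / np.exp(b0 + b1 * np.log(s))
print(100 * (ratio - 1))   # close to b1
```

Note that the percentage effect does not depend on the sales level s chosen, which is exactly what "constant elasticity" means.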

The model where both the dependent variable and the explanatory variable are in logarithms is called the log-log model. We now see how a change in the units of a variable that is expressed in logs affects the constant and the slope of the model. Consider the log-level model

log(y) = β0 + β1x + u (15)

If we change the measurement units of y and define y* = c·y, taking logarithms we have log(y*) = log(c) + log(y). Substituting in (15) we have

log(y*) = β0 + log(c) + β1 x + u = β0* + β1 x + u, with β0* = β0 + log(c). Therefore, this units change does not affect the slope, only the constant of the model. Similarly, if the explanatory variable is in logarithms and we change its measurement units, this change does not affect the slope of the model, only the constant term.
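This intercept-shift result is easy to verify numerically with simulated data (the parameter values below are illustrative): rescaling y to c·y in a log-level model shifts the OLS intercept by exactly log(c) and leaves the slope untouched.

```python
import numpy as np

# Numerical check: in a log-level model, rescaling y to c*y shifts the
# intercept by log(c) and does not change the slope.
rng = np.random.default_rng(3)
n = 200
x = rng.normal(10, 3, n)
logy = 1.0 + 0.05 * x + rng.normal(0, 0.2, n)

def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b1 * x.mean(), b1

c = 1000.0                            # units change: y* = c * y
b0, b1 = ols(x, logy)
b0c, b1c = ols(x, logy + np.log(c))   # log(c*y) = log(c) + log(y)

print(b0c - b0)   # equals log(c)
print(b1c - b1)   # equals 0
```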

Finally, we can also consider a model where the dependent variable is in levels and the explanatory variable is in logs. This model is called the level-log model.

y = β0 + β1 log(x) + u

In this model, β1/100 is the change in y, in its own units, given an increase of 1% in x.

The model studied in this chapter is called the simple regression model, although we have seen that it also allows one to establish some nonlinear relationships between variables. The adjective "linear" refers to the linearity of the model in the parameters β0 and β1; the variables y and x can be any transformation of other variables. We have studied the logarithmic transformations in detail since they are the most interesting ones in Economics, but in the context of the simple regression model the following transformations could also have been considered

y = β0 + β1 x² + u
y = β0 + β1 √x + u

It is important to take into account that the fact that the variables are transformations of other variables does not affect the estimation method, but it does affect the interpretation of the parameters, as seen above for the logarithmic transformations.
Statistical Properties of the OLS estimators

The algebraic properties of the OLS estimates have been studied so far. In this section, we go back to the population model in order to study the statistical properties of the OLS estimators.

We now consider $\hat{\beta}_0$ and $\hat{\beta}_1$ as random variables, that is, as estimators of the population parameters β0 and β1, and we study some of the properties of their distributions.

Unbiasedness of the OLS estimators
We study under which assumptions the OLS estimators are unbiased.
Assumption RLS.1 (linearity in parameters): The dependent variable y is related in the population to the explanatory variable x and the error term u through the population model

y = β0 + β1 x + u     (16)

Assumption RLS.2 (random sample): The data arise from a random sample of size n, {(x_i, y_i) : i = 1, 2, ..., n}, from the population model.
Assumption RLS.3 (zero conditional mean): E(u | x) = 0.
Assumption RLS.4 (sample variation of the independent variable): The values x_i, i = 1, 2, ..., n, in the sample are not all the same.

Assumptions RLS.1 and RLS.2 imply that we can write (16) in terms of the random sample as

yi = β0 + β1xi + ui, i = 1, 2, .., n (17)

where ui is the error term of observation i and it contains those unobservables affecting yi.

Note that the error term u_i is not the same as the residual $\hat{u}_i$. Assumptions RLS.2 and RLS.3 imply that for each observation i

E(u_i | x_i) = 0, i = 1, 2, ..., n

and

E(u_i | x_1, x_2, ..., x_n) = 0, i = 1, 2, ..., n     (18)

Note that if assumption RLS.4 does not hold, the OLS estimator cannot be computed.

Before showing the statistical properties of the OLS estimators, it is useful to write $\hat{\beta}_1$ as a function of the errors of the model.
Expression for $\hat{\beta}_1$ as a function of the error terms: using the definition of $\hat{\beta}_1$ in equation (10),

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})\, y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

where the last equality uses (17).

Expression for $\hat{\beta}_1$ as a function of the error terms (cont.):

$$\hat{\beta}_1 = \beta_0 \frac{\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} + \beta_1 \frac{\sum_{i=1}^n (x_i - \bar{x})\, x_i}{\sum_{i=1}^n (x_i - \bar{x})^2} + \frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

Since

$$\sum_{i=1}^n (x_i - \bar{x}) = 0 \quad \text{and} \quad \sum_{i=1}^n (x_i - \bar{x})\, x_i = \sum_{i=1}^n (x_i - \bar{x})^2$$

we have that

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{\sum_{i=1}^n (x_i - \bar{x})^2} \qquad (19)$$

Under assumptions RLS.1 to RLS.4, $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators of the parameters β0 and β1, that is,

$$E(\hat{\beta}_0) = \beta_0 \quad \text{and} \quad E(\hat{\beta}_1) = \beta_1$$

Proof
We show that $\hat{\beta}_1$ is an unbiased estimator of β1, that is, $E(\hat{\beta}_1) = \beta_1$. In this proof, the expectations are conditional on the observed values of the explanatory variable in the sample, that is, they are expectations conditional on x_1, x_2, ..., x_n. Therefore, conditioning on the observed values of x, all terms that are functions of x_1, x_2, ..., x_n are not random. Using (19),

$$E(\hat{\beta}_1) = \beta_1 + E\!\left(\frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{\sum_{i=1}^n (x_i - \bar{x})^2}\right) = \beta_1 + \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2} \sum_{i=1}^n (x_i - \bar{x})\, E(u_i) = \beta_1$$

where the last equality uses (18).
Some Comments on Assumptions RLS.1 to RLS.4
Generally, if one of the four assumptions does not hold, the estimator is not unbiased. As mentioned before, if assumption RLS.4 fails it is not possible to obtain the OLS estimates. Assumption RLS.1 requires that the relationship between y and x is linear with an additive error; as already discussed, we mean linear in parameters, since the variables x and y can be nonlinear transformations of the variables of interest. If assumption RLS.1 fails and the model is nonlinear in parameters, the estimation is more complicated and beyond the contents of this course. Assumption RLS.2 is suitable for many applications (although not all of them) when we work with cross-sectional data. Finally, assumption RLS.3 is a crucial assumption for the unbiasedness of the OLS estimator. If this assumption fails, the estimators are generally biased. In chapter 3, we will see that we can determine the direction and the size of the bias.
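The unbiasedness result just proved can be illustrated by a Monte Carlo sketch under assumptions RLS.1 to RLS.4, with illustrative parameter values: holding the x values fixed and redrawing the errors many times, the average of the OLS slope estimates is very close to the true β1.

```python
import numpy as np

# Monte Carlo sketch of unbiasedness: across many random samples the
# average OLS slope estimate is close to the true beta1.
rng = np.random.default_rng(4)
beta0, beta1, n = 2.0, 0.5, 100
x = rng.uniform(0, 10, n)          # regressor values kept fixed (conditioning on x)

estimates = []
for _ in range(5000):
    u = rng.normal(0, 1, n)        # errors with E(u|x) = 0, so RLS.3 holds
    y = beta0 + beta1 * x + u
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    estimates.append(b1)

print(np.mean(estimates))          # close to beta1 = 0.5
```

Each individual estimate deviates from 0.5, sometimes substantially; it is the average over replications that converges to the true value, which is exactly what unbiasedness asserts.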

Some Comments on Assumptions RLS.1 to RLS.4 (cont.)
In the analysis of simple regression with non-experimental data, there is always the possibility that x is correlated with u. When u contains factors that affect y and are correlated with x, the result of the OLS estimation can reflect the effect of those factors on y and not the ceteris paribus relationship between x and y.
Example 5
Suppose we are interested in analysing the effect of a public school lunch programme on school performance. This programme is expected to have a positive ceteris paribus effect on school performance, since if a student without the economic resources to pay for the meal benefits from the programme, his productivity in school should improve. We have data on 408 secondary schools in the state of Michigan (file MEAP93 from Wooldridge), and for each school we observe the percentage of students that pass a standardised math exam (math10) and the percentage of students that benefit from the school lunch programme (lnchprg).

M. Angeles Carnero (UA) Chapter 6: SRM Year 2014-15 66 / 81 Some Comments on Assumptions RLS.1 to RLS.4 (cont.) Example 5 (cont.) Given this data, the following results have been obtained:

$\widehat{math10} = 32.14 - 0.319\,lnchprg, \quad n = 408, \quad R^2 = 0.171$

The estimated model predicts that if access to the programme increases by 10 percentage points, the percentage of students passing the exam decreases by approximately 3.2 percentage points. Is this result credible? The answer is NO. It is more likely that this result is due to the error term being correlated with lnchprg. The error term contains other factors (other than access to the school lunch programme) affecting the result of the exam; among them is the socioeconomic level of the students' families, which affects school performance and is obviously correlated with participation in the lunch programme.

Interpretation of the concept of unbiasedness of an estimator
Recall that the fact that an estimator is unbiased does not mean that for our particular sample the value of the estimate is close to the true value of the parameter. Unbiasedness means that if we had access to many random samples from the population and computed the value of the estimator for each of them, then, as the number of samples grows very large, the sample mean of the estimates would be very close to the true value of the parameter we want to estimate. Since in practice we only have access to one sample, the unbiasedness property is not very useful unless some other property guarantees that the dispersion of the distribution of the OLS estimator is small. In addition, a dispersion measure of the distribution of the estimators allows us to choose the best estimator as the one with the lowest dispersion. As a measure of dispersion we use the variance or its square root, the standard deviation.

Variances of the OLS estimators
In this chapter we compute the variance of the OLS estimators under an additional assumption known as the homoskedasticity assumption. This assumption establishes that the variance of the error term u conditional on x is constant, that is, it does not depend on x. The variance of the OLS estimators can be computed without this additional assumption, that is, using only assumptions RLS.1 to RLS.4; however, the expressions for the variances in the general case are more complicated and beyond the scope of this course.
Assumption RLS.5 (homoskedasticity)

Var(u | x) = σ²

When Var(u | x) depends on x, we say that the errors are heteroskedastic.

It is important to point out that assumption RLS.5 does not play any role in the unbiasedness of $\hat{\beta}_0$ and $\hat{\beta}_1$. We add assumption RLS.5 to simplify the computation of the variance of the OLS estimators. Additionally, as we will see in Chapter 7, under the additional assumption of homoskedasticity the OLS estimators have some efficiency properties. Since assumption RLS.3 establishes that E(u | x) = 0, and since Var(u | x) = E(u² | x) − (E(u | x))², we can write assumption RLS.5 as E(u² | x) = σ². Assumption RLS.5 can also be written as

Var(y | x) = σ²

Example 1
Let us consider again the simple regression model relating the wage of a person to his/her level of education

wage = β0 + β1educ + u

In this model the homoskedasticity assumption is Var(wage | educ) = σ², i.e., the variance of wage does not depend on the number of years of education. This assumption may not be very realistic, since individuals with higher levels of education tend to have more diverse job opportunities, which can lead to higher variability of wages at high education levels. On the contrary, individuals with low levels of education have fewer job opportunities, and many of them work for the minimum wage, which implies that the variability of wages is small at low education levels.

Variance of the sampling distribution of the OLS estimators
Under assumptions RLS.1 to RLS.5,

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sigma^2}{(n-1) S_x^2}$$

$$Var(\hat{\beta}_0) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^n x_i^2}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sigma^2\, \overline{x^2}}{(n-1) S_x^2}$$

where $\overline{x^2} = n^{-1} \sum_{i=1}^n x_i^2$ and the variances are conditional on the observed values of the explanatory variable in the sample, i.e. they are variances conditional on x_1, x_2, ..., x_n.

Proof
We show the formula for $Var(\hat{\beta}_1)$. Recall the expression of $\hat{\beta}_1$ as a function of the errors of the model in equation (19):

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

The variance we have to compute is conditional on the x_i; therefore, $(x_i - \bar{x})$, i = 1, 2, ..., n, and $\sum_{i=1}^n (x_i - \bar{x})^2$ are not random, and β1 is not random either.

Proof (cont.)
Additionally, by assumption RLS.2 the errors u_i are independent. Using the following properties of the variance: the variance of a sum of independent random variables is the sum of the variances; the variance of a constant times a random variable equals the squared constant times the variance of the random variable; and the variance of the sum of a random variable and a constant is the variance of the random variable, we have that

$$Var(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2\, Var(u_i)}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2} = \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar{x})^2}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2} = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sigma^2}{(n-1) S_x^2}$$

where the second equality uses RLS.5.

According to the expression obtained for the variance of $\hat{\beta}_1$, we have that:
The higher the variance of the error term, σ², the higher the variance of $\hat{\beta}_1$: if the variance of the unobservables affecting y is very large, it is very difficult to estimate β1 precisely.
The higher the variance of x_i, the smaller the variance of $\hat{\beta}_1$: if x_i has low dispersion, it is very difficult to estimate β1 precisely.
The larger the sample size, the smaller the variance of $\hat{\beta}_1$.


Estimation of the variance of the error term
The variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ depend on the sample values of x_i, which are observable, and on the variance of the error term, σ², which is an unknown parameter. Therefore, in order to estimate the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ we have to obtain an estimator of σ². Since σ² is the variance of the error term u, which as we saw above equals the expectation of u² (given that the mean of u is zero by assumption RLS.3), we could think of using the sample mean of the squared errors

$$w = \frac{1}{n} \sum_{i=1}^n u_i^2$$

as an estimator of σ². If we could compute w as a function of the sample, w would be an unbiased estimator of σ², since

$$E\!\left(\frac{1}{n} \sum_{i=1}^n u_i^2\right) = \frac{1}{n} \sum_{i=1}^n E(u_i^2) = \sigma^2$$

The problem is that w is not an estimator, since it cannot be computed as a function of the sample: the errors are not observable. What we can compute as a function of the sample are the residuals $\hat{u}_i$. In what follows, we see that the residuals are estimates of the errors and how to obtain an unbiased estimator of σ² as a function of the squared residuals. Recall that the residual of observation i is defined as

$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$$

and since the error of observation i is

$$u_i = y_i - \beta_0 - \beta_1 x_i$$

we can think of the residuals as estimates of the errors. In this way, we can define the following estimator of σ²:

$$\hat{w} = \frac{1}{n} \sum_{i=1}^n \hat{u}_i^2$$

$\hat{w}$ is an estimator of σ², but it is not unbiased. The reason is that, as opposed to the errors, which are independent, the residuals are not independent, since they satisfy the two linear restrictions seen in Section 4 (equations (12) and (13)). Therefore, since the n residuals satisfy two linear restrictions, the residuals have n − 2 degrees of freedom, and the unbiased estimator of σ² is

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^n \hat{u}_i^2$$

(proof on page 62 of Wooldridge). Using this estimator of σ², the estimated variances of $\hat{\beta}_1$ and $\hat{\beta}_0$ are defined as follows:

$$\widehat{Var}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{(n-1) S_x^2} \quad \text{and} \quad \widehat{Var}(\hat{\beta}_0) = \frac{\hat{\sigma}^2\, \overline{x^2}}{(n-1) S_x^2}$$

The Standard Error of the Regression (SER) is defined as

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2}$$

$\hat{\sigma}$ is an estimator of the standard deviation of the error term, σ. Although $\hat{\sigma}$ is not an unbiased estimator of σ, we will see below that it has other good properties when the sample is large.
The standard error of $\hat{\beta}_1$, denoted by $se(\hat{\beta}_1)$, is defined as

$$se(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{(n-1) S_x^2}}$$

$se(\hat{\beta}_1)$ is an estimator of the standard deviation of $\hat{\beta}_1$ and therefore a measure of the precision of $\hat{\beta}_1$.
Analogously, the standard error of $\hat{\beta}_0$, denoted by $se(\hat{\beta}_0)$, is defined as

$$se(\hat{\beta}_0) = \hat{\sigma} \sqrt{\frac{\overline{x^2}}{(n-1) S_x^2}}$$

$se(\hat{\beta}_0)$ is an estimator of the standard deviation of $\hat{\beta}_0$ and therefore a measure of the dispersion of $\hat{\beta}_0$.
$se(\hat{\beta}_1)$ is a random variable since, given the values of x_i, it takes different values for different samples of y. For a given sample, the standard error $se(\hat{\beta}_1)$ is a number, just as $\hat{\beta}_1$ is when we compute it with a particular sample. The same happens with $se(\hat{\beta}_0)$. The standard errors play a very important role in inference, that is, when testing restrictions on the parameters of the model or when computing confidence intervals.
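The estimators above fit together in a few lines of code. This sketch uses simulated data with illustrative parameter values: it computes the OLS coefficients, the unbiased estimator of σ² with the n − 2 degrees-of-freedom correction, and the standard errors from the formulas just given.

```python
import numpy as np

# Full pipeline on simulated data: OLS fit, sigma-hat^2 with the n-2
# correction, SER, and the standard errors of the two coefficients.
rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 3.0, n)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x                     # residuals u_i-hat

sigma2_hat = (resid**2).sum() / (n - 2)     # unbiased estimator of sigma^2
ser = np.sqrt(sigma2_hat)                   # standard error of the regression
ssx = (n - 1) * np.var(x, ddof=1)           # sum of (x_i - xbar)^2
se_b1 = ser / np.sqrt(ssx)
se_b0 = ser * np.sqrt((x**2).mean() / ssx)

print(b1, se_b1)
print(b0, se_b0)
```

Note that (n − 1)·S²ₓ is just the sum of squared deviations of x, so the two forms of the denominator in the variance formulas are interchangeable.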

Example 1 (cont.)
Using the data of example 1, the following model has been estimated

log(wage) = β0 + β1educ + u

and the standard errors have been computed. The results of the estimation including the standard errors are usually presented as follows

$\widehat{\log(wage)} = \underset{(0.097)}{0.584} + \underset{(0.0076)}{0.0827}\,educ, \quad n = 526, \quad R^2 = 0.186$
