
ECON 594: Lecture #6

Thomas Lemieux, Vancouver School of Economics, UBC

May 2019

1 Tobit model

Many economic outcome variables, for example consumption or hours of work, are always larger than or equal to zero. When zeros are infrequent events, as in the case of weekly food consumption, the fact that the variable is bounded below by zero is of little econometric consequence. In the case of hours of work, however, a sizable fraction of the population is at zero, and this can result in important estimation biases. Variables that are larger than or equal to zero are called censored variables (they could also be censored at a point other than zero). We will also discuss below the case of truncated variables that are either positive or not observed (e.g. a wage rate). The standard model used to deal with a censored dependent variable is the tobit model. Latent variables provide once again a convenient way of analyzing the problem. Consider the latent variable

y* = xβ + e    (1)

where e is normally distributed (e ~ N(0, σ²)). In economic terms, we can think of y* as the desired level of hours of work (or consumption). Notice that e is no longer normalized to have a variance of 1 (as in the probit) since we do observe continuous values of y above 0 and can, thus, estimate σ. Since the actual value of y cannot be negative, we have

y = y* if y* > 0    (2)

y = 0 if y* ≤ 0    (3)

Before discussing consistent estimation of this model, let's see why OLS estimates of the model are inconsistent for β. Taking the conditional expectation of y yields:

E(y|x) = Pr(y* ≤ 0) · 0 + Pr(y* > 0) · E(xβ + e | x, y* > 0)

⇒ E(y|x) = Pr(y* > 0) · [xβ + E(e | x, y* > 0)] ≠ xβ    (4)

We can see in the equation that there are two reasons why E(y|x) ≠ xβ. First, the term Pr(y* > 0) multiplies the rest of the equation, where

Pr(y* > 0) = Pr(e > −xβ) = Pr(e/σ > −xβ/σ) = 1 − Φ(−xβ/σ) = Φ(xβ/σ)    (5)

Second, even if E(e|x) = 0 (the usual assumption), it does not follow that E(e | x, y* > 0) = 0. Since y* = xβ + e, it follows that

E(e | x, y* > 0) = E(e | x, e > −xβ)    (6)

In fact, we can exploit the normality of e to show that:

E(e | x, e > −xβ) = σ · φ(xβ/σ)/Φ(xβ/σ) > 0    (7)

where φ(xβ/σ)/Φ(xβ/σ) is called the inverse Mills ratio and is often written in terms of the "famous" λ term, where

λ(xβ/σ) = φ(xβ/σ)/Φ(xβ/σ)    (8)

Substituting these terms back into the conditional expectation (equation 4), we get:

E(y|x) = Φ(xβ/σ) · [xβ + σλ(xβ/σ)] ≠ xβ    (9)

Note that taking the conditional expectation over the positive values of y only (i.e. treating y as a truncated instead of a censored variable) does not help either since:

E(y | x, y* > 0) = E[xβ + e | x, y* > 0] = xβ + σλ(xβ/σ) ≠ xβ    (10)

Either running an OLS regression with the zeros in (equation 9) or with the zeros out (equation 10) thus yields biased estimates of β.
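The biases in equations (9) and (10) are easy to verify by simulation. The following Python sketch (not part of the original notes; all parameter values are invented for the illustration) draws censored data and runs OLS with the zeros in and with the zeros out; both slope estimates are attenuated relative to the true β.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, sigma = 1.0, 2.0                                # invented true parameters

x = rng.normal(size=n)
y_star = beta * x + rng.normal(scale=sigma, size=n)   # latent y* = x*beta + e
y = np.maximum(y_star, 0.0)                           # censoring at zero

X = np.column_stack([np.ones(n), x])                  # constant + regressor

# OLS with the zeros in (biased, cf. equation (9))
b_all = np.linalg.lstsq(X, y, rcond=None)[0]

# OLS with the zeros out (biased, cf. equation (10))
pos = y > 0
b_pos = np.linalg.lstsq(X[pos], y[pos], rcond=None)[0]

print(f"true beta = {beta}")
print(f"OLS slope, zeros in : {b_all[1]:.3f}")
print(f"OLS slope, zeros out: {b_pos[1]:.3f}")
```

Both estimated slopes come out well below the true β of 1, in line with the attenuation implied by the two conditional expectations above.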

1.1 Estimation: Maximum likelihood

The standard solution to the problem of a censored dependent variable is to estimate the tobit model by maximum likelihood. First note that the density of the latent variable y* is (1/σ)φ[(y* − xβ)/σ]. The density of the observed variable y is:

f(y|xᵢ) = {1 − Φ(xᵢβ/σ)}^{1[y=0]} · {(1/σ)φ[(y − xᵢβ)/σ]}^{1[y>0]}    (11)

where 1[z] is the indicator function (1[z] = 1 if z is true, 1[z] = 0 otherwise). The log-likelihood for each observation i is

lᵢ(β, σ) = 1[yᵢ = 0] · log[1 − Φ(xᵢβ/σ)] + 1[yᵢ > 0] · log[(1/σ)φ[(yᵢ − xᵢβ)/σ]]    (12)

As in the case of the probit, the log-likelihood for the tobit model is the sum of the log-likelihoods for each observation. Consistent estimates of β and σ are obtained by numerically maximizing the log-likelihood function.
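As a concrete illustration (not from the original notes), the tobit log-likelihood of equation (12) can be coded directly and maximized numerically; the data-generating values below are invented, and σ is parametrized as exp(log σ) to keep it positive during optimization.

```python
import numpy as np
from scipy import optimize, stats

# Simulated censored data (parameter values invented for the illustration)
rng = np.random.default_rng(1)
n = 50_000
beta, sigma = 1.0, 2.0
x = rng.normal(size=n)
y = np.maximum(beta * x + rng.normal(scale=sigma, size=n), 0.0)
X = np.column_stack([np.ones(n), x])
zero = y == 0

def neg_loglik(theta):
    b, log_s = theta[:-1], theta[-1]
    s = np.exp(log_s)                       # parametrize sigma > 0
    xb = X @ b
    ll = np.where(
        zero,
        stats.norm.logcdf(-xb / s),         # log[1 - Phi(x*beta/sigma)]
        stats.norm.logpdf((y - xb) / s) - np.log(s),
    )
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b_hat, s_hat = res.x[:2], np.exp(res.x[2])
print("beta_hat =", b_hat, " sigma_hat =", s_hat)
```

With this sample size the maximum likelihood estimates land very close to the true β and σ, unlike the OLS regressions discussed above.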

1.2 Estimation: Heckman two-step method

An alternative to maximum likelihood estimation of the tobit model is Heckman's two-step, or λ, correction method. Some people also call the procedure "heckit". The basic idea of the method is that the λ term in equation (10) can be estimated using a probit model in the first step. To see this, note that when σ is not equal to one, the underlying probabilities in the probit model become:

Pr(y* > 0) = Φ(xβ/σ) = Φ(xγ)    (13)

where γ = β/σ. This means that we can get a consistent estimate γ̂ of γ from the probit model, and plug this in to get an estimate λ̂ᵢ of the λ term for observation i:

λ̂ᵢ = φ(xᵢγ̂)/Φ(xᵢγ̂)    (14)

In the second step, λ̂ᵢ is included along with the other explanatory variables, x, in a regression based on equation (10). It can be shown that estimating this "control function" model with OLS yields consistent estimates of both β and σ (the regression coefficient in front of the λ term). The procedure can thus be described as follows:

Step 1: Use a probit to estimate whether or not y is larger than zero. Use the probit estimates to compute λ̂ᵢ.

Step 2: For positive observations of y, run a regression of yᵢ on xᵢ and λ̂ᵢ.

Note that unlike maximum likelihood, the heckit estimates are not efficient. Furthermore, standard errors have to be corrected for the fact that 1) λ̂ᵢ is estimated (the generated regressor problem) and that 2) adding λ̂ᵢ as a regressor results in heteroskedastic errors even if the original error term e is normal and homoskedastic. The second problem can be fixed by just using robust standard errors, while the first problem can either be solved by computing the appropriate formulas (this is what Stata does) or by bootstrapping the whole two-step procedure.
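The two steps above can be sketched in Python as follows (an illustrative sketch, not the notes' own code: the data-generating values are invented, the probit is estimated by a hand-rolled maximum likelihood, and the standard-error corrections discussed above are omitted).

```python
import numpy as np
from scipy import optimize, stats

# Simulated censored data (parameter values invented for the illustration)
rng = np.random.default_rng(2)
n = 100_000
beta0, beta1, sigma = 0.5, 1.0, 1.5
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.maximum(X @ np.array([beta0, beta1]) + rng.normal(scale=sigma, size=n), 0.0)
d = (y > 0).astype(float)

# Step 1: probit for Pr(y > 0) = Phi(x*gamma), where gamma = beta/sigma
def probit_nll(g):
    xg = X @ g
    return -(d * stats.norm.logcdf(xg) + (1.0 - d) * stats.norm.logcdf(-xg)).sum()

g_hat = optimize.minimize(probit_nll, x0=np.zeros(2), method="BFGS").x

# Inverse Mills ratio lambda_i = phi(x*g_hat) / Phi(x*g_hat), equation (14)
xg = X @ g_hat
mills = stats.norm.pdf(xg) / stats.norm.cdf(xg)

# Step 2: OLS of y on x and lambda over the positive observations;
# the coefficient on lambda estimates sigma
pos = y > 0
Z = np.column_stack([X[pos], mills[pos]])
coef = np.linalg.lstsq(Z, y[pos], rcond=None)[0]
print("beta_hat =", coef[:2], " sigma_hat (coef on lambda) =", coef[2])
```

The second-step coefficients on the constant, x, and λ̂ recover β₀, β₁, and σ, consistent with equation (10).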

1.3 Truncated variables

When y is a truncated variable, the appropriate model is:

y = y* if y* > 0

y is not observed if y* ≤ 0

This does not really affect the likelihood function, as we can now simply write

lᵢ(β, σ) = 1[yᵢ is missing] · log[1 − Φ(xᵢβ/σ)] + 1[yᵢ > 0] · log[(1/σ)φ[(yᵢ − xᵢβ)/σ]]    (15)

We can also run the exact same heckit procedure as before. The only difference is that instead of doing a probit for whether y is zero or positive, we now do a probit for whether y is missing or positive.

2 Generalized tobits and self-selection models

Heckman's procedure was extremely popular for a while because it can be generalized to many interesting cases besides the tobit where we have a self-selection problem. Take, for instance, the case of a wage equation where we want to estimate the effect of education (x) on log wages (y). The problem is that people who don't work tend to do so because they have low wages, or more generally because their wage is lower than their reservation wage. This is not, strictly speaking, a tobit model, since factors other than the wage itself determine whether or not the person works (and whether the wage will be observed in the data). This generalized tobit or self-selection model can be written as follows:

y = y* if y₁* > 0

y is not observed if y₁* ≤ 0

where, as before:

y* = xβ + e,

which is the latent or “true” wage equation, but where the participation decision is driven by another latent variable

y₁* = zγ + u,    (16)

where z is the set of variables that determines the participation decision (it generally includes x and may include other variables), while u is another normal error term (normalized to have a variance of one) that is presumed to be correlated with e, i.e.

σ_eu = cov(e, u) ≠ 0. It can be shown in this case that:

E(y | x, y₁* > 0) = xβ + σ_eu · λ(zγ) ≠ xβ    (17)

So the problem is that we have an omitted variable, λ(zγ), that will be correlated with x, since x is part of z and enters non-linearly in the function λ(zγ). In principle, this can be solved once again by applying the λ correction (heckit) technique. The approach somewhat fell out of favour because the way the model is identified is not very compelling unless we have strong reasons to believe that some variables in z do not belong directly in the model, i.e. in the set of x variables. When z = x, the only difference between xβ and λ(zγ) = λ(xγ) is that the latter is a non-linear function of x. Unfortunately, when xβ is also allowed to be replaced by a more flexible function (e.g.

polynomials in x), it is no longer possible to separately identify the effects of these two terms because of multicollinearity.

Note also that this approach is often called a "control function" approach, where the function λ(zγ) implicitly controls for the selection/endogeneity problem. The regression version of the Hausman test, where the first-stage residual (a function of the instrumental variables) is included in the main regression, is also often called a control function regression, where the control function is this time the first-stage residual. In the context of TSLS, everything is linear and the model is solely identified because of the exclusion restrictions (instrumental variables affect the first-stage residual but do not directly enter the second-stage regression). Most applied researchers tend to think that the same criteria should also apply to Heckman's model, i.e. that we should only use this approach when we have good reasons to believe that we have some instruments in z that affect the probability of observing y without directly affecting y through xβ.
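The multicollinearity point can be illustrated numerically. In the sketch below (not from the original notes; the probit coefficients and the range of x are invented for the demonstration), the inverse Mills ratio λ(xγ) is so smooth in x that a cubic in x reproduces it almost exactly, leaving essentially no independent variation to identify the selection term when z = x.

```python
import numpy as np
from scipy import stats

gamma0, gamma1 = 0.2, 0.8           # hypothetical probit coefficients
x = np.linspace(-2, 2, 401)         # a plausible range for the regressor
xg = gamma0 + gamma1 * x
mills = stats.norm.pdf(xg) / stats.norm.cdf(xg)   # inverse Mills ratio

# R^2 from regressing lambda(x*gamma) on a cubic in x
P = np.column_stack([np.ones_like(x), x, x**2, x**3])
fit = P @ np.linalg.lstsq(P, mills, rcond=None)[0]
r2 = 1 - ((mills - fit) ** 2).sum() / ((mills - mills.mean()) ** 2).sum()
print(f"R^2 of a cubic in x fit to the Mills ratio: {r2:.5f}")
```

The near-perfect R² is exactly the multicollinearity problem described above: once the outcome equation is made flexible in x, the λ term is almost a linear combination of the included regressors.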

3 Stata commands

As in the previous lecture notes, consider the case of three explanatory variables x1, x2, and x3. In the standard tobit model where y is censored at zero, the Stata command for the tobit is:

tobit y x1 x2 x3, ll(0)

where the option ll(0) indicates that y is left-censored at 0 (other censoring points can also be chosen, and right censoring is captured by the option ul(.) instead). In the case where there is no exclusion restriction, the command for Heckman's model would either be

heckman ym x1 x2 x3, select(x1 x2 x3) twostep

or

heckman y x1 x2 x3, select(yobs=x1 x2 x3) twostep

where ym is a modified version of y in which observations for which y=0 have been set to missing, while yobs is a dummy indicating whether the observation for y is positive. The Heckman procedure presumes that the dependent variable is missing (truncated) in some cases in the first version of the model. This is why we include the variable ym instead of the variable y from the tobit model (where y is sometimes zero). The variable ym can be

created from the variable y used in the tobit (censored) model by writing:

gen ym=y if y>0

replace ym=. if y==0

The other way of telling Stata for which observations the dependent variable is truncated is to use a dummy variable like yobs in the “select”option. If we start again with the censored variable y from the tobit, yobs can be created as follows:

gen yobs=1 if y>0

replace yobs=0 if y==0

A simpler command for creating the dummy is:

gen yobs=(y>0)

Note also that if the option "twostep" is not specified, the model will be estimated by maximum likelihood instead (generalized maximum likelihood tobit). If we have a variable x4 that helps predict whether y is observed without affecting y directly (e.g. an instrument), we can simply run

heckman ym x1 x2 x3, select(x1 x2 x3 x4) twostep

or

heckman y x1 x2 x3, select(yobs=x1 x2 x3 x4) twostep
