
ECON 594: Lecture #6

Thomas Lemieux, Vancouver School of Economics, UBC

May 2019

1 Tobit model

Many economic outcome variables, for example consumption or hours of work, are always larger than or equal to zero. When zeros are infrequent events, as in the case of weekly food consumption, the fact that the variable is bounded below by zero is of little econometric consequence. In the case of hours of work, however, a sizable fraction of the population is at zero, and this can result in important estimation biases. Variables that are larger than or equal to zero are called censored variables (they could also be censored at a point other than zero). We will also discuss below the case of truncated variables that are either positive or not observed (e.g. a wage rate). The standard model used to deal with a censored dependent variable is the tobit model. Latent variables provide once again a convenient way of analyzing the problem. Consider the latent variable

y* = xβ + e    (1)

where e is normally distributed (e ~ N(0, σ²)). In economic terms, we can think of y* as the desired level of hours of work (or consumption). Notice that e is no longer normalized to have a variance of 1 (as in the probit) since we do observe continuous values of y above 0 and can, thus, estimate σ. Since the actual value of y cannot be negative, we have

y = y* if y* > 0    (2)

y = 0 if y* ≤ 0    (3)

Before discussing consistent estimation of this model, let's see why OLS estimates of the model are inconsistent for β. Taking the conditional expectation of y yields:

E(y|x) = Pr(y* ≤ 0) · 0 + Pr(y* > 0) · E(xβ + e | x, y* > 0)

⇒ E(y|x) = Pr(y* > 0) · [xβ + E(e | x, y* > 0)] ≠ xβ    (4)

We can see in the equation that there are two reasons why E(y|x) ≠ xβ. First, the term Pr(y* > 0) multiplies the rest of the equation, where

Pr(y* > 0) = Pr(e > −xβ) = Pr(e/σ > −xβ/σ) = 1 − Φ(−xβ/σ) = Φ(xβ/σ)    (5)

Second, even if E(e|x) = 0 (the usual assumption), it does not follow that E(e | x, y* > 0) = 0. Since y* = xβ + e, it follows that

E(e | x, y* > 0) = E(e | x, e > −xβ)    (6)

In fact, we can exploit the normality of e to show that:

E(e | x, e > −xβ) = σ · φ(xβ/σ)/Φ(xβ/σ) > 0    (7)

where φ(xβ/σ)/Φ(xβ/σ) is called the inverse Mills ratio and is often written in terms of the "famous" λ term, where

λ(xβ/σ) = φ(xβ/σ)/Φ(xβ/σ)    (8)

Substituting these terms back into the conditional expectation (equation 4), we get:

E(y|x) = Φ(xβ/σ) · [xβ + σλ(xβ/σ)] ≠ xβ    (9)

Note that taking the conditional expectation over the positive values of y only (i.e. treating y as a truncated instead of a censored variable) does not help either since:

E(y | x, y* > 0) = E[xβ + e | x, y* > 0] = xβ + σλ(xβ/σ) ≠ xβ    (10)

Either running an OLS regression with the zeros in (equation 9) or with the zeros out (equation 10) thus yields biased estimates of β.
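The biases in equations (9) and (10) are easy to verify by simulation. The following Python sketch (not part of the original notes; all parameter values are invented for the illustration) draws censored data and runs OLS with the zeros in and with the zeros out; both slope estimates are attenuated relative to the true β.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, sigma = 1.0, 2.0                                # invented true parameters

x = rng.normal(size=n)
y_star = beta * x + rng.normal(scale=sigma, size=n)   # latent y* = x*beta + e
y = np.maximum(y_star, 0.0)                           # censoring at zero

X = np.column_stack([np.ones(n), x])                  # constant + regressor

# OLS with the zeros in (biased, cf. equation (9))
b_all = np.linalg.lstsq(X, y, rcond=None)[0]

# OLS with the zeros out (biased, cf. equation (10))
pos = y > 0
b_pos = np.linalg.lstsq(X[pos], y[pos], rcond=None)[0]

print(f"true beta = {beta}")
print(f"OLS slope, zeros in : {b_all[1]:.3f}")
print(f"OLS slope, zeros out: {b_pos[1]:.3f}")
```

Both estimated slopes come out well below the true β of 1, in line with the attenuation implied by the two conditional expectations above.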

1.1 Estimation: Maximum likelihood

The standard solution to the problem of a censored dependent variable is to estimate the tobit model by maximum likelihood. First note that the density of the latent variable y* is (1/σ)φ[(y* − xβ)/σ]. The density of the observed variable y is:

f(y|xᵢ) = {1 − Φ(xᵢβ/σ)}^{1[y=0]} · {(1/σ)φ[(y − xᵢβ)/σ]}^{1[y>0]}    (11)

where 1[z] is the indicator function (1[z] = 1 if z is true, 1[z] = 0 otherwise). The log-likelihood for each observation i is

lᵢ(β, σ) = 1[yᵢ = 0] · log[1 − Φ(xᵢβ/σ)] + 1[yᵢ > 0] · log[(1/σ)φ[(yᵢ − xᵢβ)/σ]]    (12)

As in the case of the probit, the log-likelihood for the tobit model is the sum of the log-likelihoods for each observation. Consistent estimates of β and σ are obtained by numerically maximizing the log-likelihood function.
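As a concrete illustration (not from the original notes), the tobit log-likelihood of equation (12) can be coded directly and maximized numerically; the data-generating values below are invented, and σ is parametrized as exp(log σ) to keep it positive during optimization.

```python
import numpy as np
from scipy import optimize, stats

# Simulated censored data (parameter values invented for the illustration)
rng = np.random.default_rng(1)
n = 50_000
beta, sigma = 1.0, 2.0
x = rng.normal(size=n)
y = np.maximum(beta * x + rng.normal(scale=sigma, size=n), 0.0)
X = np.column_stack([np.ones(n), x])
zero = y == 0

def neg_loglik(theta):
    b, log_s = theta[:-1], theta[-1]
    s = np.exp(log_s)                       # parametrize sigma > 0
    xb = X @ b
    ll = np.where(
        zero,
        stats.norm.logcdf(-xb / s),         # log[1 - Phi(x*beta/sigma)]
        stats.norm.logpdf((y - xb) / s) - np.log(s),
    )
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b_hat, s_hat = res.x[:2], np.exp(res.x[2])
print("beta_hat =", b_hat, " sigma_hat =", s_hat)
```

With this sample size the maximum likelihood estimates land very close to the true β and σ, unlike the OLS regressions discussed above.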

1.2 Estimation: Heckman two-step method

An alternative to maximum likelihood estimation of the tobit model is Heckman's two-step, or λ, correction method. Some people also call the procedure "heckit". The basic idea of the method is that the λ term in equation (10) can be estimated using a probit model in the first step. To see this, note that when σ is not equal to one, the underlying probabilities in the probit model become:

Pr(y* > 0) = Φ(xβ/σ) = Φ(xγ)    (13)

where γ = β/σ. This means that we can get a consistent estimate γ̂ of γ from the probit model, and plug this in to get an estimate λ̂ᵢ of the λ term for observation i:

λ̂ᵢ = φ(xᵢγ̂)/Φ(xᵢγ̂)    (14)

In the second step, λ̂ᵢ is included along with the other explanatory variables, x, in a regression based on equation (10). It can be shown that estimating this "control function" model with OLS yields consistent estimates of both β and σ (the regression coefficient in front of the λ term). The procedure can thus be described as follows:

Step 1: Use a probit to estimate whether or not y is larger than zero. Use the probit estimates to compute λ̂ᵢ.

Step 2: For positive observations of y, run a regression of yᵢ on xᵢ and λ̂ᵢ.

Note that unlike maximum likelihood, the heckit estimates are not efficient. Furthermore, standard errors have to be corrected for the fact that 1) λ̂ᵢ is estimated (the generated regressor problem) and that 2) adding λ̂ᵢ as a regressor results in heteroskedastic errors even if the original error term e is normal and homoskedastic. The second problem can be fixed by just using robust standard errors, while the first problem can either be solved by computing the appropriate formulas (this is what Stata does) or by bootstrapping the whole two-step procedure.
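The two steps above can be sketched in Python as follows (an illustrative sketch, not the notes' own code: the data-generating values are invented, the probit is estimated by a hand-rolled maximum likelihood, and the standard-error corrections discussed above are omitted).

```python
import numpy as np
from scipy import optimize, stats

# Simulated censored data (parameter values invented for the illustration)
rng = np.random.default_rng(2)
n = 100_000
beta0, beta1, sigma = 0.5, 1.0, 1.5
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.maximum(X @ np.array([beta0, beta1]) + rng.normal(scale=sigma, size=n), 0.0)
d = (y > 0).astype(float)

# Step 1: probit for Pr(y > 0) = Phi(x*gamma), where gamma = beta/sigma
def probit_nll(g):
    xg = X @ g
    return -(d * stats.norm.logcdf(xg) + (1.0 - d) * stats.norm.logcdf(-xg)).sum()

g_hat = optimize.minimize(probit_nll, x0=np.zeros(2), method="BFGS").x

# Inverse Mills ratio lambda_i = phi(x*g_hat) / Phi(x*g_hat), equation (14)
xg = X @ g_hat
mills = stats.norm.pdf(xg) / stats.norm.cdf(xg)

# Step 2: OLS of y on x and lambda over the positive observations;
# the coefficient on lambda estimates sigma
pos = y > 0
Z = np.column_stack([X[pos], mills[pos]])
coef = np.linalg.lstsq(Z, y[pos], rcond=None)[0]
print("beta_hat =", coef[:2], " sigma_hat (coef on lambda) =", coef[2])
```

The second-step coefficients on the constant, x, and λ̂ recover β₀, β₁, and σ, consistent with equation (10).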

1.3 Truncated variables

When y is a truncated variable, the appropriate model is:

y = y* if y* > 0

y is not observed if y* ≤ 0

This does not really affect the likelihood function, as we can now simply write

lᵢ(β, σ) = 1[yᵢ is missing] · log[1 − Φ(xᵢβ/σ)] + 1[yᵢ > 0] · log[(1/σ)φ[(yᵢ − xᵢβ)/σ]]    (15)

We can also run the exact same heckit procedure as before. The only difference is that instead of doing a probit for whether y is zero or positive, we now do a probit for whether y is missing or positive.

2 Generalized tobits and self-selection models

Heckman's procedure was extremely popular for a while because it can be generalized to many interesting cases besides the tobit where we have a self-selection problem. Take, for instance, the case of a wage equation where we want to estimate the effect of education (x) on log wages (y). The problem is that people who don't work tend to do so because they have low wages, or more generally because their wage is lower than their reservation wage. This is not, strictly speaking, a tobit model, since factors other than the wage itself determine whether or not the person works (and whether the wage will be observed in the data). This generalized tobit or self-selection model can be written as follows:

y = y* if y₁* > 0

y is not observed if y₁* ≤ 0

where, as before:

y* = xβ + e,

which is the latent or “true” wage equation, but where the participation decision is driven by another latent variable

y₁* = zγ + u,    (16)

where z is the set of variables that determines the participation decision (it generally includes x and may include other variables), while u is another normal error term (normalized to have a variance of one) that is presumed to be correlated with e, i.e.

σ_eu = cov(e, u) ≠ 0. It can be shown in this case that:

E(y | x, y₁* > 0) = xβ + σ_eu · λ(zγ) ≠ xβ    (17)

So the problem is that we have an omitted variable, λ(zγ), that will be correlated with x, since x is part of z and enters non-linearly in the function λ(zγ). In principle, this can be solved once again by applying the λ correction (heckit) technique. The approach somewhat fell out of favour because the way the model is identified is not very compelling unless we have strong reasons to believe that some variables in z do not belong directly in the model, i.e. in the set of x variables. When z = x, the only difference between xβ and λ(zγ) = λ(xγ) is that the latter is a non-linear function of x. Unfortunately, when xβ is also allowed to be replaced by a more flexible function (e.g.

polynomials in x), it is no longer possible to separately identify the effects of these two terms because of multicollinearity.

Note also that this approach is often called a "control function" approach, where the function λ(zγ) implicitly controls for the selection/endogeneity problem. The regression version of the Hausman test, where the first-stage residual (a function of the instrumental variables) is included in the main regression, is also often called a control function regression, where the control function is this time the first-stage residual. In the context of TSLS, everything is linear and the model is solely identified because of the exclusion restrictions (instrumental variables affect the first-stage residual but do not directly enter the second-stage regression). Most applied researchers tend to think that the same criteria should also apply to Heckman's model, i.e. that we should only use this approach when we have good reasons to believe that we have some instruments in z that affect the probability of observing y without directly affecting y through xβ.
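The multicollinearity point can be illustrated numerically. In the sketch below (not from the original notes; the probit coefficients and the range of x are invented for the demonstration), the inverse Mills ratio λ(xγ) is so smooth in x that a cubic in x reproduces it almost exactly, leaving essentially no independent variation to identify the selection term when z = x.

```python
import numpy as np
from scipy import stats

gamma0, gamma1 = 0.2, 0.8           # hypothetical probit coefficients
x = np.linspace(-2, 2, 401)         # a plausible range for the regressor
xg = gamma0 + gamma1 * x
mills = stats.norm.pdf(xg) / stats.norm.cdf(xg)   # inverse Mills ratio

# R^2 from regressing lambda(x*gamma) on a cubic in x
P = np.column_stack([np.ones_like(x), x, x**2, x**3])
fit = P @ np.linalg.lstsq(P, mills, rcond=None)[0]
r2 = 1 - ((mills - fit) ** 2).sum() / ((mills - mills.mean()) ** 2).sum()
print(f"R^2 of a cubic in x fit to the Mills ratio: {r2:.5f}")
```

The near-perfect R² is exactly the multicollinearity problem described above: once the outcome equation is made flexible in x, the λ term is almost a linear combination of the included regressors.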

3 Stata commands

As in the previous lecture notes, consider the case of three explanatory variables x1, x2, and x3. In the standard tobit model where y is censored at zero, the Stata command for the tobit is:

tobit y x1 x2 x3, ll(0)

where the option ll(0) indicates that y is left-censored at 0 (other censoring points can also be chosen, and right censoring is captured by the option ul(.) instead). In the case where there is no exclusion restriction, the command for Heckman's model would either be

heckman ym x1 x2 x3, select(x1 x2 x3) twostep

or

heckman y x1 x2 x3, select(yobs=x1 x2 x3) twostep

where ym is a modified version of y in which observations for which y=0 have been set to missing, while yobs is a dummy indicating whether the observation for y is positive. The Heckman procedure presumes that the dependent variable is missing (truncated) in some cases in the first version of the model. This is why we include the variable ym instead of the variable y from the tobit model (where y is sometimes zero). The variable ym can be

created from the variable y used in the tobit (censored) model by writing:

gen ym=y if y>0

replace ym=. if y==0

The other way of telling Stata for which observations the dependent variable is truncated is to use a dummy variable like yobs in the “select”option. If we start again with the censored variable y from the tobit, yobs can be created as follows:

gen yobs=1 if y>0

replace yobs=0 if y==0

A simpler command for creating the dummy is:

gen yobs=(y>0)

Note also that if the option "twostep" is not specified, the model will be estimated by maximum likelihood instead (generalized maximum likelihood tobit). If we have a variable x4 that helps predict whether y is observed without affecting y directly (e.g. an instrument), we can simply run

heckman ym x1 x2 x3, select(x1 x2 x3 x4) twostep

or

heckman y x1 x2 x3, select(yobs=x1 x2 x3 x4) twostep
