Econometrics I Midterm Examination Fall 2004 Answer Key

Please answer all of the questions and show all of your work. If you think that a question is ambiguous, clearly state how you interpret it before providing an answer.

1. (28 points) Consider the regression function

$$ y = \beta_0 + \beta_1 x^{\beta_2} + \varepsilon, $$

where $x$ is a scalar individual characteristic that is always positive, $\varepsilon$ is a random variable that is independently and identically distributed (i.i.d.) in the population as $N(0, \sigma_\varepsilon^2)$, and $\beta = (\beta_0\ \beta_1\ \beta_2)'$ is a vector of unknown parameters. You have access to a random sample of $N$ observations containing information on $(y, x)$ drawn from this population.

1. Define the nonlinear least squares estimators of $\beta$ and $\sigma_\varepsilon^2$ for this problem.

The disturbance term is homoskedastic, so we define

$$ (\hat\beta_0\ \hat\beta_1\ \hat\beta_2)' = \arg\min_{\beta} \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1 x_i^{\beta_2})^2 $$

for the estimator of the $\beta$ vector. A consistent estimator of $\sigma_\varepsilon^2$ is given by

$$ \hat\sigma^2 = N^{-1} \sum_{i=1}^{N} (y_i - \hat\beta_0 - \hat\beta_1 x_i^{\hat\beta_2})^2. $$

2. Are all of the parameters identified, or are some restrictions on the parameter space required to ensure identification of the model?

If $\beta_2 = 0$, then $x_i^{\beta_2} = 1$ for all $i$ and the NLLS estimator of $\beta_0 + \beta_1$ is the sample mean of $y$ (the two parameters cannot be individually identified). Similarly, if $\beta_1 = 0$, then $\beta_2$ is not identified. Thus the identification conditions are that (1) the parameter space excludes both $\beta_1 = 0$ and $\beta_2 = 0$ and (2) $x_i$ is not constant.

3. Derive the log likelihood and define the maximum likelihood estimators of the model parameters. How does this estimator compare to the one you defined in (a)?

The log likelihood function is given by

$$ \ln L = -N \ln \sigma + \sum_{i=1}^{N} \ln \phi\big((y_i - \beta_0 - \beta_1 x_i^{\beta_2})/\sigma\big), $$

where $\phi$ is the standard normal p.d.f. The m.l. estimator of the conditional mean for the normal minimizes the sum of squared deviations, just as was the case for the least squares estimator. In this case the conditional mean function happens to be nonlinear in the parameters, but the “quadratic loss” criterion is the same. Since the estimators of the conditional mean functions are identical, it is not surprising that the estimators of the parameters are also identical.

4. Assume that only the sign of $y$ is observable, i.e., define

$$ d = \begin{cases} 1 & \text{iff } y > 0 \\ 0 & \text{iff } y \le 0. \end{cases} $$

Write down the log likelihood function corresponding to this model, and define the maximum likelihood estimators of the model parameters. Which parameters are identified given this information, and under what conditions?

In this case, we get a nonlinear (in the parameters) probit model. The probability that $d = 1$ is simply

$$ \Phi\left(\frac{\beta_0 + \beta_1 x_i^{\beta_2}}{\sigma}\right), \tag{0.1} $$

so the log likelihood function is

$$ \ln L = \sum_{i=1}^{N} \left\{ d_i \ln \Phi\left(\frac{\beta_0 + \beta_1 x_i^{\beta_2}}{\sigma}\right) + (1 - d_i) \ln\left(1 - \Phi\left(\frac{\beta_0 + \beta_1 x_i^{\beta_2}}{\sigma}\right)\right) \right\}. $$

From inspection of the log likelihood, we see that, as in the linear-in-the-parameters case, only the functions $\beta_0/\sigma$ and $\beta_1/\sigma$ are identified. Given these two estimable functions of the parameters, the parameter $\beta_2$ is identified. We still require the identification conditions discussed in part (b).
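The nonlinear least squares estimator defined in the first part of this question can be sketched numerically. This is only an illustration under stated assumptions: the simulated parameter values, the grid over $\beta_2$, and the function name `nls_power` are all hypothetical choices, not part of the exam. The sketch uses the fact that, for a fixed $\beta_2$, the model is linear in $(\beta_0, \beta_1)$, so the sum of squares can be concentrated and minimized over $\beta_2$ alone.

```python
import numpy as np

def nls_power(y, x, beta2_grid):
    """Concentrated NLS for y = b0 + b1 * x**b2 + e (hypothetical helper).

    For each candidate b2 the model is linear in (b0, b1), so those two
    coefficients come from OLS; b2 is the grid value with the smallest
    sum of squared residuals.
    """
    best = None
    for b2 in beta2_grid:
        X = np.column_stack([np.ones_like(x), x ** b2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        ssr = float(np.sum((y - X @ coef) ** 2))
        if best is None or ssr < best[0]:
            best = (ssr, coef[0], coef[1], b2)
    ssr, b0, b1, b2 = best
    sigma2 = ssr / len(y)  # the N^{-1} variance estimator from the text
    return b0, b1, b2, sigma2

# Simulated data; all numerical values are assumptions for illustration.
rng = np.random.default_rng(0)
N = 5000
x = rng.uniform(0.5, 3.0, N)  # x is always positive
y = 1.0 + 2.0 * x ** 1.5 + rng.normal(0.0, 0.5, N)

# The grid excludes b2 = 0, where x**b2 is constant and (b0, b1)
# are not separately identified.
grid = np.linspace(0.5, 2.5, 201)
b0_hat, b1_hat, b2_hat, s2_hat = nls_power(y, x, grid)
print(b0_hat, b1_hat, b2_hat, s2_hat)
```

A grid search is used here only to keep the sketch dependency-free; a gradient-based optimizer would serve equally well.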

2. (42 points) A dependent variable $y$ is deterministically related to a scalar $x$ as follows:

y = β0 + β1x.

1. Assume that in the population $\beta_1$ is a constant while $\beta_0$ is independently and identically distributed with mean $\mu_{\beta_0}$ and variance $\sigma^2_{\beta_0}$. You have access to a random sample of $N$ observations containing the information $\{y_i, x_i\}_{i=1}^{N}$. Derive unbiased estimators for all of the unknown model parameters.

We simply rewrite the regression function as

$$ y = \mu_{\beta_0} + \beta_1 x + \varepsilon, $$

where $\varepsilon = \beta_0 - \mu_{\beta_0}$. Then by the assumptions on $\beta_0$, $\varepsilon$ is independent of $x$, so OLS yields unbiased estimators for $\mu_{\beta_0}$ and $\beta_1$. The sum of squared OLS residuals divided by $N - 2$ (two mean parameters are estimated) is an unbiased estimator of $\sigma^2_{\beta_0}$.

2. Consider a generalization of the model in which $\beta_1$ is treated as a random variable in the population, which is independent of $\beta_0$. $\beta_1$ is assumed to be i.i.d., with mean $\mu_{\beta_1}$ and variance $\sigma^2_{\beta_1}$. Define consistent estimators of all unknown model parameters, if they exist.

The model now becomes

$$ y = \mu_{\beta_0} + \mu_{\beta_1} x + (\beta_1 - \mu_{\beta_1}) x + (\beta_0 - \mu_{\beta_0}), $$

so the “new” disturbance term is

$$ u = (\beta_1 - \mu_{\beta_1}) x + (\beta_0 - \mu_{\beta_0}). $$

By definition we have that $E(u \mid x) = 0$ for all $x$. Then the OLS estimators of the conditional mean function are unbiased for $\mu_{\beta_0}$ and $\mu_{\beta_1}$. Define the residuals from the first-stage OLS regression by $r_i$. Then form the estimator

$$ \hat\gamma = \arg\min_{\gamma} N^{-1} \sum_{i=1}^{N} (r_i^2 - \gamma_1 - \gamma_2 x_i^2)^2. $$

Then $\hat\gamma_1$ is a consistent estimator of $\sigma^2_{\beta_0}$ and $\hat\gamma_2$ is a consistent estimator of $\sigma^2_{\beta_1}$.

3. Given your response to (b), is it possible to define more efficient estimators for $\mu_{\beta_0}$ and $\mu_{\beta_1}$? If so, define these estimators. If not, argue why the estimators defined in (b) are efficient given the sample information and the model.

We can use the consistent estimates of $\sigma^2_{\beta_0}$ and $\sigma^2_{\beta_1}$ to form consistent estimates of the covariance matrix of the disturbances, and then form the Feasible GLS estimator. Define

$$ \hat w_i = (\hat\gamma_1 + \hat\gamma_2 x_i^2)^{.5}. $$

Then the FGLS estimates of the conditional mean function are given by

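$$ (\hat\mu_{\beta_0}\ \hat\mu_{\beta_1}) = \arg\min_{\mu_{\beta_0},\, \mu_{\beta_1}} \sum_{i=1}^{N} \frac{(y_i - \mu_{\beta_0} - \mu_{\beta_1} x_i)^2}{\hat w_i^2}. $$

These are (asymptotically) efficient in the class of LS estimators.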
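The three steps just described (first-stage OLS, regression of squared residuals on a constant and $x_i^2$, and weighted least squares with weights $1/\hat w_i$) can be sketched as follows. The simulated values and the helper names `ols` and `random_coef_fgls` are assumptions made for illustration; the sketch also assumes the fitted variances come out positive, which need not hold in small samples.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def random_coef_fgls(y, x):
    """Three-step FGLS for y = beta0 + beta1 * x with random coefficients."""
    X = np.column_stack([np.ones_like(x), x])
    mu_ols = ols(X, y)                      # step 1: OLS for the means
    r2 = (y - X @ mu_ols) ** 2              # squared first-stage residuals
    Z = np.column_stack([np.ones_like(x), x ** 2])
    gamma = ols(Z, r2)                      # step 2: variance components
    w2 = Z @ gamma                          # fitted variances w_i^2
    if np.any(w2 <= 0):
        raise ValueError("fitted variance is not positive for some i")
    w = np.sqrt(w2)
    mu_fgls = ols(X / w[:, None], y / w)    # step 3: weighted least squares
    return mu_ols, gamma, mu_fgls

# Simulated data; all numerical values are assumptions for illustration.
rng = np.random.default_rng(0)
N = 20000
x = rng.uniform(1.0, 3.0, N)
beta0 = rng.normal(1.0, 0.5, N)   # mean 1, variance 0.25
beta1 = rng.normal(2.0, 0.3, N)   # mean 2, variance 0.09
y = beta0 + beta1 * x

mu_ols, gamma_hat, mu_fgls = random_coef_fgls(y, x)
print(mu_ols, gamma_hat, mu_fgls)
```

Dividing each row of the regressor matrix and each $y_i$ by $\hat w_i$ and running OLS is equivalent to minimizing the weighted sum of squares in the display above.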

4. Describe tests, formal and/or informal, that would allow you to investigate whether $\beta_1$ was a constant in the population (i.e., $\sigma^2_{\beta_1} = 0$).

We are looking for systematic heteroskedasticity, in that the conditional variance is posited to be a linear function of $x^2$. We could plot the squared residuals as a function of $x_i^2$ to see if there is any evidence of this relationship. More formally, we can estimate the model

$$ r_i^2 = a + b x_i^2 + \xi_i. $$

The OLS estimators of $a$ and $b$ are consistent under the null. The random variable $\xi_i$ is heteroskedastic under the null. In large samples, we can compute the standard errors of the regression estimates using the Huber-Eicker-White method. Then $\hat b / \mathrm{s.e.}(\hat b)$ is asymptotically distributed as $N(0, 1)$ under the null.
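A minimal sketch of this formal test follows. It regresses the squared first-stage residuals on a constant and $x_i^2$ and forms the ratio $\hat b / \mathrm{s.e.}(\hat b)$ using a heteroskedasticity-robust sandwich covariance. The specific HC0 form of the sandwich, the simulated values, and the function name `robust_t_for_b` are assumptions made for illustration.

```python
import numpy as np

def robust_t_for_b(y, x):
    """t-ratio for b in r_i^2 = a + b * x_i^2 + xi_i with HC0 standard errors."""
    X = np.column_stack([np.ones_like(x), x])
    mu = np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = (y - X @ mu) ** 2                   # squared first-stage residuals
    Z = np.column_stack([np.ones_like(x), x ** 2])
    coef = np.linalg.lstsq(Z, r2, rcond=None)[0]
    xi = r2 - Z @ coef                       # second-stage residuals
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    meat = Z.T @ (Z * (xi ** 2)[:, None])    # sum of xi_i^2 * z_i z_i'
    V = ZtZ_inv @ meat @ ZtZ_inv             # HC0 sandwich covariance
    return coef[1] / np.sqrt(V[1, 1])

# Simulated data; all numerical values are assumptions for illustration.
rng = np.random.default_rng(1)
N = 20000
x = rng.uniform(1.0, 3.0, N)
beta0 = rng.normal(1.0, 0.5, N)

y_null = beta0 + 2.0 * x                      # beta1 constant
y_alt = beta0 + rng.normal(2.0, 0.3, N) * x   # beta1 random

t_null = robust_t_for_b(y_null, x)
t_alt = robust_t_for_b(y_alt, x)
print(t_null, t_alt)
```

Because a variance cannot be negative, the natural alternative is $b > 0$, so a one-sided comparison of the ratio with standard normal critical values seems appropriate here.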

5. Say that we are willing to assume that both $\beta_0$ and $\beta_1$ are independently distributed as $N(\mu_{\beta_i}, \sigma^2_{\beta_i})$, $i = 0, 1$. Write down the log likelihood function for this model, and determine whether all parameters are identified.

The log likelihood function is given by

$$ \ln L = \sum_{i=1}^{N} \left\{ -.5 \ln(\sigma^2_{\beta_0} + \sigma^2_{\beta_1} x_i^2) + \ln \phi\left( \frac{y_i - \mu_{\beta_0} - \mu_{\beta_1} x_i}{\sqrt{\sigma^2_{\beta_0} + \sigma^2_{\beta_1} x_i^2}} \right) \right\}. $$

Identification issues are essentially the same as in the regression case. The main difference is that the estimates are all computed in one step instead of two (or three, when we compute the FGLS estimator). Identification could be verified by computing the matrix of first partials and confirming that it is of full column rank.

6. If you had to choose between using the estimators you defined in (c) or the maximum likelihood estimator defined in (e), which would you prefer? On what factors would your choice depend?

If the normality assumption is correct, the ML estimator is consistent and asymptotically efficient. Then if the sample is large and one has some faith in the normality assumption, the ML estimator may be appropriate. The LS estimators are consistent under more general conditions than normality. Since differences in the asymptotic standard errors are often negligible, the LS estimators may be preferable even in large samples. In small samples, use of the LS estimators would pretty clearly be indicated.
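The log likelihood from part 5 is straightforward to evaluate numerically, which also gives a quick way to see the identification point. The sketch below is an illustration under assumed parameter values, and `loglik` is a hypothetical helper name. It checks that the true parameters score higher than a perturbed mean, and that when $x$ is constant the two variance components enter only through their sum, so they cannot be separated.

```python
import numpy as np

def loglik(params, y, x):
    """Log likelihood of the normal random-coefficient model.

    params = (mu0, mu1, var0, var1), where var0 and var1 are the
    variances of beta0 and beta1.
    """
    mu0, mu1, var0, var1 = params
    v = var0 + var1 * x ** 2                 # conditional variance of y given x
    z = (y - mu0 - mu1 * x) / np.sqrt(v)
    ln_phi = -0.5 * np.log(2.0 * np.pi) - 0.5 * z ** 2
    return float(np.sum(-0.5 * np.log(v) + ln_phi))

# Simulated data; all numerical values are assumptions for illustration.
rng = np.random.default_rng(2)
N = 5000
x = rng.uniform(1.0, 3.0, N)
y = rng.normal(1.0, 0.5, N) + rng.normal(2.0, 0.3, N) * x

ll_true = loglik((1.0, 2.0, 0.25, 0.09), y, x)
ll_off = loglik((1.0, 2.5, 0.25, 0.09), y, x)   # mean of beta1 perturbed

# With constant x, only var0 + var1 * x**2 matters, so two different
# variance pairs with the same sum give the same log likelihood.
x_const = np.ones(N)
y_const = rng.normal(1.0, 0.5, N) + rng.normal(2.0, 0.3, N) * x_const
ll_a = loglik((1.0, 2.0, 0.25, 0.09), y_const, x_const)
ll_b = loglik((1.0, 2.0, 0.34, 0.00), y_const, x_const)
print(ll_true, ll_off, ll_a, ll_b)
```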

3. (30 points) Unemployed individuals look for jobs according to the model of continuous time search in stationary environments presented in class. We showed that for an individual facing labor market parameters $\theta = \{\rho, b, F, \lambda, \eta\}$, their optimal policy was to accept any wage offer that was greater than $w^*$, where this constant was the solution to

$$ w^* = b + \frac{\lambda}{\rho + \eta} \int_{w^*} (w - w^*)\, dF(w), $$

and where

$b$ is the utility flow in the unemployment state
$\lambda$ is the rate of meeting potential employers
$\eta$ is the rate of being dismissed at a job
$\rho$ is the instantaneous discount rate
$F$ is the wage offer distribution.

Assume you have access to a random sample of N individuals which contains the following information for each sample member:

$t_u$ the length of a complete unemployment spell
$t_e$ the length of the complete employment spell that follows the unemployment spell
$w$ the wage rate associated with the employment spell (which is constant over the spell).

Further assume that the distribution of wage offers is negative exponential, i.e.,

$$ F(w) = 1 - \exp(-\alpha w), \quad w > 0,\ \alpha > 0, $$
$$ f(w) = \alpha \exp(-\alpha w), $$

where $F$ and $f$ are the c.d.f. and p.d.f., respectively.
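With negative exponential offers the integral in the reservation wage equation has a closed form, $\int_{w^*}^{\infty} (w - w^*)\,\alpha e^{-\alpha w}\,dw = e^{-\alpha w^*}/\alpha$, so $w^*$ solves $w^* = b + \frac{\lambda}{\rho + \eta}\, e^{-\alpha w^*}/\alpha$. A minimal sketch of solving this by bisection is given below. It assumes $b \ge 0$ so that the solution is positive, and the parameter values and the function name `reservation_wage` are hypothetical.

```python
import math

def reservation_wage(b, lam, eta, rho, alpha, tol=1e-12):
    """Solve w = b + lam / (rho + eta) * exp(-alpha * w) / alpha by bisection.

    Assumes b >= 0, so that the root lies in the support w > 0.
    """
    kappa = lam / (rho + eta)

    def g(w):
        # g is strictly increasing in w, so it has a unique root.
        return w - b - kappa * math.exp(-alpha * w) / alpha

    lo = b                                            # g(lo) < 0
    hi = b + kappa * math.exp(-alpha * b) / alpha     # g(hi) > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break
    return 0.5 * (lo + hi)

# Parameter values are assumptions for illustration.
w_star = reservation_wage(b=1.0, lam=0.5, eta=0.05, rho=0.05, alpha=1.0)
print(w_star)
```

The bracket works because the right-hand side of the equation is decreasing in $w^*$ while the left-hand side is increasing, so the root lies between $b$ and the value of the right-hand side at $w^* = b$.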

1. Write down the log likelihood function for the sample.

We observe one unemployment spell followed by an employment spell with a wage observation. Both duration distributions are negative exponential, with the parameter of the unemployment duration distribution being given by $h_u = \lambda \tilde F(w^*)$, and the parameter of the employment duration distribution given by $h_e = \eta$. The log likelihood function is given by

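$$ \ln L = \sum_{i=1}^{N} \left\{ \ln\big(h_u \exp(-h_u t_u(i))\big) + \ln\big(h_e \exp(-h_e t_e(i))\big) + \ln\left( \frac{\alpha \exp(-\alpha w(i))}{\exp(-\alpha w^*)} \right) \right\} $$
$$ = N \ln h_u - h_u \sum_{i=1}^{N} t_u(i) + N \ln \eta - \eta \sum_{i=1}^{N} t_e(i) + N \ln \alpha - \alpha \sum_{i=1}^{N} (w_i - w^*). $$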

2. Assuming that the discount rate ($\rho$) is known, are all of the parameters identified? Describe how you obtain maximum likelihood estimators for the identified parameters. Is the MLE consistent?

Following Flinn and Heckman, a consistent estimator of the reservation wage $w^*$ is the smallest accepted wage in the sample. Given this ML estimator ($\hat w^*$), we can obtain the ML estimator of the wage offer distribution parameter as the inverse of the sample mean of $\{w_i - \hat w^*\}_{i=1}^{N}$. Given this consistent estimator of $\alpha$, we can form an ML estimator of $\lambda$ as follows. First find the ML estimator

$$ \hat h_u = N \Big/ \sum_{i=1}^{N} t_u(i). $$

Then since

$$ h_u = \lambda \exp(-\alpha w^*), $$

the ML estimator of $\lambda$ is $\hat\lambda = \hat h_u \exp(\hat\alpha \hat w^*)$. The maximum likelihood estimator of $\eta$ is simply the inverse of the sample mean of the employment spells. Finally, using all of these point estimates and a known value of $\rho$, we can solve the implicit function determining $w^*$ for the implied value of $b$. Then all parameters are identified and we have efficient estimates of each of them.

3. You suspect that there are two “types” of individuals in the population, who differ solely in terms of their utility flow in the unemployment state, $b$ - that is, all other parameter values they face are the same. Let the proportion of type 1 individuals in the population be given by $\pi$. Then the probability distribution of types is given by

Type   Utility Flow in Unemployment   Population Proportion
1      $b_1$                          $\pi$
2      $b_2$                          $1 - \pi$

Write down the log likelihood function for this model. Are all parameters (which now include $b_1$, $b_2$, and $\pi$) identified? Describe how you would obtain maximum likelihood estimates of the identified model parameters.

The log likelihood function is the same as the above except for the fact that it is “averaged” over the two types. The likelihood contribution for a given individual is

$$ L(t_u(i), t_e(i), w_i) = \pi L(t_u(i), t_e(i), w_i; b_1) + (1 - \pi) L(t_u(i), t_e(i), w_i; b_2), $$

and the log likelihood for the sample is the sum of the logs of these terms. Instead of estimating $b_1$ and $b_2$ directly, we would estimate the two critical values $w_1^*$ and $w_2^*$ and the parameter $\pi$. Without loss of generality, let’s say that $b_1$ is the low utility of unemployment value; in this case we know from the theory that $w_1^*$

be observed in the unemployment duration distribution. (In fact, this could serve as a simple test for whether unobserved heterogeneity of the type postulated exists.) Then negative duration dependence in the unemployment duration hazard and the non-negative-exponential shape of the wage offer distribution will serve to identify $w_2^*$ and $\pi$. The other parameters $\alpha$, $\eta$, and $\lambda$ are identified as before. Given these parameter estimates, go back to the functional equations and solve each (using $\hat w_1^*$ and $\hat w_2^*$ respectively) for consistent estimates of $b_1$ and $b_2$.
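For the homogeneous model in part 2 of this question, the sequence of estimators (smallest accepted wage, inverse mean excess wage, inverse mean durations, and then the functional equation solved for $b$) can be sketched as follows. The simulated parameter values and the function name `search_mle` are assumptions made for illustration, and the step recovering $b$ uses the closed form $\int_{w^*}^{\infty}(w - w^*)\,dF(w) = e^{-\alpha w^*}/\alpha$ implied by the negative exponential offer distribution.

```python
import numpy as np

def search_mle(tu, te, w, rho):
    """Sequential ML estimates for the homogeneous search model."""
    N = len(w)
    w_star = w.min()                         # smallest accepted wage
    alpha = 1.0 / np.mean(w - w_star)        # offer distribution parameter
    h_u = N / tu.sum()                       # unemployment exit rate
    lam = h_u * np.exp(alpha * w_star)       # from h_u = lam * exp(-alpha * w*)
    eta = N / te.sum()                       # inverse mean employment spell
    # Solve w* = b + lam / (rho + eta) * exp(-alpha * w*) / alpha for b.
    b = w_star - lam / (rho + eta) * np.exp(-alpha * w_star) / alpha
    return {"w_star": w_star, "alpha": alpha, "lam": lam, "eta": eta, "b": b}

# Simulated data; all numerical values are assumptions for illustration.
rng = np.random.default_rng(3)
N = 50000
w_star0, alpha0, lam0, eta0, rho = 2.0, 1.0, 0.8, 0.1, 0.05
h_u0 = lam0 * np.exp(-alpha0 * w_star0)

tu = rng.exponential(1.0 / h_u0, N)               # unemployment spells
te = rng.exponential(1.0 / eta0, N)               # employment spells
w = w_star0 + rng.exponential(1.0 / alpha0, N)    # accepted wages

est = search_mle(tu, te, w, rho)
b0 = w_star0 - lam0 / (rho + eta0) * np.exp(-alpha0 * w_star0) / alpha0
print(est, b0)
```

In the two-type model the same last step would be applied twice, once with each estimated reservation wage, to recover the two utility flows.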
