Binary dependent variable: Probit and Logit

Amine Ouazad
Assistant Professor of Economics

Outline

1. Problemo
2. Probit/Logit Framework
3. Structural interpretation
4. Interpreting results
5. Testing assumptions
6. Further remarks

PROBLEMO: OLS WITH A BINARY DEPENDENT VARIABLE

Problemos

• Consider the estimation of the probability of smoking, y = x’b + e, where y is either 0 or 1 and x is a set of covariates.

• We know that OLS is consistent, asymptotically normal, and unbiased.
• However:
• The predictions can be outside [0,1].
• E(y|x) = x’b is the probability of smoking given the characteristics of the individual, E(y|x) = P(y=1|x).

• The residuals are not normal for a finite sample.
• Conditional on x, the residual takes one of two values: e = 1 - x’b or e = - x’b.

• Under A6, the residuals would be N(0, s²), but that is not possible given that y is either 0 or 1.

• The residuals are heteroskedastic.
• Since y is binary, Var(y|x) = x’b(1-x’b), which varies with x.

Problemo 1: Predictions

• In-sample predictions may be outside [0,1].
• Smoking = a + b Age + c Income + e
• regress smoking age income
• predict smoking_predicted, xb
• sum smoking_predicted
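The in-sample problem can be illustrated with a minimal Python sketch. The data are hypothetical and deliberately stark (everyone under 40 smokes, nobody older does), and the single regressor `age` stands in for the Age and Income covariates above. It is only an illustration of the mechanism, not a reproduction of the Stata commands.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data (an assumption for illustration): ages 20 to 80,
# smoker = 1 below age 40 and 0 otherwise.
age = list(range(20, 81))
smoker = [1 if a < 40 else 0 for a in age]

intercept, slope = ols_fit(age, smoker)
fitted = [intercept + slope * a for a in age]

# Fitted "probabilities" from the linear model fall below zero for older people.
print("smallest fitted value:", min(fitted))
print("largest fitted value:", max(fitted))
```

Because the fitted line is unbounded, extrapolating to ages beyond the sample only moves the prediction further outside the unit interval.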

• Out-of-sample predictions may be outside [0,1].
• Smoking = a + b Age + c Income + e
• regress smoking age income
• use another_dataset.dta
• predict smoking_predicted, xb
• sum smoking_predicted

Problemo 2: Normality of the residuals

• Recap: Normality of the residuals is needed for the validity of confidence intervals and test statistics when “far from the asymptotics” (i.e. small sample size).

• But normally distributed residuals cannot produce a y that is either 0 or 1.
• Hence, in principle, confidence intervals and test statistics are incorrect if using an OLS regression.
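As a rough check of this point, the sketch below computes the exact moments of the residual e = y - x’b conditional on x, writing p for x’b. The function name `lpm_residual_moments` is made up for this illustration. A normal residual would have skewness 0 and kurtosis 3 at every p, and a homoskedastic one would have the same variance at every p.

```python
def lpm_residual_moments(p):
    """Exact moments of e = y - p when y is Bernoulli(p), with p = x'b."""
    values = (1.0 - p, -p)      # e when y = 1, e when y = 0
    probs = (p, 1.0 - p)        # P(y = 1 | x), P(y = 0 | x)
    mean = sum(q * v for v, q in zip(values, probs))
    var = sum(q * (v - mean) ** 2 for v, q in zip(values, probs))
    skew = sum(q * (v - mean) ** 3 for v, q in zip(values, probs)) / var ** 1.5
    kurt = sum(q * (v - mean) ** 4 for v, q in zip(values, probs)) / var ** 2
    return {"values": values, "mean": mean, "variance": var,
            "skewness": skew, "kurtosis": kurt}

for p in (0.1, 0.3, 0.5):
    m = lpm_residual_moments(p)
    print(p, m["variance"], m["skewness"], m["kurtosis"])
```

The variance equals p(1-p), so it changes with x, and the kurtosis never equals 3 at the values of p shown.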

• regress smoking age income
• predict resid, resid
• hist resid
• See next page.
• Non-normal residuals. Conditional on x, e can take only two values.
• Considering that x has a distribution, that can give a double-peaked distribution for the residuals such as this one. A6 is obviously violated.

Test of normality

• Using the third and fourth moments of the residuals. If they are normal, the kurtosis should be 3 and the skewness should be 0.
• regress y x
• predict epsilon, resid
• sum epsilon, detail
• hist epsilon
• sktest epsilon

PROBIT/LOGIT FRAMEWORK

Individuals’ preferences

• Intuition is that individuals are making a cost-benefit analysis.
• y* = U(smoking)-U(not smoking) = x’b + e
• The difference in utilities is the benefit-cost analysis.
• The costs and benefits are unobserved, but the choice is ultimately observed.
• The difference in utilities is a continuous variable, so e can be normally distributed.

• Then P(y=1) = P(smoking) = P(y*>0) = P(x’b+e>0) = P(e>-x’b) = F(x’b)
• With a symmetric distribution, P(e>-x’b) = P(e<x’b) = F(x’b), where F is the cdf of e.
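The chain of equalities can be checked by simulation. The sketch below assumes a standard normal e and an arbitrary index value x’b = 0.4; both are choices made for illustration. It draws the latent variable y* = x’b + e many times and compares the share of draws with y* > 0 to F(x’b).

```python
import math
import random

def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulated_choice_share(index, draws=100_000, seed=0):
    """Share of draws in which the latent variable index + e is positive."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws) if index + rng.gauss(0.0, 1.0) > 0.0)
    return hits / draws

index = 0.4                      # hypothetical value of x'b
share = simulated_choice_share(index)
prob = normal_cdf(index)
print("simulated P(y=1):", share)
print("F(x'b):          ", prob)
```

The two numbers should agree up to simulation noise, which shrinks as the number of draws grows.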

• Probit: the residual is normally distributed, with variance 1.
• Variance is fixed, more on this later.
• F(x) is the integral of the standard normal density.
• Logit: the residual has a logistic distribution.
• F(x) = e^x/(1+e^x).
• Choice of one versus the other makes little practical difference (and should make no practical difference, otherwise your model is not robust).
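To see how close the two choices of F are, the sketch below compares the probit cdf with the logit cdf over a grid. Since the standard logistic distribution has variance π²/3, the logit argument is rescaled by π/√3 so that both residuals have unit variance; that rescaling and the grid are assumptions of this illustration.

```python
import math

def probit_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Standard logistic cdf, e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Rescale so that the logistic residual also has variance 1.
scale = math.pi / math.sqrt(3.0)

grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(probit_cdf(z) - logit_cdf(scale * z)) for z in grid)
print("largest gap between the two cdfs:", max_gap)
```

The largest gap is small relative to the range of a probability, which is consistent with the remark that the choice between probit and logit rarely matters in practice.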

• Difficulty in the probit is that the cdf has no closed-form expression.

Likelihood of the model

• We observe the choices yi, i=1,2,...,N, and the characteristics xi.
• The likelihood of an observation yi,xi is:
• L(yi,xi;b) = P(yi=1|xi) if yi = 1
• L(yi,xi;b) = P(yi=0|xi) if yi = 0
• Combine: L = P(yi=1|xi)^yi P(yi=0|xi)^(1-yi)
• In logs, log L = yi log P(yi=1|xi) + (1-yi) log P(yi=0|xi)
• The log-likelihood of the sample is the sum of these terms over i.

Identification and estimation
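The log-likelihood above can be maximized numerically. The following is a minimal sketch for a logit with an intercept and one covariate, using Newton's method on a small made-up dataset; the data, the function names, and the fixed number of iterations are assumptions for illustration, not the routine Stata uses.

```python
import math

def logit_prob(a, b, x):
    """P(y=1|x) under the logit model with intercept a and slope b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def log_likelihood(a, b, xs, ys):
    """Sum over observations of y log P(y=1|x) + (1-y) log P(y=0|x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logit_prob(a, b, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, steps=25):
    """Newton iterations on the log-likelihood, starting from (0, 0)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logit_prob(a, b, x)
            w = p * (1.0 - p)
            g0 += y - p              # score with respect to the intercept
            g1 += (y - p) * x        # score with respect to the slope
            h00 += w                 # entries of minus the Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: the outcome tends to be 1 for larger x, with overlap.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

a_hat, b_hat = fit_logit(xs, ys)
print("estimates:", a_hat, b_hat)
print("log-likelihood at the estimates:", log_likelihood(a_hat, b_hat, xs, ys))
```

Newton's method is a natural choice here because the logit log-likelihood is globally concave, so the iterations are not at risk of stopping at a local maximum.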

• See Greene. The log-likelihood function is globally concave and therefore has a single global maximum.

INTERPRETING RESULTS

Variance is not identified

• The likelihood was maximized over the coefficient vector b only, because we fixed the value of the variance.
• The variance has to be fixed, by convention to 1 for probit and to π²/3 for logit.
• Indeed, consider a model where the variance of the residual is 4 (standard deviation 2), and the coefficients are inflated by 2.
• The model generates the same probability of smoking as the original model.
• This also tells us that the absolute values of the coefficients have little interpretation. Only the odds ratios and the marginal effects have an interpretation.

Identification
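The scaling argument can be verified numerically. The sketch below assumes a normal residual and an arbitrary index value x’b = 0.4, then compares the original model (variance 1) with the rescaled one (variance 4, coefficients doubled).

```python
import math

def normal_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def choice_prob(index, sd):
    """P(index + e > 0) when e is normal with mean 0 and standard deviation sd."""
    return normal_cdf(index / sd)

index = 0.4                                   # hypothetical value of x'b
original = choice_prob(index, sd=1.0)         # variance 1, coefficients b
rescaled = choice_prob(2.0 * index, sd=2.0)   # variance 4, coefficients 2b
print(original, rescaled)
```

Both models imply the same choice probability at every x, so the data cannot tell them apart and the variance has to be fixed by convention.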

• By the same reasoning, the value of a coefficient is not identified.
• The sign of a coefficient is identified.
• The ratio of two coefficients is identified.
• Implications for the interpretation of the logit coefficients?

Marginal effects

• The marginal effect of a covariate x on the probability that y=1 is easiest to read.
• It measures the effect of a marginal increase in x on the probability of y=1.
• This marginal effect is identified. It does not depend on the particular scaling of the coefficients.
• In Stata it is computed by mfx after performing the logit/probit regression.

Marginal Effects
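As a sketch of why the marginal effect is identified, take a probit with residual standard deviation sd, so that P(y=1|x) is the standard normal cdf evaluated at x’b/sd. Its derivative with respect to a covariate is the normal density at x’b/sd times the coefficient divided by sd. The index value and coefficient below are hypothetical, and the function names are made up for this illustration.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_marginal_effect(index, beta_k, sd=1.0):
    """Derivative of P(y=1|x) with respect to x_k, evaluated at index = x'b."""
    return normal_pdf(index / sd) * beta_k / sd

index, beta_k = 0.4, 0.25        # hypothetical x'b and coefficient on x_k

effect = probit_marginal_effect(index, beta_k)
# Same model with the coefficients doubled and the standard deviation doubled.
effect_rescaled = probit_marginal_effect(2.0 * index, 2.0 * beta_k, sd=2.0)

# Finite-difference check: moving x_k by h moves the index by beta_k * h.
h = 1e-5
numeric = (normal_cdf(index + beta_k * h) - normal_cdf(index - beta_k * h)) / (2.0 * h)

print(effect, effect_rescaled, numeric)
```

The marginal effect is unchanged by the rescaling, while the coefficient itself doubles, which is why the marginal effect is the quantity worth reporting.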

• Formula: ∂P(y=1|x)/∂xk = f(x’β) βk, where f is the density of the residual.
• Marginal effects do not depend on a particular scaling of β.
• Their value depends on the point at which the marginal effect is taken (different from OLS under A1: the model is nonlinear).
• Either the marginal effect is taken at the mean of the covariates, or the mean of the marginal effects is taken.
• By default, mfx calculates the marginal effects or elasticities at the means of the independent variables.

STRUCTURAL INTERPRETATION: RANDOM UTILITY MODEL

Structural interpretation (1/2)

• y1* = U(smoking) = x’b1 + e1
• y0* = U(not smoking) = x’b0 + e0
• Take the difference: y* = y1* - y0* = x’(b1-b0) + e1-e0
• Write b = b1-b0 and e = e1-e0.
• The coefficient is the impact of the covariate on the relative preference for smoking.

Structural Interpretation (2/2)

• If e1 and e0 are normally distributed, then e is normally distributed. The estimation of the model is done via probit.
• If e1 and e0 are independent and extreme-value distributed, then e is logistically distributed. The estimation of the model is done via logit.
• Extreme value distribution (type I): F(e) = exp(-exp(-e)).

TESTING ASSUMPTIONS

Testing assumptions

• Test for the significance of a coefficient: z stat.
• Linear and non-linear constraints: Likelihood ratio, Lagrange multiplier, Wald statistic.

Goodness of fit

• McFadden’s Pseudo R2: Pseudo R2 = 1 - ln L / ln L0, where L is the likelihood of the model and L0 the likelihood of the model with only a constant.
• Reported by Stata.
• Equals 0 if the log likelihood of the model is equal to the log likelihood of the model with only a constant.
• Equals 1 when ln L equals 0, i.e. when L equals 1: the model perfectly predicts outcomes.

FURTHER REMARKS

Further remarks

• Endogeneity: since the residual of the latent equation is assumed to be independent of the covariates, any correlation between them is an issue. Same reasoning as for OLS.
• Direction and magnitude of biases are more complicated, but use the reasoning of Econometrics A as a heuristic.

Further remarks

• Measurement error can also bias estimates, although there is no corresponding theorem on attenuation bias. The direction and magnitude of the bias are unknown.
• Fixed effects typically cannot be consistently estimated when T is fixed and N goes to infinity, as there is no within or first-differenced transformation that leads to a specification without the fixed effects.