
Truncation and Censoring

Laura Magazzini


Laura Magazzini (@univr.it) Truncation and Censoring 1 / 40 Truncation and censoring Truncation and censoring

Truncation: sample data are drawn from a subset of a larger population of interest
- Characteristic of the distribution from which the sample data are drawn
- Example: studies of income based on incomes above or below the poverty line (of limited usefulness for inference about the whole population)
Censoring: values of the dependent variable in a certain range are all transformed to (or reported at) a single value
- Defect in the sample data
- Example: in studies of income, people below the poverty line are reported at the poverty line
Truncation and censoring introduce similar distortions into conventional statistical results

Truncation

Aim: infer the characteristics of a full population from a sample drawn from a restricted population
- Example: characteristics of people with income above $100,000

Let \(y\) be a continuous random variable with pdf \(f(y)\). The conditional density of \(y\) given \(y > a\) (where \(a\) is a constant) is:

\[ f(y \mid y > a) = \frac{f(y)}{\Pr(y > a)} \]

If \(y\) is normally distributed, \(y \sim N(\mu, \sigma^2)\):

\[ f(y \mid y > a) = \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{y-\mu}{\sigma}\right)}{1 - \Phi(\alpha)} \]

where \(\alpha = \frac{a-\mu}{\sigma}\)

Moments of truncated distributions

- Truncation from above: \(E(Y \mid Y < a) < E(Y)\)
- Truncation from below: \(E(Y \mid Y > a) > E(Y)\)
- In either case: \(V(Y \mid \text{truncation}) < V(Y)\)

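Moments of the truncated normal distribution

Let \(y \sim N(\mu, \sigma^2)\) and let \(a\) be a constant

\[ E(y \mid \text{truncation}) = \mu + \sigma\lambda(\alpha) \]

\[ \mathrm{Var}(y \mid \text{truncation}) = \sigma^2[1 - \delta(\alpha)] \]

- \(\alpha = (a - \mu)/\sigma\)
- \(\phi(\alpha)\) is the standard normal density
- \(\lambda(\alpha)\) is called the inverse Mills ratio:

\[ \lambda(\alpha) = \phi(\alpha)/[1 - \Phi(\alpha)] \quad \text{if truncation is } y > a \]

\[ \lambda(\alpha) = -\phi(\alpha)/\Phi(\alpha) \quad \text{if truncation is } y < a \]

- \(\delta(\alpha) = \lambda(\alpha)[\lambda(\alpha) - \alpha]\), where \(0 < \delta(\alpha) < 1\) for any \(\alpha\)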
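The moment formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the parameter values (mu = 1, sigma = 2, a = 2) are illustrative assumptions, not part of the slides.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def inverse_mills(alpha, lower=True):
    """lambda(alpha); lower=True means truncation y > a, otherwise y < a."""
    if lower:
        return STD.pdf(alpha) / (1.0 - STD.cdf(alpha))
    return -STD.pdf(alpha) / STD.cdf(alpha)


def truncated_normal_moments(mu, sigma, a, lower=True):
    """Mean and variance of y ~ N(mu, sigma^2) under truncation at a."""
    alpha = (a - mu) / sigma
    lam = inverse_mills(alpha, lower)
    delta = lam * (lam - alpha)
    return mu + sigma * lam, sigma ** 2 * (1.0 - delta)


# Compare the closed-form moments with a simulated truncated sample
# (hypothetical parameter values).
random.seed(0)
mu, sigma, a = 1.0, 2.0, 2.0
draws = [v for v in (random.gauss(mu, sigma) for _ in range(200_000)) if v > a]
sim_mean = sum(draws) / len(draws)
sim_var = sum((v - sim_mean) ** 2 for v in draws) / len(draws)
mean, var = truncated_normal_moments(mu, sigma, a)
print("formula:   ", round(mean, 3), round(var, 3))
print("simulation:", round(sim_mean, 3), round(sim_var, 3))
```

The truncated mean lies above mu and the truncated variance below sigma squared, in line with the inequalities stated for truncation from below.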

Example: a truncated log-normal income distribution

From New York Post (1987): “The typical upper affluent American... makes $142,000 per year... The people surveyed had household income of at least $100,000”
- Does this tell us anything about the typical American? “... only 2 percent of Americans make the grade”
- Degree of truncation in the sample: 98%
- The $142,000 is probably quite far from the mean in the full population

Assuming lognormally distributed income in the population (the log of income has a normal distribution), the information can be employed to deduce the population mean income

Let \(x\) = income (in thousands of dollars) and \(y = \ln x\)

\[ E[y \mid y > \log 100] = \mu + \frac{\sigma\phi(\alpha)}{1 - \Phi(\alpha)} \]

Solving for \(\mu\) and \(\sigma\) and substituting into \(E[x] = E[e^y] = e^{\mu + \sigma^2/2}\), we get \(E[x]\) = $22,087
- The 1987 Statistical Abstract of the US listed average household income of about $25,000 (a relatively good estimate based on little information!)
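The calculation can be sketched in a few lines. This is a hedged reconstruction, not necessarily the exact computation behind the slide. It assumes that income is measured in thousands of dollars, that the 2 percent share pins down \(\alpha\) through \(\Pr(y > \log 100) = 0.02\), and that \(\log 142\) is used as an approximation of \(E[y \mid y > \log 100]\).

```python
from math import exp, log
from statistics import NormalDist

STD = NormalDist()

share_above = 0.02        # "only 2 percent of Americans make the grade"
a = log(100.0)            # truncation point for log income (income in $1000)
trunc_mean = log(142.0)   # ASSUMPTION: log(142) stands in for E[y | y > a]

# Pr(y > a) = 0.02 implies alpha = (a - mu) / sigma = Phi^{-1}(0.98)
alpha = STD.inv_cdf(1.0 - share_above)
lam = STD.pdf(alpha) / (1.0 - STD.cdf(alpha))  # inverse Mills ratio

# Two equations in two unknowns:
#   a          = mu + sigma * alpha
#   trunc_mean = mu + sigma * lam
sigma = (trunc_mean - a) / (lam - alpha)
mu = a - sigma * alpha

mean_income = exp(mu + sigma ** 2 / 2.0)  # mean of a lognormal variable
print(round(mu, 3), round(sigma, 3), round(mean_income, 2))
```

Under these assumptions the implied population mean is roughly 22 thousand dollars, close to the figure reported on the slide; small differences come from rounding in the intermediate values.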

The truncated regression model

\[ y_i^* = x_i'\beta + \varepsilon_i, \quad \text{with } \varepsilon_i \mid x_i \sim N(0, \sigma^2) \]

Unit \(i\) is observed only if \(y_i^*\) crosses a threshold:

\[ y_i = \begin{cases} \text{n.a.} & \text{if } y_i^* \le a \\ y_i^* & \text{if } y_i^* > a \end{cases} \]

\[ E[y_i \mid y_i^* > a] = x_i'\beta + \sigma\lambda(\alpha_i), \quad \text{with } \alpha_i = (a - x_i'\beta)/\sigma \]

OLS estimation

OLS of \(y\) on \(x\) leads to inconsistent estimates
- The model is \(y_i \mid y_i^* > a = E(y_i \mid y_i^* > a) + \varepsilon_i = x_i'\beta + \sigma\lambda(\alpha_i) + \varepsilon_i\)
- By construction, the error term is heteroskedastic
- Omitted variable bias (\(\lambda_i\) is not included in the regression)
- In applications, it is usually found that the OLS estimates are biased toward zero: the marginal effect in the subpopulation is:

\[ \frac{\partial E[y_i \mid y_i^* > a]}{\partial x_i} = \beta + \sigma\,\frac{d\lambda(\alpha_i)}{d\alpha_i}\,\frac{\partial \alpha_i}{\partial x_i} = \beta + \sigma\,\delta(\alpha_i)\left(-\frac{\beta}{\sigma}\right) = \beta\,(1 - \delta(\alpha_i)) \]

  - Since \(0 < \delta(\alpha_i) < 1\), the marginal effect in the subpopulation is smaller in absolute value than the corresponding coefficient

Maximum likelihood estimation

Under the normality assumption, maximum likelihood provides a consistent estimator
- For each observation:

\[ f(y_i \mid y_i^* > a) = \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)}{1 - \Phi(\alpha_i)} \]

with \(\alpha_i = \frac{a - x_i'\beta}{\sigma}\)
- The log-likelihood can be written as

\[ \log L = \sum_{i=1}^{N} \log\left[\sigma^{-1}\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right] - \sum_{i=1}^{N} \log\left[1 - \Phi\!\left(\frac{a - x_i'\beta}{\sigma}\right)\right] \]
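The log-likelihood translates directly into code. The sketch below uses only the Python standard library; the function name `truncreg_loglik` and the toy data are illustrative assumptions. In practice this function would be handed to a numerical optimizer, which is not shown here.

```python
from math import log
from statistics import NormalDist

STD = NormalDist()


def truncreg_loglik(beta, sigma, X, y, a):
    """Log-likelihood of the truncated regression model (sample keeps y > a).

    beta: list of coefficients; X: list of regressor rows; y: list of outcomes.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        # density of the untruncated normal: (1/sigma) * phi((y - x'b)/sigma)
        total += log(STD.pdf((y_i - xb) / sigma) / sigma)
        # correction for truncation: minus log Pr(y* > a | x)
        total -= log(1.0 - STD.cdf((a - xb) / sigma))
    return total


# Toy data (hypothetical): intercept plus one regressor, truncation at a = 0.
X = [[1.0, 0.5], [1.0, 1.5]]
y = [1.2, 2.0]
ll = truncreg_loglik([0.5, 1.0], 1.0, X, y, a=0.0)
print(round(ll, 4))
```

When the truncation point is pushed far to the left, the second sum vanishes and the function reduces to the ordinary normal log-likelihood.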

Example: simulated data

- If \(y^*\) is fully observed, OLS can be applied
- However, only \(y^* > a\) is included in the sample
- OLS on the observed sample is biased
- MLE (truncreg) yields a consistent estimate of \(\beta\)
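The plots for this example are not reproduced here. As a rough stand-in, the following sketch simulates a comparable setting; the data generating process and all parameter values are assumptions, not the ones behind the original figures. It compares the OLS slope on the full sample with the OLS slope on the truncated sample.

```python
import random


def ols_line(x, y):
    """Intercept and slope of a simple least squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(42)
beta0, beta1, sigma, a = 1.0, 2.0, 2.0, 1.0  # hypothetical values
x = [random.uniform(-1.0, 1.0) for _ in range(20_000)]
y_star = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

# OLS on the full (latent) sample
full_intercept, full_slope = ols_line(x, y_star)

# OLS on the truncated sample: only observations with y* > a are kept
kept = [(xi, yi) for xi, yi in zip(x, y_star) if yi > a]
trunc_intercept, trunc_slope = ols_line([k[0] for k in kept],
                                        [k[1] for k in kept])

print("slope, full sample:     ", round(full_slope, 3))
print("slope, truncated sample:", round(trunc_slope, 3))
```

The slope on the truncated sample is attenuated toward zero, as implied by the factor \(1 - \delta(\alpha_i)\) in the marginal effect.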

Censored data

Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points
Assume there is a variable with quantitative meaning \(y^*\) and we are interested in \(E[y^* \mid x]\)
If \(y^*\) and \(x\) were observed for everyone in the population, standard regression methods (ordinary or nonlinear least squares) could be applied
In the case of censored data, \(y^*\) is not observable for part of the population
- Conventional regression methods fail to account for the qualitative difference between limit (censored) and nonlimit (continuous) observations
- Top coding / corner solution outcome

Top coding: example
Data generating process

Let \(wealth^*\) denote actual family wealth, measured in thousands of dollars
Suppose that \(wealth^*\) follows the linear regression model \(E[wealth^* \mid x] = x'\beta\)
Censored data: we observe actual wealth only when \(wealth^* < 200\)
- When \(wealth^*\) is greater than 200 we know that it is, but we do not know the actual value of wealth
Therefore observed wealth can be written as

\[ wealth = \min(wealth^*, 200) \]

Top coding: example
Estimation of \(\beta\)

We assume that \(wealth^*\) given \(x\) has a homoskedastic normal distribution

\[ wealth^* = x'\beta + \varepsilon, \quad \varepsilon \mid x \sim N(0, \sigma^2) \]

Recorded wealth is:

\[ wealth = \min(wealth^*, 200) = \min(x'\beta + \varepsilon, 200) \]

\(\beta\) is estimated via maximum likelihood using a mixture of discrete and continuous distributions (details later...)

Example: seats demanded and tickets sold

The censored normal distribution

\(y^* \sim N(\mu, \sigma^2)\)
Observed data are censored at \(a = 0\):

\[ y = \begin{cases} 0 & \text{if } y^* \le 0 \\ y^* & \text{if } y^* > 0 \end{cases} \]

The distribution is a mixture of a discrete and a continuous distribution
- If \(y^* \le 0\): \(f(y) = \Pr(y = 0) = \Pr(y^* \le 0) = \Phi(-\mu/\sigma) = 1 - \Phi(\mu/\sigma)\)
- If \(y^* > 0\): \(f(y) = \frac{1}{\sigma}\,\phi\!\left(\frac{y-\mu}{\sigma}\right)\)

\[ E[y] = 0 \times \Pr(y = 0) + E[y \mid y > 0] \times \Pr(y > 0) = (\mu + \sigma\lambda)\,\Phi\!\left(\frac{\mu}{\sigma}\right) \]

with \(\lambda = \phi(\mu/\sigma)/\Phi(\mu/\sigma)\)

The censored regression model (Tobin, 1958)

Let \(y^*\) be a continuous variable (latent variable):

\[ y_i^* = x_i'\beta + \varepsilon_i, \]

where \(\varepsilon_i \mid x_i \sim N(0, \sigma^2)\)
The observed data \(y\) are

\[ y_i = \max(0, y_i^*) = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases} \]

Why not OLS?
Why not OLS on positive \(y^*\)?

MLE estimation

Since we assume \(\varepsilon \mid x \sim N(0, \sigma^2)\), the likelihood function can be written explicitly
The distribution is a mixture of a discrete and a continuous distribution

- A positive probability is assigned to the observations \(y_i = 0\):

\[ \begin{aligned} \Pr(y_i = 0 \mid x_i) &= \Pr(y_i^* \le 0 \mid x_i) \\ &= \Pr(x_i'\beta + \varepsilon_i \le 0) \\ &= \Pr(\varepsilon_i \le -x_i'\beta) \\ &= 1 - \Pr(\varepsilon_i < x_i'\beta) \\ &= 1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right) \end{aligned} \]

- For \(y_i^* > 0\): \(f(y_i) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\)

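MLE estimation

The likelihood can be written as:

\[ \begin{aligned} L(\beta, \sigma^2 \mid y) &= \prod_{y_i = 0}\left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right] \prod_{y_i > 0} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \\ &= \prod_{y_i = 0}\left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right] \prod_{y_i > 0} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{y_i - x_i'\beta}{\sigma}\right)^2} \end{aligned} \]

In the case of censored data, \(\beta\) estimated from the Tobit model can be employed to study the effect of \(x\) on \(E[y^* \mid x]\)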
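In logs, the likelihood above is a sum of two kinds of contributions. The sketch below uses only the Python standard library; the function name `tobit_loglik` and the toy data are assumptions for illustration. It evaluates the Tobit log-likelihood for censoring at zero and would normally be maximized numerically.

```python
from math import log
from statistics import NormalDist

STD = NormalDist()


def tobit_loglik(beta, sigma, X, y):
    """Tobit log-likelihood with censoring from below at zero."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        if y_i <= 0.0:
            # limit observation: log Pr(y = 0 | x) = log[1 - Phi(x'b / sigma)]
            total += log(1.0 - STD.cdf(xb / sigma))
        else:
            # nonlimit observation: log[(1/sigma) * phi((y - x'b) / sigma)]
            total += log(STD.pdf((y_i - xb) / sigma) / sigma)
    return total


# Toy data (hypothetical): intercept plus one regressor, two censored outcomes.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0.0, 0.0, 1.3, 2.1]
print(round(tobit_loglik([0.2, 1.0], 1.0, X, y), 4))
```

Limit observations contribute a probability and nonlimit observations contribute a density, which is the mixture of discrete and continuous parts mentioned above.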

Example: simulated data

- If \(y^*\) is fully observed, OLS can be applied
- However, if \(y^* \le a\), data are recorded as \(a\)
- OLS on the observed sample is biased
- MLE (tobit) yields a consistent estimate of \(\beta\)

Corner solution outcomes

Still labeled “censored regression models”
Pioneering work by Tobin (1958): household purchase of durable goods
Let \(y\) be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: \(y\) takes on the value zero with positive probability but is a continuous random variable over strictly positive values
- Examples: amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development
- We can imagine economic agents solving an optimization problem, and for some agents the optimal choice will be the corner solution, \(y = 0\)
- The issue here is not data observability but individual behaviour
- We are interested in features of the distribution of \(y\) given \(x\), such as \(E[y \mid x]\) and \(\Pr(y = 0 \mid x)\)

Marginal effect in the tobit model

In the case of a corner solution outcome, the estimated \(\beta\) are not sufficient, since \(E[y \mid x]\) and \(E[y \mid x, y > 0]\) depend on \(\beta\) in a nonlinear way

\[ \frac{\partial E[y_i \mid x_i]}{\partial x_i} = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\beta \]

\[ \frac{\partial E[y_i \mid x_i]}{\partial x_i} = \Pr(y_i > 0)\,\frac{\partial E[y_i \mid x_i, y_i > 0]}{\partial x_i} + E[y_i \mid x_i, y_i > 0]\,\frac{\partial \Pr[y_i > 0]}{\partial x_i} \]

A change in \(x_i\) has two effects:
(1) It affects the conditional mean of \(y_i^*\) in the positive part of the distribution
(2) It affects the probability that the observation will fall in the positive part of the distribution
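The decomposition can be verified numerically. The sketch below relies on the standard closed forms for the Tobit model with censoring at zero, which are supplied here and do not appear on the slide: \(E[y \mid x, y > 0] = x'\beta + \sigma\lambda(z)\) with \(z = x'\beta/\sigma\) and \(\lambda(z) = \phi(z)/\Phi(z)\), and \(\partial E[y \mid x, y > 0]/\partial x_k = \beta_k[1 - \lambda(z)(z + \lambda(z))]\). The function name and the input values are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def tobit_effects(xb, sigma, beta_k):
    """Return (decomposed effect, direct formula) for dE[y|x]/dx_k."""
    z = xb / sigma
    prob_pos = STD.cdf(z)                 # Pr(y > 0 | x)
    lam = STD.pdf(z) / prob_pos           # inverse Mills ratio at z
    cond_mean = xb + sigma * lam          # E[y | x, y > 0]
    d_cond_mean = beta_k * (1.0 - lam * (z + lam))  # effect on E[y | x, y > 0]
    d_prob = STD.pdf(z) * beta_k / sigma  # effect on Pr(y > 0 | x)
    decomposed = prob_pos * d_cond_mean + cond_mean * d_prob
    direct = prob_pos * beta_k            # Phi(x'b / sigma) * beta_k
    return decomposed, direct


decomposed, direct = tobit_effects(xb=0.5, sigma=1.5, beta_k=2.0)
print(decomposed, direct)
```

The two numbers coincide up to floating-point error, which confirms that the two-part decomposition collapses to \(\Phi(x_i'\beta/\sigma)\beta\).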

Some issues in specification

Heteroskedasticity
- MLE is inconsistent
- However, the problem can be approached directly by using \(\sigma_i\) in the likelihood function instead of \(\sigma\). Specifying a particular model for \(\sigma_i\) provides the empirical model for estimation
Misspecification of \(\Pr(y^* < 0)\)
- In the tobit model, a variable that increases the probability of an observation being a nonlimit observation also increases the mean of the variable
  - Example: loss due to fire in buildings
- A more general model has been devised involving a decision equation and a regression equation for nonlimit observations
Non-normality
- MLE is inconsistent
- Research is ongoing both on alternative estimators and on methods for testing this type of misspecification

Sample selection

What if observation is driven by a different process?
(1) Data observability
- Saving function (in the population): \(saving = \beta_0 + \beta_1 income + \beta_2 age + \beta_3 married + \beta_4 kids + u\)
- Survey data only include families whose household head was 45 years of age or older
(2) Individual behaviour (Boyes, Hoffman, Low, 1989; Greene, 1992)
- \(y_1 = 1\) if individual \(i\) defaults on a loan/credit card, 0 otherwise
- \(y_2 = 1\) if individual \(i\) is granted a loan/credit card, 0 otherwise
- For a given individual, \(y_1\) is not observed unless \(y_2\) equals 1

Sample selection / incidental truncation

Let \(y\) and \(z\) have a bivariate distribution with correlation \(\rho\)
We are interested in the distribution of \(y\) given that another variable \(z\) exceeds a particular value
- Intuition: if \(y\) and \(z\) are positively correlated, then the truncation of \(z\) should push the distribution of \(y\) to the right
The truncated joint distribution is

\[ f(y, z \mid z > a) = \frac{f(y, z)}{\Pr(z > a)} \]

To obtain the incidentally truncated marginal density of y, we should integrate z out of this expression

Moments of the incidentally truncated bivariate normal distribution

Let \(y\) and \(z\) have a bivariate normal distribution with means \(\mu_y\) and \(\mu_z\), standard deviations \(\sigma_y\) and \(\sigma_z\), and correlation \(\rho\)

\[ E[y \mid z > a] = \mu_y + \rho\sigma_y\lambda(\alpha_z) \]

\[ V[y \mid z > a] = \sigma_y^2[1 - \rho^2\delta(\alpha_z)] \]

- \(\alpha_z = (a - \mu_z)/\sigma_z\)
- \(\lambda(\alpha_z) = \phi(\alpha_z)/[1 - \Phi(\alpha_z)]\)
- \(\delta(\alpha_z) = \lambda(\alpha_z)[\lambda(\alpha_z) - \alpha_z]\)

If the truncation is \(z < a\), then \(\lambda(\alpha_z) = -\phi(\alpha_z)/\Phi(\alpha_z)\)
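These two moments can be checked by simulation. The sketch below uses only the Python standard library; the function name and the parameter values (standard normal margins, rho = 0.6, a = 0.5) are illustrative assumptions. It draws correlated normal pairs, keeps those with z > a, and compares the sample moments of y with the formulas.

```python
import random
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def incidental_moments(mu_y, sigma_y, mu_z, sigma_z, rho, a):
    """Mean and variance of y given z > a under bivariate normality."""
    alpha_z = (a - mu_z) / sigma_z
    lam = STD.pdf(alpha_z) / (1.0 - STD.cdf(alpha_z))
    delta = lam * (lam - alpha_z)
    mean = mu_y + rho * sigma_y * lam
    var = sigma_y ** 2 * (1.0 - rho ** 2 * delta)
    return mean, var


random.seed(1)
mu_y, sigma_y, mu_z, sigma_z, rho, a = 0.0, 1.0, 0.0, 1.0, 0.6, 0.5
kept = []
for _ in range(200_000):
    e1, e2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    z = mu_z + sigma_z * e1
    # y is built so that corr(y, z) = rho
    y = mu_y + sigma_y * (rho * e1 + sqrt(1.0 - rho ** 2) * e2)
    if z > a:
        kept.append(y)

sim_mean = sum(kept) / len(kept)
sim_var = sum((v - sim_mean) ** 2 for v in kept) / len(kept)
mean, var = incidental_moments(mu_y, sigma_y, mu_z, sigma_z, rho, a)
print("formula:   ", round(mean, 3), round(var, 3))
print("simulation:", round(sim_mean, 3), round(sim_var, 3))
```

With positive correlation the mean of y in the selected sample shifts to the right even though y itself is never truncated directly.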

Example: A model of labor supply

Consider a population of women where only a subsample is engaged in market employment
We are interested in identifying the determinants of the labor supply for all women
A simple model of female labor supply consists of 2 equations
(1) Wage equation: the difference between a person’s market wage and her reservation wage, as a function of characteristics such as age, education, number of children, ... plus unobservables
(2) Hours equation: the desired number of labor hours supplied depends on the wage, home characteristics (e.g. presence of small children), marital status, ... plus unobservables
Truncation: Equation 2 describes the desired hours, but an actual figure is observed only if the individual is working, i.e. when the market wage exceeds the reservation wage
The hours variable is incidentally truncated

Example: A model of labor supply
When is OLS on the working sample appropriate?

Assume working women are chosen randomly
If the working subsample has similar endowments of characteristics (both observed and unobserved) as the nonworking sample, OLS is an option
BUT the decision to work is not random: the working and nonworking samples potentially have different characteristics
- When the relationship is purely through observables, appropriate conditioning variables can be included in the relevant equation
- If unobservable characteristics affecting the work decision are correlated with the unobservable characteristics affecting the wage, then a relationship arises that cannot be tackled by including appropriate controls
- A bias is induced due to “sample selection”

Regression in a model of selection (1)

Equation that determines sample selection

\[ z_i^* = w_i'\gamma + u_i \]

The equation of primary interest is

\[ y_i = x_i'\beta + \varepsilon_i \]

where \(y_i\) is observed only when \(z_i^*\) is greater than zero (otherwise data are not available)

- This model is closely related to the Tobit model, although it is less restrictive: the parameters explaining the censoring are not constrained to equal those explaining the variation in the observed dependent variable. For this reason the model is also known as Tobit type two.

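Regression in a model of selection (2)

If \(u_i\) and \(\varepsilon_i\) have a bivariate normal distribution with zero means, standard deviations \(\sigma_u\) and \(\sigma\), and correlation \(\rho\),

\[ \begin{aligned} E[y_i \mid y_i \text{ is observed}] &= E[y_i \mid z_i^* > 0] \\ &= E[y_i \mid u_i > -w_i'\gamma] \\ &= x_i'\beta + E[\varepsilon_i \mid u_i > -w_i'\gamma] \\ &= x_i'\beta + \rho\sigma\lambda_i(\alpha_u) \end{aligned} \]

where \(\alpha_u = -w_i'\gamma/\sigma_u\) and \(\lambda_i(\alpha_u) = \phi(\alpha_u)/[1 - \Phi(\alpha_u)] = \phi(w_i'\gamma/\sigma_u)/\Phi(w_i'\gamma/\sigma_u)\)
So, the regression model can be written as

\[ \begin{aligned} y_i \mid z_i^* > 0 &= E[y_i \mid z_i^* > 0] + \upsilon_i \\ &= x_i'\beta + \rho\sigma\lambda_i(\alpha_u) + \upsilon_i \end{aligned} \]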

Regression in a model of selection (3)

\[ E[y_i \mid z_i^* > 0] = x_i'\beta + \rho\sigma\lambda_i(\alpha_u) \]

OLS regression using the observed data will lead to inconsistent estimates (omitted variable bias)
The marginal effect of the regressors on \(y_i\) in the observed sample consists of two components:
- Direct effect on the mean of \(y_i\) (\(\beta\))
- In addition, if the variable appears in the probability that \(z_i^*\) is positive, then it will influence \(y_i\) through its presence in \(\lambda_i\)

\[ \frac{\partial E[y_i \mid z_i^* > 0]}{\partial x_{ik}} = \beta_k - \gamma_k\left(\frac{\rho\sigma}{\sigma_u}\right)\delta_i(\alpha_u) \]

The minus sign follows from \(\partial\alpha_u/\partial x_{ik} = -\gamma_k/\sigma_u\) and \(d\lambda/d\alpha = \delta > 0\)
Most often \(z_i^*\) is not observed; rather, we can infer its sign but not its magnitude
- Since there is no information on the scale of \(z^*\), the disturbance variance in the selection equation cannot be estimated (we let \(\sigma_u^2 = 1\))

Regression in a model of selection (4)

Selection mechanism

\[ z_i^* = w_i'\gamma + u_i, \]

where we observe \(z_i = 1\) if \(z_i^* > 0\) and 0 otherwise.
- \(\Pr(z_i = 1 \mid w_i) = \Phi(w_i'\gamma)\)
- \(\Pr(z_i = 0 \mid w_i) = 1 - \Phi(w_i'\gamma)\)

Regression model

\[ y_i = x_i'\beta + \varepsilon_i, \]

where \(y_i\) is observed only when \(z_i\) is equal to one (otherwise data are not available)

- \((u_i, \varepsilon_i) \sim\) bivariate normal\([0, 0, 1, \sigma, \rho]\)

Estimation

Least squares using the observed data produces inconsistent estimates of \(\beta\) (omitted variable)
Least squares regression of \(y\) on \(x\) and \(\lambda\) would be a consistent estimator
- However, even if \(\lambda_i\) were observed, OLS would be inefficient: the \(\upsilon_i\) are heteroskedastic
Maximum likelihood estimation can be applied
Heckman (1979) proposed a two-step procedure

Maximum likelihood estimation

The log likelihood for observation \(i\), \(\log L_i = l_i\), can be written as:
- If \(y_i\) is not observed

\[ l_i = \log\Phi(-w_i'\gamma) \]

- If \(y_i\) is observed

\[ l_i = \log\Phi\!\left(\frac{w_i'\gamma + (y_i - x_i'\beta)\rho/\sigma}{\sqrt{1 - \rho^2}}\right) - \frac{1}{2}\left(\frac{y_i - x_i'\beta}{\sigma}\right)^2 - \log(\sqrt{2\pi}\,\sigma) \]

\(\sigma\) and \(\rho\) are not directly estimated (\(\sigma\) has to be greater than 0 and \(\rho\) has to lie between \(-1\) and 1)

Directly estimated are \(\log\sigma\) and \(\operatorname{atanh}\rho\):

\[ \operatorname{atanh}\rho = \frac{1}{2}\log\!\left(\frac{1 + \rho}{1 - \rho}\right) \]

Estimation would be simplified if ρ = 0

Two-step procedure (Heckman, 1979)

\[ \begin{aligned} y_i \mid z_i^* > 0 &= E[y_i \mid z_i^* > 0] + \upsilon_i \\ &= x_i'\beta + \rho\sigma\lambda_i(\alpha_u) + \upsilon_i \end{aligned} \]

1. Estimate the probit equation by MLE to obtain estimates of \(\gamma\). For each observation in the selected sample, compute \(\hat\lambda_i = \phi(w_i'\hat\gamma)/\Phi(w_i'\hat\gamma)\) (inverse Mills ratio)

2. Estimate \(\beta\) and \(\beta_\lambda = \rho\sigma\) by least squares regression of \(y\) on \(x\) and \(\hat\lambda\)
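The two steps can be sketched on simulated data using only the Python standard library. Everything below is an illustrative assumption rather than the implementation behind the slides: the data generating process, the parameter values, the helper names `solve`, `ols` and `probit`, and the use of Newton-Raphson for the probit step. The second-step standard errors are not corrected here.

```python
import random
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * u for v, u in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y_i for r, y_i in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def probit(W, z, iterations=15):
    """Probit MLE by Newton-Raphson, started at gamma = 0."""
    k = len(W[0])
    gamma = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for w, z_i in zip(W, z):
            q = 2 * z_i - 1
            idx = sum(g * v for g, v in zip(gamma, w))
            g_i = q * STD.pdf(q * idx) / STD.cdf(q * idx)  # score weight
            h_i = g_i * (g_i + idx)                        # minus Hessian weight
            for r in range(k):
                grad[r] += g_i * w[r]
                for c in range(k):
                    neg_hess[r][c] += h_i * w[r] * w[c]
        gamma = [g + s for g, s in zip(gamma, solve(neg_hess, grad))]
    return gamma


# Simulated data (hypothetical values); w2 enters selection only (exclusion).
random.seed(7)
beta_true, gamma_true, rho, sigma, n = [1.0, 2.0], [0.0, 1.0, 1.0], 0.7, 1.5, 4000
W, z, X, y = [], [], [], []
for _ in range(n):
    x1, w2, u, e = (random.gauss(0.0, 1.0) for _ in range(4))
    eps = sigma * (rho * u + sqrt(1.0 - rho ** 2) * e)  # corr(u, eps) = rho
    w = [1.0, x1, w2]
    W.append(w)
    z.append(1 if sum(g * v for g, v in zip(gamma_true, w)) + u > 0 else 0)
    X.append([1.0, x1])
    y.append(beta_true[0] + beta_true[1] * x1 + eps)

# Step 1: probit for selection, then the inverse Mills ratio for selected units
gamma_hat = probit(W, z)
X_sel, X_aug, y_sel = [], [], []
for w, z_i, x, y_i in zip(W, z, X, y):
    if z_i == 1:
        idx = sum(g * v for g, v in zip(gamma_hat, w))
        X_sel.append(x)
        X_aug.append(x + [STD.pdf(idx) / STD.cdf(idx)])
        y_sel.append(y_i)

# Step 2: least squares of y on x and lambda-hat; naive OLS for comparison
naive = ols(X_sel, y_sel)
two_step = ols(X_aug, y_sel)
print("naive OLS:", [round(b, 3) for b in naive])
print("two-step: ", [round(b, 3) for b in two_step])
```

In this design the last two-step coefficient estimates \(\beta_\lambda = \rho\sigma = 1.05\), while the naive regression on the selected sample absorbs the omitted \(\lambda_i\) into its intercept and slope.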

Estimators of the variance and standard errors

Second step standard errors need to be adjusted to account for the first step estimation

The estimation of \(\sigma\) needs to be adjusted:
- At each observation, the true conditional variance of the disturbance would be \(\sigma_i^2 = \sigma^2(1 - \rho^2\delta_i)\)
- A consistent estimator of \(\sigma^2\) is given by:

\[ \hat\sigma^2 = \frac{1}{n}\,e'e + \bar{\hat\delta}\,b_\lambda^2 \]

To test hypotheses, an estimate of the asymptotic covariance matrix of the coefficients (including \(\beta_\lambda\)) is needed
- Two problems arise: (1) the disturbance term \(\upsilon_i\) is heteroskedastic; (2) there are unknown parameters in \(\lambda_i\)
- Formulas are rather cumbersome, but can be calculated using the matrix of independent variables, the sample estimates of \(\sigma^2\) and \(\rho\), and the assumed known values of \(\lambda_i\) and \(\delta_i\)

Two-step procedure: discussion

Identification: exclusion restriction
- Although the inverse Mills ratio is nonlinear in the single index \(w_i'\gamma\), the function mapping this index into the inverse Mills ratio is approximately linear for certain ranges of the index
- Accordingly, the inclusion of additional variables in \(w_i\) in the first step can be important for identification of the second step estimates
- In the real world, there are few candidates for simultaneous inclusion in \(w_i\) and exclusion from \(x_i\)
Inclusion of the inverse Mills ratio in the equation of interest is driven by the normality assumption
- Recent research includes specific attempts to move away from the normality assumption:

\[ y_i \mid z_i^* > 0 = x_i'\beta + \mu(w_i'\gamma) + \upsilon_i \]

where \(\mu(w_i'\gamma)\) is called the “selectivity correction”

Selection in qualitative response models

The problem of sample selection has been modeled in other settings besides the linear regression model
Binary choice models have been considered, as well as count data models
For example, in the case of the Poisson model:
- \(y_i \mid \varepsilon_i \sim \text{Poisson}(\lambda_i)\)
- \(\log\lambda_i = x_i'\beta + \varepsilon_i\)
- \((y_i, x_i)\) are only observed when \(z_i = 1\), where \(z_i^* = w_i'\gamma + u_i\) and \(z_i = 1\) if \(z_i^* > 0\), 0 otherwise
- Assume that \((\varepsilon_i, u_i)\) have a bivariate normal distribution with non-zero correlation
- Selection affects the mean (and the variance) of \(y_i\) and, in the observed data, \(y_i\) no longer has a Poisson distribution
