Truncation and Censoring
Laura Magazzini ([email protected])

Truncation and censoring

- Truncation: sample data are drawn from a subset of a larger population of interest
  - A characteristic of the distribution from which the sample data are drawn
  - Example: studies of income based only on incomes above (or below) the poverty line, of limited usefulness for inference about the whole population
- Censoring: values of the dependent variable in a certain range are all transformed to (or reported at) a single value
  - A defect in the sample data
  - Example: in studies of income, people below the poverty line are all reported at the poverty line
- Truncation and censoring introduce similar distortions into conventional statistical results

Truncation

- Aim: infer the characteristics of a full population from a sample drawn from a restricted population
  - Example: characteristics of people with income above $100,000
- Let y be a continuous random variable with pdf f(y). The conditional distribution of y given y > a (with a a constant) is
    f(y | y > a) = f(y) / Pr(y > a)
- If y is normally distributed with mean µ and variance σ²,
    f(y | y > a) = (1/σ) φ((y − µ)/σ) / [1 − Φ(α)],  where α = (a − µ)/σ

Moments of truncated distributions

- E(y | y < a) < E(y)
- E(y | y > a) > E(y)
- V(y | truncation) < V(y)

Moments of the truncated normal distribution

Let y ∼ N(µ, σ²) and let a be a constant.
- E(y | truncation) = µ + σλ(α)
- Var(y | truncation) = σ²[1 − δ(α)]
  - α = (a − µ)/σ, and φ(α) is the standard normal density
  - λ(α) is the inverse Mills ratio: λ(α) = φ(α)/[1 − Φ(α)] if truncation is y > a, and λ(α) = −φ(α)/Φ(α) if truncation is y < a
  - δ(α) = λ(α)[λ(α) − α], with 0 < δ(α) < 1 for any α

Example: a truncated log-normal income distribution

- From the New York Post (1987): "The typical upper affluent American... makes $142,000 per year... The people surveyed had household income of at least $100,000"
  - Does this tell us anything about the typical American?
- "... only 2 percent of Americans make the grade"
  - Degree of truncation in the sample: 98%
  - The $142,000 figure is probably quite far from the mean in the full population
- Assuming income is lognormally distributed in the population (the log of income has a normal distribution), this information can be used to deduce the population mean income
- Let x = income and y = ln x; then
    E[y | y > log 100] = µ + σ φ(α) / [1 − Φ(α)]
- Substituting into E[x] = E[e^y] = e^(µ + σ²/2) gives E[x] = $22,087
  - The 1987 Statistical Abstract of the US listed average household income of about $25,000: a relatively good estimate based on very little information!

The truncated regression model

- y*_i = x_i'β + ε_i, with ε_i | x_i ∼ N(0, σ²)
- Unit i is observed only if y*_i crosses a threshold a:
    y_i = n.a. if y*_i ≤ a,   y_i = y*_i if y*_i > a
- E[y_i | y*_i > a] = x_i'β + σλ(α_i), with α_i = (a − x_i'β)/σ
- The marginal effect in the subpopulation is
    ∂E[y_i | y*_i > a] / ∂x_i = β + σ (dλ(α_i)/dα_i)(∂α_i/∂x_i) = ... = β[1 − δ(α_i)]
  - Since 0 < δ(α_i) < 1, the marginal effect in the subpopulation is smaller in absolute value than the corresponding coefficient
  - If the interest is in the linear relationship between y* and x in the population, β can be interpreted directly

Estimation

- OLS of y on x leads to inconsistent estimates
  - The estimating equation is y_i | y*_i > a = E(y_i | y*_i > a) + e_i = x_i'β + σλ(α_i) + e_i
  - By construction, the error term is heteroskedastic
  - There is omitted-variable bias, because λ(α_i) is not included in the regression
  - In applications, the OLS estimates are usually found to be biased toward zero
- Under the normality assumption, the MLE can be obtained
  - f(y | y > a) = (1/σ) φ((y − µ)/σ) / [1 − Φ(α)], with α = (a − µ)/σ
  - The log-likelihood is
      log L = Σ_{i=1}^N log[ σ⁻¹ φ((y_i − x_i'β)/σ) ] − Σ_{i=1}^N log[ 1 − Φ((a − x_i'β)/σ) ]
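As a concrete illustration of this log-likelihood, the sketch below maximizes it numerically on simulated data in the spirit of the example on the next slide. The design, the enlarged sample size, and all names are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: maximum likelihood for the truncated regression model,
# using the log-likelihood displayed above. Illustrative simulated design
# (roughly y* = -1.5 + 0.5*x + e/2, truncation at a = 0; larger N for stability).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, a = 1000, 0.0
x = rng.uniform(0, 10, n)
y_star = -1.5 + 0.5 * x + rng.normal(size=n) / 2
keep = y_star > a                                  # only units above the threshold are observed
y = y_star[keep]
X = np.column_stack([np.ones(keep.sum()), x[keep]])

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])    # log-parameterize sigma so it stays positive
    xb = X @ beta
    # log L = sum log[(1/sigma) phi((y - x'b)/sigma)] - sum log[1 - Phi((a - x'b)/sigma)]
    ll = norm.logpdf((y - xb) / sigma) - np.log(sigma) - norm.logsf((a - xb) / sigma)
    return -ll.sum()

ols = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS on the truncated sample
res = minimize(neg_loglik, np.append(ols, 0.0), method="BFGS")
print("truncated-regression MLE:", res.x[:-1], "sigma:", np.exp(res.x[-1]))
print("OLS on the truncated sample:", ols)
```

Comparing the two sets of estimates illustrates the attenuation of the OLS slope noted above.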
Example: simulated data

- Data generating process: y* = −1.5 + 0.5x + ε/2, N = 100, truncation at a = 0

Censored data

- Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points
- Assume there is a variable y* with a quantitative meaning, and that we are interested in E[y* | x]
- If y* and x were observed for everyone in the population, standard regression methods (ordinary or nonlinear least squares) could be applied
- With censored data, y* is not observable for part of the population
  - Conventional regression methods fail to account for the qualitative difference between limit (censored) and nonlimit (continuous) observations
  - Two leading cases: top coding and corner solution outcomes

Top coding: example, data generating process

- Let wealth* denote actual family wealth, measured in thousands of dollars
- Suppose that wealth* follows the linear regression model E[wealth* | x] = x'β
- Censored data: we observe the actual value of wealth* only when wealth* > 200; when wealth* is at most 200 we know only that it is, not its actual value
- Observed wealth can therefore be written as wealth = max(wealth*, 200)

Top coding: example, estimation of β

- Assume that wealth* given x has a homoskedastic normal distribution: wealth* = x'β + ε, with ε | x ∼ N(0, σ²)
- Recorded wealth is wealth = max(wealth*, 200) = max(x'β + ε, 200)
- β is estimated by maximum likelihood, using a mixture of a discrete and a continuous distribution (details below)

Example: seats demanded and tickets sold

Corner solution outcomes

- These models are still labelled "censored regression models"
- Pioneering work by Tobin (1958): household purchases of durable goods
- Let y be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristic: y takes on the value zero with positive probability but is a continuous random variable over strictly positive values
  - Examples: the amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditure on research and development
  - We can imagine economic agents solving an optimization problem; for some agents the optimal choice is the corner solution, y = 0
  - The issue here is not data observability, but individual behaviour
  - We are interested in features of the distribution of y given x, such as E[y | x] and Pr(y = 0 | x)

The censored normal distribution

- y* ∼ N(µ, σ²)
- Observed data are censored at a = 0:
    y = 0 if y* ≤ 0,   y = y* if y* > 0
- The distribution of y is a mixture of a discrete and a continuous distribution
  - If y* ≤ 0: f(y) = Pr(y = 0) = Pr(y* ≤ 0) = Φ(−µ/σ) = 1 − Φ(µ/σ)
  - If y* > 0: f(y) = (1/σ) φ((y − µ)/σ)
- E[y] = 0 · Pr(y = 0) + E[y | y > 0] · Pr(y > 0) = (µ + σλ) Φ(µ/σ), with λ = φ(µ/σ)/Φ(µ/σ)
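A quick numerical check of the censored-mean formula just derived; the values of µ and σ below are arbitrary and purely illustrative.

```python
# Compare E[y] = (mu + sigma*lambda) * Phi(mu/sigma), lambda = phi(mu/sigma)/Phi(mu/sigma),
# with the sample mean of a large simulated censored-normal sample (illustrative values).
import numpy as np
from scipy.stats import norm

mu, sigma = 0.5, 1.5
lam = norm.pdf(mu / sigma) / norm.cdf(mu / sigma)
mean_formula = (mu + sigma * lam) * norm.cdf(mu / sigma)

rng = np.random.default_rng(0)
y = np.maximum(rng.normal(mu, sigma, 1_000_000), 0.0)   # censor y* at zero
print(mean_formula, y.mean())                            # the two numbers should agree closely
```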
The censored regression model: the tobit model (Tobin, 1958)

- Let y* be a continuous latent variable: y*_i = x_i'β + ε_i, with ε_i | x_i ∼ N(0, σ²)
- The observed data are
    y_i = max(0, y*_i),  i.e.  y_i = 0 if y*_i ≤ 0,   y_i = y*_i if y*_i > 0
- Why not OLS?
- Estimates can be obtained by MLE

Estimation

- A positive probability is assigned to the observations with y_i = 0:
    Pr(y_i = 0 | x_i) = Pr(y*_i ≤ 0 | x_i) = Pr(x_i'β + ε_i ≤ 0) = Pr(ε_i ≤ −x_i'β) = 1 − Pr(ε_i < x_i'β) = 1 − Φ(x_i'β/σ)
  (the next-to-last equality uses the symmetry of the normal distribution)
- The likelihood can be written as
    L(β, σ² | y) = Π_{y_i=0} [1 − Φ(x_i'β/σ)] · Π_{y_i>0} (1/σ) φ((y_i − x_i'β)/σ)
                 = Π_{y_i=0} [1 − Φ(x_i'β/σ)] · Π_{y_i>0} (2πσ²)^(−1/2) exp{−½ [(y_i − x_i'β)/σ]²}

Marginal effects in the tobit model

- With censored data, the β estimated from the tobit model can be used directly to study the effect of x on E[y* | x]
- With a corner solution outcome, the estimated β are not sufficient, since E[y | x] and E[y | x, y > 0] depend on β in a non-linear way:
    ∂E[y_i | x_i] / ∂x_i = Φ(x_i'β/σ) β
  which decomposes as
    ∂E[y_i | x_i] / ∂x_i = (∂E[y_i | x_i, y_i > 0] / ∂x_i) Pr(y_i > 0) + E[y_i | x_i, y_i > 0] (∂Pr(y_i > 0) / ∂x_i)
- A change in x_i therefore has two effects (a numerical sketch appears at the end of this section):
  (1) it affects the conditional mean of y*_i in the positive part of the distribution
  (2) it affects the probability that the observation falls in the positive part of the distribution

Example: simulated data

- Data generating process: y* = −1.5 + 0.5x + ε/2, N = 100

Some issues in specification

- Heteroskedasticity
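As referenced from the marginal-effects slide above, the sketch below ties the tobit pieces together: it maximizes the tobit log-likelihood on simulated data and evaluates the scale factor Φ(x'β/σ) for the effect of x on E[y | x] at the sample mean. The design and all names are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch: tobit MLE using the likelihood above, plus the marginal effect
# of x on E[y|x] evaluated at the sample mean. Simulated, illustrative design.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0, 10, n)
y = np.maximum(-1.5 + 0.5 * x + rng.normal(size=n) / 2, 0.0)   # corner at zero
X = np.column_stack([np.ones(n), x])

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])    # log-parameterize sigma so it stays positive
    xb = X @ beta
    # zeros contribute log[1 - Phi(x'b/sigma)]; positive y contribute log[(1/sigma) phi((y - x'b)/sigma)]
    ll = np.where(y > 0,
                  norm.logpdf((y - xb) / sigma) - np.log(sigma),
                  norm.logcdf(-xb / sigma))
    return -ll.sum()

ols = np.linalg.lstsq(X, y, rcond=None)[0]
res = minimize(neg_loglik, np.append(ols, 0.0), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])

scale = norm.cdf(X.mean(axis=0) @ beta_hat / sigma_hat)        # Phi(x'b/sigma) at the sample mean
print("tobit beta:", beta_hat)
print("marginal effect of x on E[y|x] at the mean:", scale * beta_hat[1])
print("OLS on the censored outcome:", ols)                     # attenuated relative to beta
```

The last line shows the attenuated OLS slope on the censored outcome, one answer to the "Why not OLS?" question above.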