Truncation and Censoring
Laura Magazzini

Truncation and censoring

Truncation: the sample data are drawn from a subset of a larger population of interest.
- It is a characteristic of the distribution from which the sample data are drawn.
- Example: studies of income based only on incomes above (or below) the poverty line, which are of limited usefulness for inference about the whole population.

Censoring: values of the dependent variable in a certain range are all transformed to (or reported at) a single value.
- It is a defect in the sample data.
- Example: in studies of income, people below the poverty line are reported at the poverty line.

Truncation and censoring introduce similar distortions into conventional statistical results.

Truncation

Aim: infer the characteristics of a full population from a sample drawn from a restricted population.
- Example: characteristics of people with income above $100,000.

Let y be a continuous random variable with pdf f(y). The conditional distribution of y given y > a (a a constant) is

  f(y | y > a) = f(y) / Pr(y > a)

If y is normally distributed,

  f(y | y > a) = (1/σ) φ((y − µ)/σ) / [1 − Φ(α)],   where α = (a − µ)/σ

Moments of truncated distributions

- E(Y | y < a) < E(Y)
- E(Y | y > a) > E(Y)
- V(Y | trunc.) < V(Y)

Moments of the truncated normal distribution

Let y ∼ N(µ, σ²) and let a be a constant. Then

  E(y | truncation) = µ + σ λ(α)
  Var(y | truncation) = σ² [1 − δ(α)]

where
- α = (a − µ)/σ,
- φ(α) is the standard normal density,
- λ(α) is the inverse Mills ratio: λ(α) = φ(α) / [1 − Φ(α)] if truncation is y > a, and λ(α) = −φ(α) / Φ(α) if truncation is y < a,
- δ(α) = λ(α)[λ(α) − α], with 0 < δ(α) < 1 for any α.
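As a quick numerical check of these formulas, the sketch below (a minimal illustration, assuming Python with NumPy and SciPy; the parameter values are arbitrary and not taken from the slides) compares the analytical mean and variance of a normal distribution truncated from below with the moments of a simulated truncated sample.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, a = 10.0, 4.0, 12.0      # N(mu, sigma^2) truncated from below at a (illustrative values)
alpha = (a - mu) / sigma

# Inverse Mills ratio and delta for truncation y > a
lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))   # lambda(alpha) = phi(alpha) / [1 - Phi(alpha)]
delta = lam * (lam - alpha)                     # delta(alpha), between 0 and 1

mean_formula = mu + sigma * lam                 # E[y | y > a] = mu + sigma * lambda(alpha)
var_formula = sigma**2 * (1 - delta)            # Var[y | y > a] = sigma^2 * [1 - delta(alpha)]

# Monte Carlo check: draw from N(mu, sigma^2) and keep only draws with y > a
rng = np.random.default_rng(0)
y = rng.normal(mu, sigma, size=1_000_000)
y_trunc = y[y > a]

print(f"E[y|y>a]:   formula {mean_formula:.4f}  simulation {y_trunc.mean():.4f}")
print(f"Var[y|y>a]: formula {var_formula:.4f}  simulation {y_trunc.var():.4f}")
```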
Example: a truncated log-normal income distribution

From the New York Post (1987): "The typical upper affluent American... makes $142,000 per year... The people surveyed had household income of at least $100,000."
- Does this tell us anything about the typical American? "... only 2 percent of Americans make the grade."
- Degree of truncation in the sample: 98%.
- The $142,000 figure is probably quite far from the mean in the full population.

Assuming income is lognormally distributed in the population (the log of income has a normal distribution), this information can be used to deduce the population mean income. Let x = income and y = ln x:

  E[y | y > ln 100] = µ + σ φ(α) / [1 − Φ(α)]

Substituting into E[x] = E[e^y] = e^(µ + σ²/2) gives E[x] = $22,087.
- The 1987 Statistical Abstract of the US listed an average household income of about $25,000 (a relatively good estimate based on little information!).

The truncated regression model

  y*_i = x_i'β + ε_i,   with ε_i | x_i ∼ N(0, σ²)

Unit i is observed only if y*_i crosses a threshold a:

  y_i = n.a.   if y*_i ≤ a
  y_i = y*_i   if y*_i > a

  E[y_i | y*_i > a] = x_i'β + σ λ(α_i),   with α_i = (a − x_i'β)/σ

OLS estimation

OLS of y on x leads to inconsistent estimates.
- The model is y_i = E(y_i | y*_i > a) + u_i = x_i'β + σ λ(α_i) + u_i.
- By construction, the error term u_i is heteroskedastic.
- There is an omitted variable bias: λ(α_i) is not included in the regression.
- In applications, the OLS estimates are usually found to be biased toward zero. The marginal effect in the subpopulation is

  ∂E[y_i | y*_i > a] / ∂x_i = β + σ (dλ(α_i)/dα_i) (∂α_i/∂x_i) = ... = β [1 − δ(α_i)]

  Since 0 < δ(α_i) < 1, the marginal effect in the subpopulation is smaller than the corresponding coefficient.

Maximum likelihood estimation

Under the normality assumption, an MLE can be obtained that provides a consistent estimator.
- For each observation,

  f(y_i | y*_i > a) = (1/σ) φ((y_i − x_i'β)/σ) / [1 − Φ(α_i)],   with α_i = (a − x_i'β)/σ

- The log-likelihood can be written as

  log L = Σ_{i=1}^N log[ σ⁻¹ φ((y_i − x_i'β)/σ) ] − Σ_{i=1}^N log[ 1 − Φ((a − x_i'β)/σ) ]

Example: simulated data
- If y* is fully observed, OLS can be applied.
- However, only observations with y* > a are included in the sample.
- OLS on the observed sample is biased.
- MLE (truncreg) yields a consistent estimate of β, as the sketch below illustrates.
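The simulated-data exercise can be reproduced in a few lines. The sketch below (a minimal illustration only, assuming Python with NumPy and SciPy in place of truncreg; the sample size, coefficients, and threshold are my own arbitrary choices) draws data from the latent model, keeps only observations with y* > a, and compares OLS on the truncated sample with an estimator that maximizes the truncated log-likelihood written above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n, beta0, beta1, sigma, a = 5000, 1.0, 2.0, 1.5, 2.0   # illustrative values

# Simulate the latent model and truncate: only units with y* > a are observed
x = rng.normal(size=n)
y_star = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
keep = y_star > a
y, x = y_star[keep], x[keep]
X = np.column_stack([np.ones_like(x), x])

# OLS on the truncated sample (biased toward zero)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Truncated-regression negative log-likelihood:
#   log L = sum log[(1/s) phi((y - Xb)/s)] - sum log[1 - Phi((a - Xb)/s)]
def neg_loglik(theta):
    b, log_s = theta[:2], theta[2]
    s = np.exp(log_s)                       # keep sigma positive
    z = (y - X @ b) / s
    ll = norm.logpdf(z) - np.log(s) - norm.logsf((a - X @ b) / s)
    return -ll.sum()

res = minimize(neg_loglik, x0=np.array([b_ols[0], b_ols[1], 0.0]), method="BFGS")
b_mle, s_mle = res.x[:2], np.exp(res.x[2])

print("true beta:", (beta0, beta1))
print("OLS on truncated sample:", b_ols)
print("truncated MLE:", b_mle, "sigma:", s_mle)
```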
Censored data

Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points.

Assume there is a variable with quantitative meaning, y*, and we are interested in E[y* | x].
- If y* and x were observed for everyone in the population, standard regression methods (ordinary or nonlinear least squares) could be applied.
- With censored data, y* is not observable for part of the population.
- Conventional regression methods fail to account for the qualitative difference between limit (censored) and nonlimit (continuous) observations.
- Two leading cases: top coding and corner solution outcomes.

Top coding: example (data generating process)

Let wealth* denote actual family wealth, measured in thousands of dollars, and suppose that wealth* follows the linear regression model E[wealth* | x] = x'β.

Censored data: we observe wealth* only when wealth* ≤ 200.
- When wealth* is larger than 200 we know that it is, but we do not know its actual value.
- Therefore observed wealth can be written as wealth = min(wealth*, 200).

Top coding: example (estimation of β)

We assume that wealth*, given x, has a homoskedastic normal distribution:

  wealth* = x'β + ε,   ε | x ∼ N(0, σ²)

Recorded wealth is

  wealth = min(wealth*, 200) = min(x'β + ε, 200)

β is estimated by maximum likelihood using a mixture of discrete and continuous distributions (details below).

Example: seats demanded and tickets sold (figure omitted).

The censored normal distribution

Let y* ∼ N(µ, σ²) and let the observed data be censored at a = 0:

  y = 0    if y* ≤ 0
  y = y*   if y* > 0

The distribution of y is a mixture of a discrete and a continuous distribution.
- If y* ≤ 0: f(y) = Pr(y = 0) = Pr(y* ≤ 0) = Φ(−µ/σ) = 1 − Φ(µ/σ)
- If y* > 0: f(y) = (1/σ) φ((y − µ)/σ)

  E[y] = 0 · Pr(y = 0) + E[y | y > 0] · Pr(y > 0) = (µ + σλ) Φ(µ/σ),   with λ = φ(µ/σ)/Φ(µ/σ)

The censored regression model: the Tobit model (Tobin, 1958)

Let y* be a continuous (latent) variable:

  y*_i = x_i'β + ε_i,   where ε_i | x_i ∼ N(0, σ²)

The observed data are

  y_i = max(0, y*_i) = 0     if y*_i ≤ 0
                     = y*_i  if y*_i > 0

Why not OLS? Why not OLS on the positive values of y* only?

MLE estimation

As we assume ε | x ∼ N(0, σ²), the likelihood function can be written down directly. The distribution is a mixture of a discrete and a continuous distribution.
- A positive probability is assigned to the observations y_i = 0:

  Pr(y_i = 0 | x_i) = Pr(y*_i ≤ 0 | x_i) = Pr(x_i'β + ε_i ≤ 0) = Pr(ε_i ≤ −x_i'β) = 1 − Φ(x_i'β/σ)

- For y*_i > 0: f(y_i) = (1/σ) φ((y_i − x_i'β)/σ)

The likelihood can be written as

  L(β, σ² | y) = Π_{y_i = 0} [1 − Φ(x_i'β/σ)] × Π_{y_i > 0} (1/σ) φ((y_i − x_i'β)/σ)
               = Π_{y_i = 0} [1 − Φ(x_i'β/σ)] × Π_{y_i > 0} (2πσ²)^(−1/2) exp{ −(1/2) [(y_i − x_i'β)/σ]² }

In the case of censored data, the β estimated from the Tobit model can be used to study the effect of x on E[y* | x].

Example: simulated data
- If y* is fully observed, OLS can be applied.
- However, if y* ≤ a, data are recorded as a.
- OLS on the observed sample is biased.
- MLE (tobit) yields a consistent estimate of β (see the sketch at the end of this section).

Corner solution outcomes

These models are still labeled "censored regression models"; the pioneering work by Tobin (1958) concerned household purchases of durable goods.

Let y be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: y takes on the value zero with positive probability but is a continuous random variable over strictly positive values.
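Returning to the censored (Tobit) likelihood above, the following sketch (again a minimal illustration, assuming Python with NumPy and SciPy rather than a packaged tobit command; all names and parameter values are my own choices) simulates data censored from below at zero, maximizes the mixed discrete/continuous log-likelihood, and compares the result with OLS on the censored sample.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n, beta0, beta1, sigma = 5000, 1.0, 2.0, 1.5   # illustrative values

# Latent variable and censoring at zero: y = max(0, y*)
x = rng.normal(size=n)
y_star = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
y = np.maximum(y_star, 0.0)
X = np.column_stack([np.ones_like(x), x])
censored = y == 0

# OLS on the censored sample (inconsistent for beta)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Tobit negative log-likelihood:
#   y_i = 0:  log[1 - Phi(x_i'b / s)] = log Phi(-x_i'b / s)
#   y_i > 0:  log[(1/s) phi((y_i - x_i'b) / s)]
def neg_loglik(theta):
    b, log_s = theta[:2], theta[2]
    s = np.exp(log_s)                         # enforce sigma > 0
    xb = X @ b
    ll = np.where(censored,
                  norm.logcdf(-xb / s),
                  norm.logpdf((y - xb) / s) - np.log(s))
    return -ll.sum()

res = minimize(neg_loglik, x0=np.array([b_ols[0], b_ols[1], 0.0]), method="BFGS")
b_tobit, s_tobit = res.x[:2], np.exp(res.x[2])

print("true beta:", (beta0, beta1))
print("OLS on censored sample:", b_ols)
print("Tobit MLE:", b_tobit, "sigma:", s_tobit)
```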