Consistency of Heckman-Type Two-Step Estimators for the Multivariate Sample-Selection Model

Harald Tauchmann


To cite this version: Harald Tauchmann. Consistency of Heckman-type two-step Estimators for the Multivariate Sample-Selection Model. Applied Economics, Taylor & Francis (Routledge), 2009, 42 (30), pp. 3895. doi:10.1080/00036840802360179. HAL Id: hal-00582191, https://hal.archives-ouvertes.fr/hal-00582191 (submitted on 1 Apr 2011).

Submitted Manuscript for Peer Review – Applied Economics, Manuscript ID APE-06-0698

November 2006

Abstract: This analysis shows that multivariate generalizations of the classical Heckman (1976 and 1979) two-step estimator that account for cross-equation correlation and use the inverse Mills ratio as correction term are consistent only if certain restrictions apply to the true error-covariance structure. An alternative class of generalizations of the classical Heckman two-step approach is derived that conditions on the entire selection pattern rather than on selection in particular equations and therefore uses modified correction terms. It is shown that this class of estimators is consistent. In addition, Monte Carlo results illustrate that these estimators display a smaller mean square prediction error.

JEL Classification Numbers: C15, C34, C51.

Key words: multivariate sample-selection model, censored system of equations, Heckman correction.

1 Introduction

Using non-aggregated micro-data for estimating systems of seemingly unrelated equations – the most prominent among them being demand systems – often encounters the problem of numerous zero-observations in the dependent variables. These cannot be appropriately explained by conventional continuous SUR models.[1] Instead, zero-observations may be modelled as determined by an upstream multivariate binary choice problem. Under the assumption of normally distributed errors, the resulting joint model represents a multivariate generalization of the classical univariate sample-selection model, cf. Heckman (1976 and 1979). In the literature, this model is often referred to as a "censored system of equations", yet censoring in the narrow sense just represents a special case of the general model.[2]

[1] See Zellner (1963) for the seemingly unrelated regression equations (SUR) model.
[2] We stick to the relevant literature and use the term "censored" as a synonym for "not selected".
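As background, the classical univariate Heckman (1976 and 1979) two-step procedure – the building block that the multivariate estimators discussed below generalize – first fits a probit to the selection equation, computes the inverse Mills ratio λ(z'α̂) = φ(z'α̂)/Φ(z'α̂), and then runs OLS on the selected subsample with λ as an additional regressor. A minimal sketch in Python; the simulated design and all parameter values are purely illustrative, not taken from the paper:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(42)
T = 5000

# Illustrative design: one regressor each for outcome and selection equation.
x = rng.normal(size=T)
z = rng.normal(size=T)
# Correlated errors: rho != 0 is what induces sample-selection bias.
rho = 0.6
eps, ups = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=T).T

d_star = 0.5 + 1.0 * z + ups       # latent selection equation
y_star = 1.0 + 2.0 * x + eps       # latent outcome equation
d = (d_star > 0).astype(float)     # observed selection indicator
y = d * y_star                     # outcome observed only if selected

# Step 1: probit for the selection equation, then the inverse Mills ratio.
Z = sm.add_constant(z)
probit = sm.Probit(d, Z).fit(disp=False)
idx = Z @ probit.params
imr = norm.pdf(idx) / norm.cdf(idx)

# Step 2: OLS on the selected subsample with the Mills ratio as regressor.
sel = d == 1
X = sm.add_constant(np.column_stack([x[sel], imr[sel]]))
ols = sm.OLS(y[sel], X).fit()
print(ols.params)  # approx. [1.0, 2.0, 0.6] for large T
```

For large T, the coefficient on the Mills ratio estimates Cov(ε, υ), so a significantly non-zero estimate signals the selection bias that naive OLS on the selected subsample would suffer from.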
The question of how to estimate the parameters of this model is subject to an ongoing debate. Clearly, under parametric distributional assumptions full information maximum likelihood (FIML) is the efficient estimation technique; in fact, FIML has recently been applied to this problem by Yen (2005). However, the FIML estimator is computationally extremely demanding, rendering much simpler two-step approaches worth considering for many applications.

Among two-step estimators, the one proposed by Heien & Wessels (1990) has been particularly popular. Besides numerous other authors, it has been applied by Heien & Durham (1991), Gao et al. (1995), and Nayga et al. (1999). However, Shonkwiler & Yen (1999) as well as Vermeulen (2001) show that this estimator lacks a sound basis in statistical theory and cannot be interpreted in terms of conditional means. The Heien & Wessels (1990) estimator, therefore, is inconsistent despite its popularity. Chen & Yen (2005) further investigate the nature of its inconsistency and show that even a modified variant of this estimator fails to correct properly for sample-selection bias. Shonkwiler & Yen (1999) propose an alternative simple two-step estimator that – in contrast to Heien & Wessels (1990) – is theoretically well founded. This estimator is based on the mean of the dependent variables that is unconditional on the outcome of the upstream discrete choice model. Su & Yen (2000), Yen et al. (2002), and Goodwin et al. (2004) may serve as examples of applications of this procedure.
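Anticipating the notation introduced in Section 2, the contrast between the two approaches can be stated in formulas. Under joint normality, and with Var(υ_it) normalized to one, the textbook univariate results read as follows – a sketch, with σ_i denoting Cov(ε_it, υ_it); the paper's point is that in the multivariate case the corresponding correction terms must condition on the entire selection pattern, not merely on d_it = 1:

```latex
% Conditional mean exploited by Heckman-type two-step estimators:
% the inverse Mills ratio phi/Phi enters as correction term.
E(y_{it} \mid d_{it} = 1) = x_{it}'\beta_i
  + \sigma_i \,\frac{\phi(z_{it}'\alpha_i)}{\Phi(z_{it}'\alpha_i)}

% Unconditional mean exploited by Shonkwiler & Yen (1999):
E(y_{it}) = \Phi(z_{it}'\alpha_i)\, x_{it}'\beta_i
  + \sigma_i \,\phi(z_{it}'\alpha_i)
```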
Tauchmann (2005) compares the performance of the Shonkwiler & Yen (1999) estimator with that of two-step estimators that – analogously to the classical Heckman (1976 and 1979) two-step approach, yet in contrast to Shonkwiler & Yen (1999) – condition on the outcome of the upstream discrete choice model. In terms of the mean square prediction error, the unconditional Shonkwiler & Yen (1999) estimator is shown to perform poorly if the conditional mean of the dependent variables is large compared to its conditional variance. Tauchmann (2005), however, exclusively focuses on the mean square error and does not check for unbiasedness and consistency of the conditional estimators. Though one may argue that it is of no relevance in applied work whether an error originates from an estimator's bias or from its variance, many researchers do avoid inconsistent estimators, even if their mean square error is small. For this reason, addressing unbiasedness and consistency of Heckman-type two-step estimators for censored systems of equations is a relevant task.

The analysis presented in this article shows that some of the estimators proposed by Tauchmann (2005) are consistent only for restrictive error-covariance structures. It also shows that a modified two-step Heckman-type estimator is generally consistent and performs well in terms of the mean square prediction error. To this end, the remainder of this paper is organized as follows: Section 2 introduces the model to be analyzed in more detail and analyzes the properties of straightforward multivariate generalizations of the Heckman (1976 and 1979) two-step estimator. In Section 3, an alternative class of generalized two-step Heckman-type estimators is derived. Section 4 presents results from Monte Carlo simulations that illustrate the theoretical results and extends the analysis to the estimators' mean square error. Section 5 concludes.

2 An analysis of sample-selection models

2.1 A multivariate sample-selection model

Recall the m-variate sample-selection model analyzed by Heien & Wessels (1990), Shonkwiler & Yen (1999), Tauchmann (2005), Yen (2005), and Chen & Yen (2005). The equations

$$ y_{it}^{*} = x_{it}'\beta_i + \varepsilon_{it} \qquad (1) $$
$$ d_{it}^{*} = z_{it}'\alpha_i + \upsilon_{it} \qquad (2) $$

characterize the latent model, that is, $y_{it}^{*}$ and $d_{it}^{*}$ are unobserved. Their observed counterparts $y_{it}$ and $d_{it}$ are determined by

$$ d_{it} = \begin{cases} 1 & \text{if } d_{it}^{*} > 0 \\ 0 & \text{if } d_{it}^{*} \leq 0 \end{cases} \qquad (3) $$
$$ y_{it} = d_{it}\, y_{it}^{*}. \qquad (4) $$

Here, $i = 1, \dots, m$ indexes the m equations of the system, and $t = 1, \dots, T$ indexes the individuals. $x_{it}$ and $z_{it}$ are vectors of observed exogenous variables. The vector $d_t = [d_{1t} \dots d_{mt}]'$ describes the entire individual selection pattern. Finally, $\varepsilon_t = [\varepsilon_{1t} \dots \varepsilon_{mt}]'$ and $\upsilon_t = [\upsilon_{1t} \dots \upsilon_{mt}]'$ are normally distributed, zero-mean error vectors with the covariance matrix

$$ \operatorname{Var}(\varepsilon_t, \upsilon_t) = \begin{bmatrix} \Sigma_{\varepsilon\varepsilon} & \Sigma_{\varepsilon\upsilon}' \\ \Sigma_{\varepsilon\upsilon} & \Sigma_{\upsilon\upsilon} \end{bmatrix}. $$
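To make the data-generating process (1)–(4) concrete, the following minimal simulation sketch draws a bivariate (m = 2) version of the model; all coefficient values and the error-covariance matrix are illustrative assumptions, not values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 2, 10000

# Illustrative coefficients for equations (1) and (2), one regressor each.
beta = np.array([1.0, -0.5])
alpha = np.array([0.8, 0.3])
x = rng.normal(size=(T, m))
z = rng.normal(size=(T, m))

# Joint normal errors (eps_t, ups_t) with cross-equation correlation,
# i.e. an unrestricted (positive definite) Var(eps_t, ups_t).
cov = np.array([
    [1.0, 0.3, 0.5, 0.2],   # eps_1
    [0.3, 1.0, 0.1, 0.4],   # eps_2
    [0.5, 0.1, 1.0, 0.25],  # ups_1
    [0.2, 0.4, 0.25, 1.0],  # ups_2
])
errs = rng.multivariate_normal(np.zeros(2 * m), cov, size=T)
eps, ups = errs[:, :m], errs[:, m:]

y_star = x * beta + eps            # equation (1), column i uses beta_i
d_star = z * alpha + ups           # equation (2), column i uses alpha_i
d = (d_star > 0).astype(float)     # equation (3): selection indicators
y = d * y_star                     # equation (4): observed outcomes

# d_t = [d_1t, d_2t]' is the full selection pattern; with m = 2 there
# are four patterns: (0,0), (0,1), (1,0), (1,1).
patterns, counts = np.unique(d, axis=0, return_counts=True)
print(patterns, counts)
```

With m equations there are 2^m possible selection patterns d_t; the estimators derived in Section 3 condition on this entire pattern rather than on selection in a particular equation alone.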