Consistency of Heckman-Type Two-Step Estimators for the Multivariate Sample-Selection Model

Harald Tauchmann


To cite this version: Harald Tauchmann. Consistency of Heckman-type two-step Estimators for the Multivariate Sample-Selection Model. Applied Economics, Taylor & Francis (Routledge), 2009, 42 (30), pp. 3895. doi:10.1080/00036840802360179. HAL Id: hal-00582191, https://hal.archives-ouvertes.fr/hal-00582191 (submitted on 1 Apr 2011).

Submitted Manuscript for Peer Review – Applied Economics, Manuscript ID APE-06-0698

November 2006

Abstract: This analysis shows that multivariate generalizations of the classical Heckman (1976 and 1979) two-step estimator that account for cross-equation correlation and use the inverse Mills ratio as correction term are consistent only if certain restrictions apply to the true error-covariance structure. An alternative class of generalizations of the classical Heckman two-step approach is derived that conditions on the entire selection pattern rather than on selection in particular equations and therefore uses modified correction terms. It is shown that this class of estimators is consistent. In addition, Monte Carlo results illustrate that these estimators display a smaller mean square prediction error.

JEL Classification Numbers: C15, C34, C51.

Key words: multivariate sample-selection model, censored system of equations, Heckman correction.

1 Introduction

Using non-aggregated micro-data for estimating systems of seemingly unrelated equations – the most prominent among them being demand systems – often encounters the problem of numerous zero-observations in the dependent variables. These cannot be appropriately explained by conventional continuous SUR models.[1] Instead, zero-observations may be modelled as determined by an upstream multivariate binary choice problem. Under the assumption of normally distributed errors, the resulting joint model represents a multivariate generalization of the classical univariate sample-selection model, cf. Heckman (1976 and 1979). In the literature, this model is often referred to as a "censored system of equations", yet censoring in the narrow sense just represents a special case of the general model.[2]

[1] See Zellner (1963) for the seemingly unrelated regression equations (SUR) model.
[2] We stick to the relevant literature and use the term "censored" as a synonym for "not selected".
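As background, the classical univariate Heckman (1976 and 1979) two-step procedure – the building block that the multivariate estimators discussed below generalize – first fits a probit to the selection equation, computes the inverse Mills ratio λ(z'α̂) = φ(z'α̂)/Φ(z'α̂), and then runs OLS on the selected subsample with λ as an additional regressor. A minimal sketch in Python; the simulated design and all parameter values are purely illustrative, not taken from the paper:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(42)
T = 5000

# Illustrative design: one regressor each for outcome and selection equation.
x = rng.normal(size=T)
z = rng.normal(size=T)
# Correlated errors: rho != 0 is what induces sample-selection bias.
rho = 0.6
eps, ups = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=T).T

d_star = 0.5 + 1.0 * z + ups       # latent selection equation
y_star = 1.0 + 2.0 * x + eps       # latent outcome equation
d = (d_star > 0).astype(float)     # observed selection indicator
y = d * y_star                     # outcome observed only if selected

# Step 1: probit for the selection equation, then the inverse Mills ratio.
Z = sm.add_constant(z)
probit = sm.Probit(d, Z).fit(disp=False)
idx = Z @ probit.params
imr = norm.pdf(idx) / norm.cdf(idx)

# Step 2: OLS on the selected subsample with the Mills ratio as regressor.
sel = d == 1
X = sm.add_constant(np.column_stack([x[sel], imr[sel]]))
ols = sm.OLS(y[sel], X).fit()
print(ols.params)  # approx. [1.0, 2.0, 0.6] for large T
```

For large T, the coefficient on the Mills ratio estimates Cov(ε, υ), so a significantly non-zero estimate signals the selection bias that naive OLS on the selected subsample would suffer from.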
The question of how to estimate the parameters of this model is subject to an ongoing debate. Clearly, under parametric distributional assumptions full information maximum likelihood (FIML) is the efficient estimation technique; in fact, FIML has recently been applied to this problem by Yen (2005). However, the FIML estimator is computationally extremely demanding, rendering much simpler two-step approaches worth considering for many applications.

Among two-step estimators, the one proposed by Heien & Wessels (1990) has been particularly popular. Besides numerous other authors, it has been applied by Heien & Durham (1991), Gao et al. (1995), and Nayga et al. (1999). However, Shonkwiler & Yen (1999) as well as Vermeulen (2001) show that this estimator lacks a sound basis in statistical theory and cannot be interpreted in terms of conditional means. The Heien & Wessels (1990) estimator, therefore, is inconsistent despite its popularity. Chen & Yen (2005) further investigate the nature of its inconsistency and show that even a modified variant of this estimator fails to correct properly for sample-selection bias. Shonkwiler & Yen (1999) propose an alternative simple two-step estimator that – in contrast to Heien & Wessels (1990) – is theoretically well founded. This estimator is based on the mean of the dependent variables that is unconditional on the outcome of the upstream discrete choice model. Su & Yen (2000), Yen et al. (2002), and Goodwin et al. (2004) may serve as examples of applications of this procedure.
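Anticipating the notation introduced in Section 2, the contrast between the two approaches can be stated in formulas. Under joint normality, and with Var(υ_it) normalized to one, the textbook univariate results read as follows – a sketch, with σ_i denoting Cov(ε_it, υ_it); the paper's point is that in the multivariate case the corresponding correction terms must condition on the entire selection pattern, not merely on d_it = 1:

```latex
% Conditional mean exploited by Heckman-type two-step estimators:
% the inverse Mills ratio phi/Phi enters as correction term.
E(y_{it} \mid d_{it} = 1) = x_{it}'\beta_i
  + \sigma_i \,\frac{\phi(z_{it}'\alpha_i)}{\Phi(z_{it}'\alpha_i)}

% Unconditional mean exploited by Shonkwiler & Yen (1999):
E(y_{it}) = \Phi(z_{it}'\alpha_i)\, x_{it}'\beta_i
  + \sigma_i \,\phi(z_{it}'\alpha_i)
```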
Tauchmann (2005) compares the performance of the Shonkwiler & Yen (1999) estimator with that of two-step estimators that – analogously to the classical Heckman (1976 and 1979) two-step approach, yet in contrast to Shonkwiler & Yen (1999) – condition on the outcome of the upstream discrete choice model. In terms of the mean square prediction error, the unconditional Shonkwiler & Yen (1999) estimator is shown to perform poorly if the conditional mean of the dependent variables is large compared to its conditional variance. Tauchmann (2005), however, exclusively focuses on the mean square error and does not check for unbiasedness and consistency of the conditional estimators. Though one may argue that it is of no relevance in applied work whether an error originates from an estimator's bias or from its variance, many researchers do avoid inconsistent estimators, even if their mean square error is small. For this reason, addressing unbiasedness and consistency of Heckman-type two-step estimators for censored systems of equations is a relevant task.

The analysis presented in this article shows that some of the estimators proposed by Tauchmann (2005) are consistent only for restrictive error-covariance structures. It also shows that a modified two-step Heckman-type estimator is generally consistent and performs well in terms of the mean square prediction error. To this end, the remainder of this paper is organized as follows: Section 2 introduces the model to be analyzed in more detail and analyzes the properties of straightforward multivariate generalizations of the Heckman (1976 and 1979) two-step estimator. In Section 3, an alternative class of generalized two-step Heckman-type estimators is derived. Section 4 presents results from Monte Carlo simulations that illustrate the theoretical results and extends the analysis to the estimators' mean square error. Section 5 concludes.

2 An analysis of sample-selection models

2.1 A multivariate sample-selection model

Recall the m-variate sample-selection model analyzed by Heien & Wessels (1990), Shonkwiler & Yen (1999), Tauchmann (2005), Yen (2005), and Chen & Yen (2005). The equations

$$ y_{it}^{*} = x_{it}'\beta_i + \varepsilon_{it} \qquad (1) $$
$$ d_{it}^{*} = z_{it}'\alpha_i + \upsilon_{it} \qquad (2) $$

characterize the latent model, that is, $y_{it}^{*}$ and $d_{it}^{*}$ are unobserved. Their observed counterparts $y_{it}$ and $d_{it}$ are determined by

$$ d_{it} = \begin{cases} 1 & \text{if } d_{it}^{*} > 0 \\ 0 & \text{if } d_{it}^{*} \leq 0 \end{cases} \qquad (3) $$
$$ y_{it} = d_{it}\, y_{it}^{*}. \qquad (4) $$

Here, $i = 1, \dots, m$ indexes the m equations of the system, and $t = 1, \dots, T$ indexes the individuals. $x_{it}$ and $z_{it}$ are vectors of observed exogenous variables. The vector $d_t = [d_{1t} \dots d_{mt}]'$ describes the entire individual selection pattern. Finally, $\varepsilon_t = [\varepsilon_{1t} \dots \varepsilon_{mt}]'$ and $\upsilon_t = [\upsilon_{1t} \dots \upsilon_{mt}]'$ are normally distributed, zero-mean error vectors with the covariance matrix

$$ \operatorname{Var}(\varepsilon_t, \upsilon_t) = \begin{bmatrix} \Sigma_{\varepsilon\varepsilon} & \Sigma_{\varepsilon\upsilon}' \\ \Sigma_{\varepsilon\upsilon} & \Sigma_{\upsilon\upsilon} \end{bmatrix}. $$
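To make the data-generating process (1)–(4) concrete, the following minimal simulation sketch draws a bivariate (m = 2) version of the model; all coefficient values and the error-covariance matrix are illustrative assumptions, not values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 2, 10000

# Illustrative coefficients for equations (1) and (2), one regressor each.
beta = np.array([1.0, -0.5])
alpha = np.array([0.8, 0.3])
x = rng.normal(size=(T, m))
z = rng.normal(size=(T, m))

# Joint normal errors (eps_t, ups_t) with cross-equation correlation,
# i.e. an unrestricted (positive definite) Var(eps_t, ups_t).
cov = np.array([
    [1.0, 0.3, 0.5, 0.2],   # eps_1
    [0.3, 1.0, 0.1, 0.4],   # eps_2
    [0.5, 0.1, 1.0, 0.25],  # ups_1
    [0.2, 0.4, 0.25, 1.0],  # ups_2
])
errs = rng.multivariate_normal(np.zeros(2 * m), cov, size=T)
eps, ups = errs[:, :m], errs[:, m:]

y_star = x * beta + eps            # equation (1), column i uses beta_i
d_star = z * alpha + ups           # equation (2), column i uses alpha_i
d = (d_star > 0).astype(float)     # equation (3): selection indicators
y = d * y_star                     # equation (4): observed outcomes

# d_t = [d_1t, d_2t]' is the full selection pattern; with m = 2 there
# are four patterns: (0,0), (0,1), (1,0), (1,1).
patterns, counts = np.unique(d, axis=0, return_counts=True)
print(patterns, counts)
```

With m equations there are 2^m possible selection patterns d_t; the estimators derived in Section 3 condition on this entire pattern rather than on selection in a particular equation alone.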