Measurement Errors in Binary Regressors: An Application to Measuring the Effects of Specific Psychiatric Diseases on Earnings

Health Services & Outcomes Research Methodology 1:2 (2000): 149–164
© 2000 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

ELIZABETH SAVOCA
Department of Economics, Smith College, Wright Hall 221, Northampton, MA 01063

Received December 10, 1998; revised January 10, 2000; accepted March 2, 2000

Abstract. This paper presents an overview of the theory of measurement error bias in ordinary regression estimators when several binary explanatory variables are mismeasured. This situation commonly occurs in health-related applications where the effects of illness are modeled in a multivariate framework and where health conditions are usually 0-1 survey responses indicating the absence or presence of diseases. An analysis of the effect of psychiatric diseases on male earnings provides an empirical example that indicates extensive measurement error bias even in sophisticated survey measures that are designed to simulate clinical diagnoses. A corrected covariance matrix is constructed from a validity study of the survey mental health indicators. When ordinary least squares estimators are adjusted by this correction matrix, the estimated earnings effects drop for certain diseases (drug abuse, general phobic disorders) and rise for others (anti-social personality).

Keywords: measurement error; binary variables; multiple regression; psychiatric diseases

1. Introduction

The topic of measurement errors in binary explanatory variables has been treated extensively in the theoretical and applied econometrics literature. Aigner (1973) was the first to demonstrate that, unlike the classical case of a mismeasured explanatory variable, the measurement error in a binary regressor, often referred to as 'classification error', may have a non-zero mean and is negatively correlated with the true underlying variable.
However, questions about whether these deviations from the classical errors-in-variables model lead to different conclusions about the consequences for ordinary least squares (OLS) estimators and about appropriate econometric solutions have not been fully explored. This issue is particularly relevant to empirical studies in health-related areas, where illness is often measured by the presence or absence of a diagnosis.

This paper provides an overview of the theory of measurement error in binary regressors and of econometric methods to correct for measurement error bias. It argues for the advantages of 'modified least squares', a method first proposed by Johnston (1963), over alternative methods. The arguments are geared toward studies for which auxiliary or extra-sample information about the classification error in the reported diagnosis is available. The paper also provides an application of this method to study the effects of psychiatric diseases on earnings.

The paper is divided into five main sections. Section 2 presents the basic errors-in-variables model for a simple regression with a binary 0-1 regressor measured with error and derives the inconsistency in the OLS estimator. This section is essentially a synthesis of three previous econometric studies: Aigner (1973), Marquis et al. (1981), and Freeman (1984). Section 3 uses the theoretical results from the basic model to estimate the measurement error bias in OLS estimates of the effects of mental illness. This section focuses on a widely used psychiatric screening instrument, the Diagnostic Interview Schedule, and considers a wide range of psychiatric diseases. Section 4 extends the theoretical analysis to a multivariate framework where more than one binary regressor is measured with error. Section 5 discusses alternative strategies for dealing with measurement error bias. Section 6 illustrates the method of modified least squares with an earnings equation estimated using microdata from the U.S.
National Institute of Mental Health Epidemiological Catchment Area Survey.

2. Measurement Error Bias in a Binary Health Regressor: The Simple Regression Framework

Consider a simple regression:

$$y = \beta x + \varepsilon$$

where $y$ is a continuous variable, $x$ is a binary variable indicating the presence or absence of a disease, and $\varepsilon$ is a population regression error that follows all of the assumptions of the classical model. The outcome variable $y$ can represent any continuous health outcome, such as medical expenditures or earnings. The regressor $x$ is measured with error $u$ according to the relationship:

$$X = x + u$$

where $X$ is the diagnosis according to the survey instrument. Let $P$ represent the proportion of persons in the population truly suffering from the disease. Define $\tilde{P}$ as the proportion of persons diagnosed as having the disease by the survey. Suppose that the survey instrument incorrectly classifies persons as not having the disease with probability $r_1$ and incorrectly classifies persons as having the disease with probability $r_0$. The probability $r_1$ is referred to as the false negative rate; $r_0$ as the false positive rate. $X$, $x$, and $u$ can take on four possible combinations of values:

    X    x    u
    0    0    0
    1    0    1
    1    1    0
    0    1   -1

A non-zero correlation between $x$ and $u$ is evident from computing the probabilities of $u$ conditional on the values of $x$:

$$
\begin{aligned}
P(u = -1 \mid x = 0) &= 0 \\
P(u = -1 \mid x = 1) &= P(X = 0 \mid x = 1) = r_1 \\
P(u = 0 \mid x = 0) &= P(X = 0 \mid x = 0) = 1 - r_0 \\
P(u = 0 \mid x = 1) &= P(X = 1 \mid x = 1) = 1 - r_1 \\
P(u = 1 \mid x = 0) &= P(X = 1 \mid x = 0) = r_0 \\
P(u = 1 \mid x = 1) &= 0
\end{aligned}
\tag{1}
$$

From here, the joint and marginal probabilities of $x$ and $u$ can be easily computed, leading to these expressions for $E(u)$, $VAR(u)$, and $COV(x, u)$:¹

$$
\begin{aligned}
E(u) &= r_0 - (r_1 + r_0)P \\
VAR(u) &= r_0 + (r_1 - r_0)P - \left[ r_0 - (r_1 + r_0)P \right]^2 \\
COV(x, u) &= -(r_1 + r_0)P(1 - P)
\end{aligned}
\tag{2}
$$

From the equations in (2) we see that only in the situation where $P = r_0/(r_0 + r_1)$ will $E(u) = 0$ and, hence, will the sample proportion be an unbiased estimate of the population proportion, i.e., $E(X) = E(x) = P$.
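As a quick numerical check on the expressions in (2), the following Python sketch enumerates the four possible (x, u) outcomes using the conditional probabilities in (1) and compares the resulting moments with the closed forms. The function names and the values P = 0.10, r0 = 0.05, r1 = 0.30 are hypothetical, chosen only for illustration; they are not taken from the validity studies discussed in this paper.

```python
from math import isclose


def joint_distribution(P, r0, r1):
    """Joint probabilities of (x, u) implied by the conditional probabilities in (1)."""
    return {
        (0, 0): (1 - P) * (1 - r0),  # no disease, correctly classified
        (0, 1): (1 - P) * r0,        # no disease, false positive
        (1, 0): P * (1 - r1),        # disease, correctly classified
        (1, -1): P * r1,             # disease, false negative
    }


def moments_by_enumeration(P, r0, r1):
    """E(u), VAR(u), COV(x, u) computed by summing over the four outcomes."""
    joint = joint_distribution(P, r0, r1)
    e_x = sum(p * x for (x, u), p in joint.items())
    e_u = sum(p * u for (x, u), p in joint.items())
    e_u2 = sum(p * u * u for (x, u), p in joint.items())
    e_xu = sum(p * x * u for (x, u), p in joint.items())
    return e_u, e_u2 - e_u ** 2, e_xu - e_x * e_u


def moments_closed_form(P, r0, r1):
    """E(u), VAR(u), COV(x, u) from the closed-form expressions in (2)."""
    e_u = r0 - (r1 + r0) * P
    var_u = r0 + (r1 - r0) * P - e_u ** 2
    cov_xu = -(r1 + r0) * P * (1 - P)
    return e_u, var_u, cov_xu


# Hypothetical illustrative values (assumptions, not estimates from the paper).
P, r0, r1 = 0.10, 0.05, 0.30

enumerated = moments_by_enumeration(P, r0, r1)
closed = moments_closed_form(P, r0, r1)
assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(enumerated, closed))

# E(u) is zero only when P equals r0 / (r0 + r1).
P_star = r0 / (r0 + r1)
assert isclose(moments_closed_form(P_star, r0, r1)[0], 0.0, abs_tol=1e-12)
```

Under these assumed values the covariance between x and u is negative, and the mean of u vanishes only at the single prevalence P = r0/(r0 + r1).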
We also see that the measurement error, $u$, is negatively correlated with the true diagnosis, $x$. These are two important deviations from the classical measurement error model.

The least squares regression of $y$ on $X$:

$$y = \beta X + (\varepsilon - \beta u)$$

yields a slope estimator $\hat{\beta}_{OLS}$ with probability limit:

$$
\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{COV(X, \varepsilon - \beta u)}{VAR(X)} = \beta \, \frac{P(1 - P)(1 - r_0 - r_1)}{\tilde{P}(1 - \tilde{P})}
\tag{3}
$$

Hence, the OLS slope estimator in a simple regression approaches zero as the sum of the errors in classification approaches one. What may be less obvious from (3) is the result that the proportional bias term, $P(1 - P)(1 - r_0 - r_1) / [\tilde{P}(1 - \tilde{P})]$, is always less than one in absolute value (Aigner 1973). Furthermore, if the sum of the error rates exceeds one, then the OLS coefficient is opposite in sign to the true coefficient. This situation would occur if more than half of the population were misclassified by the survey instrument (Bollinger 1996).

3. Evidence of Classification Errors in a Survey Indicator of Mental Illness

In the early to mid-1980s, the U.S. National Institute of Mental Health (NIMH) sponsored the Epidemiological Catchment Area (ECA) project. The project surveyed adults in five communities about their use of general medical and mental health services, their demographic background, and their employment situation. The survey also included the NIMH Diagnostic Interview Schedule (DIS), a highly structured interview designed for lay-interviewers as well as clinicians. The DIS questioned participants about the occurrences of symptoms of psychiatric disorders. These responses were run through computer algorithms to simulate clinical diagnoses of specific psychiatric diseases according to criteria established by the American Psychiatric Association (1987) in the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R). (See Eaton and Kessler 1985.)

Although it has its critics (Jenkins et al.
1997, e.g.), the DIS is often regarded as the 'gold standard' against which the validity of many survey instruments has been judged (Weinstein et al. 1989; Berwick et al. 1991). These include the General Health Questionnaire and several abbreviated versions of the RAND Mental Health Inventory, the latter serving as the main mental health assessment in the RAND Health Insurance Experiment. The DIS also forms the basis of the Composite International Diagnostic Interview (Robins et al. 1988), a questionnaire designed for worldwide use both in clinical settings and as a screening instrument in epidemiological surveys, such as the U.S. National Comorbidity Survey.

Validation studies of the DIS were conducted at two ECA sites: St. Louis and Baltimore (Anthony et al. 1985; Helzer et al. 1985a). At both sites physicians were asked to reexamine a sample of ECA participants who had recently been given the DIS by lay-interviewers. Physicians were required to follow a structured format and obtain diagnoses according to DSM-III criteria.² However, they were free to elicit from the subjects any information they deemed necessary to arrive at their diagnoses using their best clinical judgment. This study design served a twofold purpose. It ensured that the full range of diseases diagnosed by the DIS could be evaluated. It also minimized variations in physician diagnoses resulting from a physician's failure to notice important clinical details and from a physician's use of highly idiosyncratic criteria in making diagnoses. In clinical settings, studies of the agreement between two psychiatrists' assessments, using specified criteria and structured interviews, show the same concordance as that between clinicians interpreting results from objective diagnostic tests such as X-rays and EKGs (Helzer et al.
