1988: Coverage Error in Establishment Surveys
COVERAGE ERROR IN ESTABLISHMENT SURVEYS

Carl A. Konschnik, U.S. Bureau of the Census

I. Definition of Coverage Error

Coverage error, which includes both undercoverage and overcoverage, is defined as "the error in an estimate that results from (1) failure to include all units belonging to the defined population or failure to include specified units in the conduct of the survey (undercoverage), and (2) inclusion of some units erroneously either because of a defective frame or because of inclusion of unspecified units or inclusion of specified units more than once in the actual survey (overcoverage)" (Office of Federal Statistical Policy and Standards, 1978).

Coverage errors are closely related to but clearly distinct from content errors, which are defined as the "errors of observation or objective measurement, of recording, of imputation, or of other processing which results in associating a wrong value of the characteristic with a specified unit" (Office of Federal Statistical Policy and Standards, 1978). Thus, an interviewer's failure to properly identify and hence to record data for what should be a selected unit is a coverage error. On the other hand, failure to pick up data for a properly selected unit (which results in an imputed value being assigned to the unit) is a content error. Content errors include response and nonresponse errors. However, content errors as well as other nonsampling error types will not be discussed in this paper apart from contrasting them to coverage error.

II. Sources of Coverage Error

While the definition divides coverage error into two major components--undercoverage and overcoverage--another important duality is implied within each of these. Coverage error shows up (1) in defective sampling frames and (2) as a result of defective processes associated with the selected sample. (Sampling frame, or stated simply, frame is used here to mean the collection of sampling units, either given explicitly as a list or implicitly in terms of well-defined procedures.)

Thus coverage error results either because the frame does not properly represent the sampled population, or because the sample does not properly represent the frame. Note that, using the definitions of Cochran (1977), we are making a distinction between the sampled population, defined as the population to be sampled, and the target population, defined as the population about which information is wanted (if possible). Ideally, the sampled and target populations should coincide. However, cost or other practical considerations sometimes result in a lack of coincidence between the two. Consequently, the target population is usually modified to coincide with a workable sampled population.

Any difference between the sampled and target populations can contribute importantly to coverage error, especially where excessive compromise in the survey planning stage results in a sampled population which is too far removed from the target population. Since estimates based on data drawn from the sampled population apply properly only to the sampled population, interest in the target population dictates that the sampled population be as close as practicable to the target population. Nevertheless, in the following discussion of the sources, measurement, and control of coverage error, only deficiencies relative to the sampled population are included. Thus, when speaking of defective frames, only those deficiencies are discussed which arise when the population which is sampled differs from the population intended to be sampled (the sampled population).

Coverage Error Source Categories

We will now look briefly at the two categories of coverage error--defective frames and defective processes associated with the selected sample.

Defective Frames--Defective frames are characterized by (1) deficiencies in meeting the requirement that every element of the sampled population belongs to one and only one sampling unit, (2) erroneous inclusion of units (including the wrong units or having duplicates of units which belong in the frame), or (3) erroneous exclusion of sampling units. These problems can result from vague or unworkable definitions of the sampling units relative to the sampled population; improper procedures or processing in establishing and maintaining the frame; timing, which affects the updatedness (agreement with the proper reference period) of the frame; or miscoding of sampling units. Erroneous inclusion (overcoverage) results from including duplicates and out-of-scope or out-of-business units. Erroneous exclusion of sampling units (undercoverage) results from failure to include the proper units or failure to account for birth (new) units. Misclassification of units, such as for Standard Industrial Classification (SIC), geography, size class, or company structure can lead either to undercoverage or overcoverage.

Some frame problems cannot be overcome without expending significant resources. For example, most frames suffer from some degree of outdatedness. A monthly survey in which the frame and sample are updated quarterly, such as the Census Bureau's Monthly Wholesale Trade Survey (MWTS), does not have an up-to-date frame for at least two out of every three months--and this is over and above the lag time in getting new units on the list frame. This time lag itself can be as much as 12 to 18 months after a business starts up. For example, the Social Security Administration (SSA) lists of Employer Identification (EI) numbers newly assigned by Internal Revenue Service (IRS) are given to the Census Bureau after SSA receives the EI application forms from IRS and codes them. Each processing step contributes to the lag.
Because the cost and processing difficulties preclude correcting for this frame error, the Census Bureau accounts for new units in its estimates by an imputation technique. The overall objective is to correct errors which can be corrected within resource limitations and thereby keep coverage error as low as is feasible.

Defective Processes Associated with the Selected Sample--Coverage errors in which the selected sample does not correctly represent the frame may be the result of selected cases being inadvertently dropped from the sample or nonselected cases being added to the sample erroneously. Also, errors may be made in selecting the sample. Errors of this type are likely to occur when the sample is determined by interviewers in the field. In business area samples where the sampling units are geographic land segments, failure to properly identify the population units (business establishments of a particular type) is a common form of coverage error. Such errors may result from inadequate definitions or inadequately specified field or office procedures, outdated or otherwise incorrect maps of selected area sample units, or misapplication of the sampling or canvassing rules by the interviewer. Failure to sample from an updated frame on a timely basis also results in a sample that is not representative of the frame, and hence of the sampled population. For other papers which discuss coverage concepts and issues, see Garrett, et al. (1986) and United Nations (1982).

It is worth noting here that even where coverage of a total population is fairly good, serious problems may exist for certain subpopulations. For example, national estimates might be good, while estimates covering smaller geographic areas may be inadequate because of defective geographic coding at the lower (state, county, etc.) level.

Specific Error Sources

As we have seen, errors of undercoverage or overcoverage can be the result of defective frames or of faulty sampling processes. Moreover, the same sources of error can affect both the frame and the selected sample and can lead to either undercoverage or overcoverage.

... in coding will cause overcoverage or undercoverage of the frame.

Two prevalent forms of miscoding are (1) completely unclassified units (especially for SIC) and (2) units which do not have sufficient coding detail for survey purposes. Unclassified units lead to undercoverage since units belonging in the frame cannot be identified. Insufficient coding detail--for example, when four-digit SIC detail is needed and only two- or three-digit detail is available--can lead to either undercoverage or overcoverage for surveys requiring finer levels of industry coding.

Some causes of miscoding are (1) inadequate information on which to base a code; (2) poorly trained coders; and (3) faulty procedures or processes, such as miskeying.

Errors of Timeliness--Errors of timeliness result when the frame or sample is not updated to the same reference period as that of the survey. For example, units no longer in business that remain in the frame or sample may lead to overcoverage. Lack of timely updating for new units may lead to undercoverage. For a list frame in which the presence of nonzero payroll is used as an indicator of "activeness," seasonal businesses may be erroneously deleted during their off season. Here again we see the dichotomous nature of coverage error: in surveys which are carried out over time, it is possible to have timely updating of the sampling frame, but unless the sample, in turn, is updated to reflect these changes, significant coverage error can result. In some survey designs it is impossible to completely eliminate coverage error due to the timing of frame or sample updates. This is especially true for list sample designs. However, use of an area sample to supplement the list sample, such as the Census Bureau uses in its Monthly Retail Trade Survey (MRTS), can theoretically reduce coverage error due to timing to zero.

Structural, organizational, or activity changes not reflected in the frame or sample may occur because of the lack of timeliness in updating. Often SIC changes occur which are not reflected in the frame or sample. Similarly, failure to update for other characteristic changes, such as company reorganizations, acquisitions, and divestments or mergers, re-