COVERAGE ERROR IN ESTABLISHMENT SURVEYS

Carl A. Konschnik, U.S. Bureau of the Census¹

I. Definition of Coverage Error

Coverage error, which includes both undercoverage and overcoverage, is defined as "the error in an estimate that results from (1) failure to include all units belonging to the defined population or failure to include specified units in the conduct of the survey (undercoverage), and (2) inclusion of some units erroneously either because of a defective frame or because of inclusion of unspecified units or inclusion of specified units more than once in the actual survey (overcoverage)" (Office of Federal Statistical Policy and Standards, 1978). Coverage errors are closely related to but clearly distinct from content errors, which are defined as the "errors of observation or objective measurement, of recording, of imputation, or of other processing which results in associating a wrong value of the characteristic with a specified unit" (Office of Federal Statistical Policy and Standards, 1978). Thus, an interviewer's failure to properly identify and hence to record data for what should be a selected unit is a coverage error. On the other hand, failure to pick up data for a properly selected unit (which results in an imputed value being assigned to the unit) is a content error. Content errors include response and nonresponse errors. However, content errors as well as other nonsampling error types will not be discussed in this paper apart from contrasting them to coverage error.

II. Sources of Coverage Error

While the definition divides coverage error into two major components--undercoverage and overcoverage--another important duality is implied within each of these. Coverage error shows up (1) in defective sampling frames and (2) as a result of defective processes associated with the selected sample. (Sampling frame, or stated simply, frame is used here to mean the collection of sampling units, either given explicitly as a list or implicitly in terms of well-defined procedures.)

Thus coverage error results either because the frame does not properly represent the sampled population, or because the sample does not properly represent the frame. Note that, using the definitions of Cochran (1977), we are making a distinction between the sampled population, defined as the population to be sampled, and the target population, defined as the population about which information is wanted (if possible). Ideally, the sampled and target populations should coincide. However, cost or other practical considerations sometimes result in a lack of coincidence between the two. Consequently, the target population is usually modified to coincide with a workable sampled population.

Any difference between the sampled and target populations can contribute importantly to coverage error, especially where excessive compromise in the planning stage results in a sampled population which is too far removed from the target population. Since estimates based on data drawn from the sampled population apply properly only to the sampled population, interest in the target population dictates that the sampled population be as close as practicable to the target population. Nevertheless, in the following discussion of the sources, measurement, and control of coverage error, only deficiencies relative to the sampled population are included. Thus, when speaking of defective frames, only those deficiencies are discussed which arise when the population which is sampled differs from the population intended to be sampled (the sampled population).

Coverage Error Source Categories

We will now look briefly at the two categories of coverage error--defective frames and defective processes associated with the selected sample.

Defective Frames--Defective frames are characterized by (1) deficiencies in meeting the requirement that every element of the sampled population belongs to one and only one unit, (2) erroneous inclusion of units (including the wrong units or having duplicates of units which belong in the frame), or (3) erroneous exclusion of sampling units. These problems can result from vague or unworkable definitions of the sampling units relative to the sampled population; improper procedures or processing in establishing and maintaining the frame; timing, which affects the updatedness (agreement with the proper reference period) of the frame; or miscoding of sampling units. Erroneous inclusion (overcoverage) results from including duplicates and out-of-scope or out-of-business units. Erroneous exclusion of sampling units (undercoverage) results from failure to include the proper units or failure to account for birth (new) units. Misclassification of units, such as for Standard Industrial Classification (SIC), geography, size class, or company structure, can lead either to undercoverage or overcoverage.

Some frame problems cannot be overcome without expending significant resources. For example, most frames suffer from some degree of outdatedness. A monthly survey in which the frame and sample are updated quarterly, such as the Census Bureau's Monthly Wholesale Trade Survey (MWTS), does not have an up-to-date frame for at least two out of every three months--and this is over and above the lag time in getting new units on the list frame. This time lag itself can be as much as 12 to 18 months after a business starts up. For example, the Social Security Administration (SSA) lists of Employer Identification (EI) numbers newly assigned by Internal Revenue Service (IRS) are given to the Census Bureau after SSA receives the EI application forms from IRS and codes them. Each processing step contributes to the lag. Because the cost and processing difficulties preclude correcting for this frame error, the Census Bureau accounts for new units in its estimates by an imputation technique. The overall objective is to correct errors which can be corrected within resource limitations and thereby keep coverage error as low as is feasible.

Defective Processes Associated with the Selected Sample--Coverage errors in which the selected sample does not correctly represent the frame may be the result of selected cases being inadvertently dropped from the sample or nonselected cases being added to the sample erroneously. Also, errors may be made in selecting the sample. Errors of this type are likely to occur when the sample is determined by interviewers in the field. In business area samples where the sampling units are geographic land segments, failure to properly identify the population units (business establishments of a particular type) is a common form of coverage error. Such errors may result from inadequate definitions or inadequately specified field or office procedures, outdated or otherwise incorrect maps of selected area sample units, or misapplication of the sampling or canvassing rules by the interviewer. Failure to sample from an updated frame on a timely basis also results in a sample that is not representative of the frame, and hence of the sampled population. For other papers which discuss coverage concepts and issues, see Garrett, et al. (1986) and United Nations (1982).

It is worth noting here that even where coverage of a total population is fairly good, serious problems may exist for certain subpopulations. For example, national estimates might be good, while estimates covering smaller geographic areas may be inadequate because of defective geographic coding at the lower (state, county, etc.) level.

Specific Error Sources

As we have seen, errors of undercoverage or overcoverage can be the result of defective frames or of faulty sampling processes. Moreover, the same sources of error can affect both the frame and the selected sample and can lead to either undercoverage or overcoverage. Following are some specific sources of coverage error that are observable and measurable:

Coding Errors--Miscoding of industry or Standard Industrial Classification (SIC) coding, geographic coding, size coding, or company structure assignment results in frame errors. Such errors lead either to undercoverage or overcoverage depending on whether the correct units are excluded from the frame or incorrect units included in the frame. Including out-of-scope units (units which should not be included in the sampling frame based on the nature of their business or industrial activity) in the frame results from errors in industry coding and causes overcoverage. By the same token, the exclusion of units of the proper industry results in undercoverage. Similarly, if address, geographic codes, size, or any other attribute is a determinant for the sampling frame, errors in coding will cause overcoverage or undercoverage of the frame.

Two prevalent forms of miscoding are (1) completely unclassified units (especially for SIC) and (2) units which do not have sufficient coding detail for survey purposes. Unclassified units lead to undercoverage since units belonging in the frame cannot be identified. Insufficient coding detail--for example, when four-digit SIC detail is needed and only two- or three-digit detail is available--can lead to either undercoverage or overcoverage for surveys requiring finer levels of industry coding.

Some causes of miscoding are (1) inadequate information on which to base a code; (2) poorly trained coders; and (3) faulty procedures or processes, such as miskeying.

Errors of Timeliness--Errors of timeliness result when the frame or sample is not updated to the same reference period as that of the survey. For example, units no longer in business that remain in the frame or sample may lead to overcoverage. Lack of timely updating for new units may lead to undercoverage. For a list frame in which the presence of nonzero payroll is used as an indicator of "activeness," seasonal businesses may be erroneously deleted during their off season. Here again we see the dichotomous nature of coverage error: in surveys which are carried out over time, it is possible to have timely updating of the sampling frame, but unless the sample, in turn, is updated to reflect these changes, significant coverage error can result. In some survey designs it is impossible to completely eliminate coverage error due to the timing of frame or sample updates. This is especially true for list sample designs. However, use of an area sample to supplement the list sample, such as the Census Bureau uses in its Monthly Retail Trade Survey (MRTS), can theoretically reduce coverage error due to timing to zero.

Structural, organizational, or activity changes not reflected in the frame or sample may occur because of the lack of timeliness in updating. Often SIC changes occur which are not reflected in the frame or sample. Similarly, failure to update for other characteristic changes, such as company reorganizations, acquisitions, and divestments or mergers, results in coverage error.

Duplication Errors--Duplicate units on a frame can occur when, for example, a partnership business appears twice, once under each of the partners' identifiers, or when the predecessor and successor establishments both show up as active on the frame, as in the case of a business takeover. This same predecessor/successor situation can affect the sample if one of the units involved is a selected sampling unit. In addition, both a parent firm and its subsidiary could appear as separate sampling units on a frame if the association were not indicated. This would lead to overcoverage if a parent firm and all its subsidiaries are intended to be one sampling unit. Thus, processing or procedural errors can result in duplication error.

Duplication error may also occur when the sampling frame is composed of various lists, which must then be unduplicated. Any error in
this process can result in duplicate units being overlooked. This is often a problem where the primary identifiers on the component lists either don't match or are incomplete. Duplication problems also show up in dual frame surveys. For example, in the Census Bureau's Monthly Retail Trade Survey (MRTS), business establishments interviewed by personal enumeration in the area sample must be unduplicated from the list sample frame. When the employer identification (EI) number, which is the primary identifier, is incorrect or missing, the potential for duplication error is particularly great. Here again, while duplicate units cause overcoverage, problems in proper unduplication can also result in a case being incorrectly deleted.

Deficiencies in administrative record systems, censuses, or surveys on which the frame is based--Lack of or delays in reporting in the administrative systems, censuses, or surveys can cause coverage error. For example, although firms are asked to submit a separate report form for each of their establishments in the economic censuses of the Census Bureau, some firms invariably provide combined reports on one form. This results in both a deficiency in the frame of multiunit establishments and also in an undercount of the number of business establishments.

Nonlocatable units--Sometimes units selected into the sample are not contacted because they cannot be found. In area sample surveys, for example, certain types of businesses, such as service nonemployer establishments, may not be locatable. Noncontact can also occur where street addresses (for personal surveys) or mailing addresses are erroneous or incomplete.

Interviewer errors--Errors made by an interviewer in the field can result in the sample being improperly identified. Interviewer "curbstoning" (that is, the interviewer filling out the survey forms without ever properly identifying the establishment or conducting the requisite interviews) and careless canvassing can also lead to an improperly selected sample, loss of population units, or inclusion of erroneous units.

Processing errors--Computer programming errors can cause a portion of the selected sample to be omitted from the survey or can result in a deficient frame from which to draw the sample. Units not included due to the processing error can also result from poor field procedures or inadequate or incorrect sample maps or materials. Improper identification of the sample at the central sampling facility due to computer or procedural problems can also result in undercoverage. Processing errors (including errors in drawing the sample at the central sample facility) can lead either to undercoverage or overcoverage.

III. Control of Coverage Error

Coverage error can be controlled by many different means. One principle often followed is to identify those areas where coverage error is most serious and assign resources to reduce the error there. Some specific and frequently used techniques which reduce miscoding, lack of timeliness, duplication of units, omission of units, and other errors resulting in incorrect coverage of the sampled population follow:

Sampling from multiple frames--Using an area sample to supplement and complete coverage for a list sample is sometimes necessary to obtain complete coverage of the sampled population.

Integration of multiple lists for frame development--Integrating and unduplicating several lists to construct a single frame is frequently done since most lists are composites of various sources.

Conducting special frame improvement surveys--The Company Organization Survey and SIC classification card mailings for the Census Bureau's Standard Statistical Establishment List (SSEL) are examples of these types of surveys. The economic censuses themselves constitute a frame improvement mechanism for all surveys drawn subsequently from the SSEL.

Use of two-phase sampling--This is done in the Census Bureau's business birth sampling program. A first-phase sample is selected based on SIC (including unclassified or insufficiently classified units) and payroll or employment size. A survey is conducted on this sample to produce better coding and to obtain sales data which are used as the measure of size for second-phase sampling.

Updating for births--Timely updating of the frame and sample for births and deaths.

Updating for structural changes--Timely updating of the frame and sample for structural and organization changes of the sampling units.

Sample validation--Producing a proof of sample tabulation whereby sample estimates are compared to universe totals for the same characteristic. This provides verification that the sample properly represents the frame.

Enlarging the scope of the survey--Often, in order to capture all of the units relevant to the survey, it is necessary to include possible or marginally possible units. During editing, the out-of-scope units can be dropped. Care must be taken to properly drop all the out-of-scope units so that overcoverage does not occur.

Using independent control counts--These counts are often needed to verify the correctness or completeness of the frame. The counts could come from the frame for an earlier period as well as from other sources.

Internal consistency checks for frame content--This involves performing internal consistency checks on the frame data fields, especially in record identification fields and fields which determine whether the unit is in or out of scope.

Internal consistency checks for duplicate records--This procedure involves performing internal consistency checks to identify duplicate records on the frame.

Include as inscope units with out-of-scope address, geography, industry, size--The practice of considering as inscope units those which are truly out of scope due to updates or changes in address, geographic, industry or size code is sometimes used in an effort to represent true inscope units which are not picked up because they are thought to be out of scope. This amounts to adjusting for coverage error.

Include units closed for the season--Retaining units closed for a season rather than dropping them and losing their contribution when they become active again is usually necessary to maintain a frame because of the lack of timeliness in reinstating the units.

Having correct, clear, and manageable sample control and frame maintenance procedures--All aspects of sample control and frame construction and maintenance must be well thought out and clearly specified.

Setting up adequate checks on processing--This is necessary to ensure correct processing of all types: interviewer, clerical, and computer.

Improving field materials--Improving field procedures and materials, such as addresses, maps, and other interviewer materials, helps to reduce coverage error.

Interviewer selection and training--Carefully selecting and training interviewers and coders can have a substantial impact on reducing coverage error. This includes having well-trained supervisors oversee the survey operations.

Instituting a public relations campaign--This involves notifying the survey population of the survey or census in advance in an attempt to elicit their participation.

Reinterviewing procedures--These serve as a quality check on coverage error, especially for area sample surveys.

For an example of the procedures which are followed for maintaining frame and sample coverage for a large, ongoing retail trade survey, see Konschnik, et al. (1985).

IV. Measurement of Coverage Error

The measurement of coverage error is necessary in surveys if one is to have some idea of its extent as well as to identify sources most in need of improvement. While the focus of coverage is on the inclusion or exclusion of the proper sampling units in the frame and sample, the measurement of coverage error frequently centers on its effects on the published estimates of the survey. For example, it may be determined that a published estimate for retail sales of establishments in a certain SIC failed to include estimates for a significant number of nonemployer establishments, but that including these nonemployers would only very slightly influence the survey results. The measure of undercoverage would be deemed small despite the number of sampling units excluded.

Indirect Techniques

Coverage error can often be ascertained by comparing current survey data with results from earlier surveys or from external sources. Coverage error may be indicated if the existing sample shows certain changes at a significantly higher or lower rate than the comparative data. Such measures as the birth rate, out-of-business rate, out-of-scope rate, unclassified rate, miscoded rate, duplication rate, and sample attrition rate can all be used to identify and measure coverage error.

Birth rate--Birth rates may be reviewed, comparing one period to another in order to indirectly measure coverage error.

Out-of-business rate--The rate at which frame or sample units go out of business, when compared to other measures or other time periods, provides a useful coverage error measurement.

Unclassified rate--A component of coverage error can be estimated by looking at the rate of unclassified units. These, when combined with studies of the correct classification of this group, provide a measurement of undercoverage.

Misclassified rate--A look at this rate and related studies can provide measurements of the extent of coverage error at all levels of survey tabulation.

Duplication rate--Determination of the number of repeated or duplicated units in a frame or sample gives useful information on coverage problems.

Sample attrition rate--The sample attrition rates, or the rates at which the units in the sample go out of business, when contrasted to birth rates and independently identified out-of-business rates, provide indications of the extent of coverage error.

Direct Techniques

Direct techniques for measuring coverage error usually entail carefully planned and executed survey procedures designed to provide a reliable estimate of coverage error. The following are examples of these direct techniques:

Post-enumeration surveys--Used here, this is synonymous with a post-audit whereby more extensive methods and procedures are used after the conduct of a survey or census in order to identify and determine the effect of coverage errors and other nonsampling errors.

Matching known population units against frame units--Checking known population units against the frame provides some indication of the quality of coverage. However, a carefully drawn sample of known units is required before accurate estimates of coverage error can be provided.

Checking the frame against alternative lists--While the selected frame may be the best available list for the survey, checks can be made against other lists (either of greater or lesser quality) to measure coverage error.

Comparing other survey or census data or independent aggregates--Independent aggregate estimates and tabulations covering the same characteristics for all or a part of the population provide a source of comparison for identifying and measuring coverage error.

Rechecking interviewers' field work--Independent rechecks of a sample of interviewers' work are an excellent way of identifying and measuring coverage error.

Studying components of the frame--This includes assessing the various classifications of units which make up the list.

V. Summary Profile

This section presents some general results on survey practices, compiled for 55 major establishment surveys of Federal agencies. For the identification of these surveys, see Office of Management and
Budget (1988). Figures 1 and 2 give a summary of control procedures used in descending order of extent of use. Figures 3 and 4 characterize measurements of coverage error taken for these surveys, in descending order of extent of use, for indirect and direct measures. Note that although the "not applicable" category is included when determining descending order, it is not included in any textual references in this section.

The results in these graphs show that while the majority of these Federal surveys included provisions for controlling coverage error, the measurement of coverage error was less widespread. Moreover, where measurements were taken, only a small percentage was published. Thus, most measurements were for internal use to assess the adequacy of survey estimates.

The most prevalent form of coverage control (96 percent) involved updating the frame for structural changes such as SIC changes, company reorganizations, mergers, etc. Updating of the sample for births was the second most prevalent form of coverage control (87 percent). Other control techniques reported as being used on more than half the surveys were: internal consistency checks for duplicate records on the frame (73 percent); internal consistency checks for frame content (69 percent); including as inscope units with errors or changes in address, geography, industry, or size, rather than dropping them as out of scope (67 percent); sample validation, i.e., comparison of weighted-up sample units to universe totals (67 percent); and integration of multiple lists for frame development (66 percent).

Other fairly common control techniques reported were the conducting of special frame improvement surveys (49 percent) and retaining units closed for the season (47 percent). Typically, little use (9 percent) was reported of two-phase sampling for improving frames and samples although this method can prove beneficial in reducing the variance of estimates caused by frame problems. Also, on the low side in terms of relative use, only about 20 percent of the surveys reported sampling from multiple frames, such as using both a list and area sample.

[Figure 1. Coverage Error Control Procedures: Extent and Frequency of Use. Legend: Not Applicable; Procedure Used on an Irregular Basis; Procedure Used on a Regular Basis.]

[Figure 2. Coverage Error Control Procedures: Extent and Frequency of Use. Legend: Not Applicable; Procedure Used on an Irregular Basis; Procedure Used on a Regular Basis.]

When looking at the measurement of coverage error, out-of-business and out-of-scope rates are most common with 67 percent and 62 percent of the survey population reported as having these measurements taken, respectively. These measurements also have the highest rate of being published at 13 percent and 9 percent, respectively. A majority (60 percent) of the surveys reported comparing estimates produced in the surveys with estimates based on other independent sources. Measuring the misclassified rate (44 percent), matching known population units against frame units (47 percent), measuring the unclassified rates (38 percent), and measuring the sample attrition rates (36 percent) were also somewhat common. Least common were the conducting of post-enumeration surveys (20 percent), presumably because of the cost and resources involved; and rechecks on interviewers' listings (16 percent), primarily due to the nonapplicability of interviewers' involvement in listing for many of the surveys.

[Figure 3. Coverage Error Measurement Techniques--Indirect: Frequency and Application of Use.]
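
As a companion to the indirect measures discussed in Section IV (out-of-business, unclassified, and duplication rates), the following is a minimal sketch of how such rates might be computed from a frame extract. The record layout (`ei`, `sic`, `active`), the helper name `frame_rates`, and the toy data are hypothetical, introduced only for illustration; they are not taken from any Census Bureau system. Records with a missing EI number are left out of the duplicate count, since this simple check has no way to match them.

```python
from collections import Counter


def frame_rates(frame):
    """Return simple indirect coverage indicators for a list of frame records.

    Each record is a dict with hypothetical keys (assumptions, not a real layout):
      'ei'     - employer identification number, or None if missing
      'sic'    - industry code as a string, or None if unclassified
      'active' - False if the unit is out of business
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame is empty")

    out_of_business = sum(1 for unit in frame if not unit["active"])
    unclassified = sum(1 for unit in frame if unit["sic"] is None)

    # For each EI number that appears more than once, count the extra records.
    ei_counts = Counter(unit["ei"] for unit in frame if unit["ei"] is not None)
    duplicates = sum(count - 1 for count in ei_counts.values() if count > 1)

    return {
        "out_of_business_rate": out_of_business / n,
        "unclassified_rate": unclassified / n,
        "duplication_rate": duplicates / n,
    }


# Toy frame: one duplicated EI, one unclassified unit, one out-of-business unit.
toy_frame = [
    {"ei": "A1", "sic": "5411", "active": True},
    {"ei": "A1", "sic": "5411", "active": True},   # duplicate of the first record
    {"ei": "B2", "sic": None, "active": True},     # unclassified
    {"ei": "C3", "sic": "5812", "active": False},  # out of business
    {"ei": None, "sic": "5941", "active": True},   # missing identifier
]

rates = frame_rates(toy_frame)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

As the text notes, rates of this kind are informative mainly when compared with other time periods or with independent sources; a single value from one frame extract says little on its own.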

[Figure 4. Coverage Error Measurement Techniques--Direct: Frequency and Application of Use.]

References

Cochran, William G. (1977), Sampling Techniques, 3rd ed., New York: John Wiley and Sons.

Garrett, J., Hogan, H., and Pautler, C. (1986), "Coverage Concepts and Issues in Data Collection and Data Presentation," Proceedings of the Second Annual Research Conference, Washington, D.C.: Bureau of the Census, pp. 329-334.

Konschnik, C., Monsour, N., and Detlefsen, R. (1985), "Constructing and Maintaining Frames and Samples for Business Surveys," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 113-112.

Office of Federal Statistical Policy and Standards (1978), Statistical Policy Working Paper 4, Washington, D.C.: Department of Commerce.

Office of Management and Budget (1988), Quality in Establishment Surveys, Statistical Policy Working Paper 15, Springfield, Va.: National Technical Information Service (PB 88-23294).

¹ This paper reports the general results of research undertaken by Census Bureau staff. The views expressed are attributable to the authors and do not necessarily reflect those of the Census Bureau.
