Statistical Aspects of a Census

Carol C. House
_________________________________________________________________________________

This paper focuses on the statistical aspects of a census. It addresses issues such as coverage, classification, sampling, non-sampling error, post-collection processing, weighting and disclosure avoidance. The intent of the paper is to demonstrate that most (if not all) of the statistical issues that are important in conducting a survey are equally germane to conducting a census.

KEY WORDS: census, coverage, non-response, error, frames, imputation, disclosure
_________________________________________________________________________________

1 INTRODUCTION [1]

In this paper the author will provide a basic overview of the statistical aspects of planning, conducting and publishing data from a census. The intent of the paper is to demonstrate that most (if not all) of the statistical issues that are important in conducting a survey are equally germane to conducting a census.

In order to establish the scope for this paper, we begin by reviewing some basic definitions. Webster's New Collegiate Dictionary defines a “census” to be “a count of the population and a property evaluation in early Rome”. Although this definition is particularly appropriate to quote at the CAESAR conference, we will want to utilize a broader one. The International Statistical Institute (ISI) in its Dictionary of Statistical Terms defines a census to be “the complete enumeration of a population or group at a point in time with respect to well-defined characteristics”. This definition is more usable. We now look at the term “statistics” to further focus the paper. Again from ISI we find that statistics is the “numerical data relating to an aggregate of individuals; the science of collecting, analyzing and interpreting such data.” Together these definitions render a focus for this paper: those issues germane to the science and/or methodology of collecting, analyzing and interpreting data through what is intended to be a complete enumeration of a population at a point in time with respect to well-defined characteristics. Further, because of the nature of the CAESAR conference, this paper will direct its discussion to agricultural censuses. Important issues include the (sampling) frame, sampling methodology, non-sampling error, processing, weighting, modeling, disclosure avoidance, and data dissemination. This paper touches on each of these issues as appropriate to the paper’s focus on censuses of agriculture.

[1] This paper was presented at the Conference on Agricultural and Environmental Statistical Applications in Rome (CAESAR), June 5-7, 2001. Carol House is with the National Agricultural Statistics Service, Research and Development Division, and is the Division Director.

2 FRAME

Whether conducting a sample survey or a census, a core component of methodology is the sampling frame. The frame usually consists of a listing of population units, but alternatively it might be a structure from which clusters of units can be delineated. For agricultural censuses, the frame is likely to be a business register or a farm register. Alternatively it might be a listing of villages from which individual farm units can be delineated during data collection. The use of an area frame is a third common alternative. Often more than a single frame is used for a census. Papers presented at the Agricultural Statistics 2000 conference highlight the diversity of sampling frames used for agricultural censuses (Sward, et al.; Kiregyera; David).

There are three basic statistical concerns associated with sampling frames: coverage, classification and duplication. These concerns are equally relevant whether the frame will be used for a census or sampled for a survey.

2.1 Coverage

Coverage deals with how well the frame fully delineates all population units. The statistician’s goal should be to maximize coverage of the frame and to provide measures of under-coverage. For agricultural censuses, coverage often differs by size of farming operation. Larger farms are covered more completely, and smaller farms less so. Complete coverage of smaller farms is highly problematic, and statistical organizations have used different strategies to deal with this coverage problem.

The Australian Bureau of Statistics (Sward, et al., 1998) intentionally excludes smaller farms from its business register and census of agriculture. It focuses instead on production agriculture, and maintains that its business register has good coverage for that target population. Statistics Canada (Lim, et al., 2000) has dropped the use of an area frame as part of its census of agriculture, and is conducting research on using various sources of administrative data to improve coverage of its farm register. Kiregyera (1998) reports that a typical agriculture census in Africa will completely enumerate larger operations (identified on some listing), but does not attempt to enumerate completely the smaller operations because of the resources required to do so. Instead, a sample is selected from a frame of villages or land areas, and small farms are delineated within the sampled areas for enumeration. In the United States, the farm register used for the 1997 Census of Agriculture covered 86.3% of all farms, but 96.4% of farms with gross value of sales over $10,000 and 99.5% of the total value of agricultural products. The U.S. uses a separate area sampling frame to measure under-coverage of its farm register, and has published global measures of coverage. It is investigating methodology to model under-coverage as part of the 2002 census and potentially publish more detailed measures of that coverage.

2.2 Classification

A second basic concern with a sampling frame is whether frame units are accurately classified. The primary classification is whether the unit is, in fact, a member of the target population, and thus should be represented on the frame. For example, in the U.S. there is an official definition of a farm: operations that sold $1,000 or more of agricultural products during the target year, or would normally sell that much. The first part of the definition is fairly straightforward, but the second causes considerable difficulty with classification.

Classification is further complicated when a population unit is linked with, or owned by, another business entity. This is an ongoing problem for all business registers. The statistician’s goal is to employ reasonable, standardized classification algorithms that are consistent with potential uses of the census data. For example, a large farming operation may be a part of a larger, vertically integrated enterprise which may have holdings under semi-autonomous management in several dispersed geographic areas. Should each geographically dispersed establishment be considered a farm, or should the enterprise be considered a single farm and placed only once on the sampling frame? Another example is when large conglomerates contract with small, independent farmers to raise livestock. The larger firm (contractor) places immature animals with the contractee who raises the animals. The contractor maintains ownership of the livestock, supplies feed and other input expenses, then removes and markets the mature animals. Which is the farm: the contractor, the contractee, or both?

2.3 Duplication

A third basic concern with a sampling frame is duplication. There needs to be a one-to-one correspondence between population units and frame units. Duplication occurs when a population unit is represented by more than one frame unit. Similar to misclassification, duplication is an ongoing concern with all business registers. Software is available to match a list against itself to search for potential duplication. This process may eliminate much of the duplication prior to data collection. Often it is important in a census or survey to add questions to the data collection instrument that will assist in a post-collection evaluation of duplication. In its 1997 Census of Agriculture, the U.S. conducted a separate “classification error study” in conjunction with the census. For this study, a sample of census respondents was re-contacted to examine potential misclassification and duplication, and to estimate levels of both.

3 SAMPLING

When one initially thinks of a census or complete enumeration, statistical sampling may not seem relevant. However, in the implementation of agricultural censuses throughout the world, a substantial amount of sampling has been employed. David (1998) presents a strong rationale for extensive use of sampling for agricultural censuses, citing specifically those conducted in Nepal and the Philippines. The reader is encouraged to review his paper for more details. This paper does not attempt an intensive discussion of different sampling techniques, but identifies some of the major areas where sampling has been (or can be) employed.

Reducing costs is a major reason that statistical organizations have employed sampling in their census processes. We have already discussed how agricultural censuses in Africa, Nepal, and the Philippines have used sampling extensively for smaller farms. Sampling

4 NON-SAMPLING ERROR

Collection of data generates sampling and non-sampling errors. We have already discussed situations in which sampling, and thus sampling error, may be relevant in census data collection. Non-sampling errors are always present, and generally can be expected to increase as the number of contacts and the complexity of questions increase. Since censuses
