Statistical Aspects of a Census


Carol C. House

This paper focuses on the statistical aspects of a census. It addresses issues such as the coverage, classification, sampling, non-sampling error, post collection processing, weighting and disclosure avoidance. The intent of the paper is to demonstrate that most (if not all) of the statistical issues that are important in conducting a survey are equally germane to conducting a census.

KEY WORDS: census, coverage, non-response, error, frames, imputation, disclosure

1 INTRODUCTION¹

In this paper the author will provide a basic overview of the statistical aspects of planning, conducting and publishing data from a census. The intent of the paper is to demonstrate that most (if not all) of the statistical issues that are important in conducting a survey are equally germane to conducting a census.

In order to establish the scope for this paper, we begin by reviewing some basic definitions. Webster's New Collegiate Dictionary defines a “census” to be “a count of the population and a property evaluation in early Rome”. Although particularly appropriate to quote at the CAESAR conference, we will want to utilize a broader definition. The International Statistical Institute (ISI) in its Dictionary of Statistical Terms defines a census to be “the complete enumeration of a population or group at a point in time with respect to well-defined characteristics”. This definition is more useable. We now look at the term “statistics” to further focus the paper. Again from ISI we find that statistics is the “numerical data relating to an aggregate of individuals; the science of collecting, analyzing and interpreting such data.” Together these definitions render a focus for this paper -- those issues germane to the science and/or methodology of collecting, analyzing and interpreting data through what is intended to be a complete enumeration of a population at a point in time with respect to well-defined characteristics. Further, because of the nature of the CAESAR conference, this paper will direct its discussion to agricultural censuses. Important issues include the (sampling) frame, sampling methodology, non-sampling error, processing, weighting, modeling, disclosure avoidance, and data dissemination. This paper touches on each of these issues as appropriate to the paper’s focus on censuses of agriculture.

¹ This paper was presented at the Conference on Agricultural and Environmental Statistical Applications in Rome (CAESAR), June 5-7, 2001. Carol House is with the National Agricultural Statistics Service, Research and Development Division, and is the Division Director.

2 FRAME

Whether conducting a sample survey or a census, a core component of methodology is the sampling frame. The frame usually consists of a listing of population units, but alternatively it might be a structure from which clusters of units can be delineated. For agricultural censuses, the frame is likely to be a business register or a farm register. Alternatively it might be a listing of villages from which individual farm units can be delineated during data collection. The use of an area frame is a third common alternative. Often more than a single frame is used for a census. Papers presented at the Agricultural Statistics 2000 conference highlight the diversity of sampling frames used for agricultural censuses (Sward, et al.; Kiregyera; David).

There are three basic statistical concerns associated with sampling frames: coverage, classification and duplication. These concerns are equally relevant whether the frame will be used for a census or sampled for a survey.

2.1 Coverage

Coverage deals with how well the frame fully delineates all population units. The statistician’s goal should be to maximize coverage of the frame and to provide measures of under-coverage. For agricultural censuses, coverage often differs by size of farming operation. Larger farms are covered more completely, and smaller farms less so. Complete coverage of smaller farms is highly problematic, and statistical organizations have used different strategies to deal with this coverage problem.

The Australian Bureau of Statistics (Sward, et al., 1998) intentionally excludes smaller farms from their business register and census of agriculture. They focus instead on production agriculture, and maintain that their business register has good coverage for that target population. Statistics Canada (Lim, et al., 2000) has dropped the use of an area frame as part of its census of agriculture, and is conducting research on using various sources of administrative data to improve coverage of its farm register. Kiregyera (1998) reports that a typical agriculture census in Africa will completely enumerate larger operations (identified on some listing), but does not attempt to enumerate completely the smaller operations because of the resources required to do so. Instead they select a sample from a frame of villages or land areas, and delineate small farms within the sampled areas for enumeration. In the United States, the farm register used for the 1997 Census of Agriculture covered 86.3% of all farms, but 96.4% of farms with gross value of sales over $10,000 and 99.5% of the total value of agricultural products. The U.S. uses a separate area sampling frame to measure under-coverage of its farm register, and has published global measures of coverage. They are investigating methodology to model under-coverage as part of the 2002 census and potentially publish more detailed measures of that coverage.

2.2 Classification

A second basic concern with a sampling frame is whether frame units are accurately classified.
The primary classification is whether the unit is, in fact, a member of the target population, and thus should be represented on the frame. For example, in the U.S. there is an official definition of a farm: operations that sold $1,000 or more of agricultural products during the target year, or would normally sell that much. The first part of the definition is fairly straightforward, but the second causes considerable difficulty with classification.

Classification is further complicated when a population unit is linked with, or owned by, another business entity. This is an ongoing problem for all business registers. The statistician’s goal is to employ reasonable, standardized classification algorithms that are consistent with potential uses of the census data. For example, a large farming operation may be a part of a larger, vertically integrated enterprise which may have holdings under semi-autonomous management in several dispersed geographic areas. Should each geographically dispersed establishment be considered a farm, or should the enterprise be considered a single farm and placed only once on the sampling frame? Another example is when large conglomerates contract with small, independent farmers to raise livestock. The larger firm (contractor) places immature animals with the contractee who raises the animals. The contractor maintains ownership of the livestock, supplies feed and other input expenses, then removes and markets the mature animals. Which is the farm – the contractor, the contractee, or both?

2.3 Duplication

A third basic concern with a sampling frame is duplication. There needs to be a one-to-one correspondence between population units and frame units. Duplication occurs when a population unit is represented by more than one frame unit. Similar to misclassification, duplication is an ongoing concern with all business registers. Software is available to match a list against itself to search for potential duplication. This process may eliminate much of the duplication prior to data collection. Often it is important in a census or survey to add questions to the data collection instrument that will assist in a post-collection evaluation of duplication. In its 1997 Census of Agriculture, the U.S. conducted a separate “classification error study” in conjunction with the census. For this study, a sample of census respondents was re-contacted to examine potential misclassification and duplication, and to estimate levels of both.

3 SAMPLING

When one initially thinks of a census or complete enumeration, statistical sampling may not seem relevant. However, in the implementation of agricultural censuses throughout the world, a substantial amount of sampling has been employed. David (1998) presents a strong rationale for extensive use of sampling for agricultural censuses, citing specifically those conducted in Nepal and the Philippines. The reader is encouraged to review his paper for more details. This paper does not attempt an intensive discussion of different sampling techniques, but identifies some of the major areas where sampling has been (or can be) employed.

Reducing costs is a major reason that statistical organizations have employed sampling in their census processes. We have already discussed how agricultural censuses in Africa, Nepal, and the Philippines have used sampling extensively for smaller farms. Sampling

4 NON-SAMPLING ERROR

Collection of data generates sampling and non-sampling errors. We have already discussed situations in which sampling, and thus sampling error, may be relevant in census data collection. Non-sampling errors are always present, and generally can be expected to increase as the number of contacts and the complexity of questions increase. Since censuses
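
As a rough illustration of the list-against-itself matching mentioned in Section 2.3, the following Python sketch flags potential duplicates on a small, made-up farm register. The paper does not say which software or matching rules any agency uses, so everything here is an assumption for illustration: the field names (`id`, `name`, `postal_code`), the suffix list, the 0.85 similarity threshold, and the choice of comparing records only within the same postal code.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical list of business suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "co"}


def normalize(name: str) -> str:
    """Lowercase the name, strip punctuation, and drop business suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def potential_duplicates(records, threshold=0.85):
    """Return (id_a, id_b, score) for record pairs with similar names.

    Records are compared only within the same postal code (a simple
    "blocking" step) so the number of pairwise comparisons stays small.
    """
    blocks = {}
    for rec in records:
        blocks.setdefault(rec["postal_code"], []).append(rec)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])
            ).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


# Made-up register entries, for illustration only.
register = [
    {"id": 1, "name": "Smith Family Farms, Inc.", "postal_code": "50010"},
    {"id": 2, "name": "Smith Family Farm", "postal_code": "50010"},
    {"id": 3, "name": "Green Valley Dairy", "postal_code": "50010"},
    {"id": 4, "name": "Smith Family Farm", "postal_code": "68502"},
]

flagged = potential_duplicates(register)
print(flagged)
```

With this toy data, records 1 and 2 are flagged as a candidate pair, while record 4 is never compared with them because it sits in a different postal-code block. Flagged pairs are only candidates: in practice they would still need clerical review or follow-up questions on the data collection instrument before any unit is removed from the frame.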