A Bibliography of Selected Statistical Methods and Development Related to Census 2000
Total Page:16
File Type:pdf, Size:1020Kb
BUREAU OF THE CENSUS STATISTICAL RESEARCH DIVISION Statistical Research Report Series No. RR 2000/02 A BIBLIOGRAPHY OF SELECTED STATISTICAL METHODS AND DEVELOPMENT RELATED TO CENSUS 2000 Tommy Wright and Joyce Farmer Statistical Research Division Methodology and Standards Directorate U.S. Bureau of the Census Washington, D.C. 20233 Report Issued: August 10, 2000 OVERVIEW In an attempt to help facilitate the continuing national conversation on the supplemental use of sampling methods as a part of Census 2000, this report is offered and contains two main parts: PART I. SELECTED MOMENTS IN THE DEVELOPMENT OF PROBABILITY SAMPLING: Theory & Practice This part focuses briefly on probability sampling methodol- ogy while pointing to selected moments in its development as a serious tool for scientific inquiry. PART II. AN INDEX & THE LISTING OF AN ANNOTATED BIBLIOGRAPHY This part provides brief summaries of selected papers which might be of interest to anyone with an interest in beginning a technical background study of the methodology which helps form the foundation of the Census Bureau’s planned use of sampling and estimation to improve the count from Census 2000. This report is a major revision of the earlier issued report (Third Edition: May 1, 2000) under the same title. We are grateful to our colleagues: Hazel Beaton for her expert typing of this report as well as the three earlier drafts, Juanita Rasmann for editing the final draft, Don Malec for calling the 1786 Laplace paper to our attention, and Yves Thibaudeau who read the 1786 Laplace paper (in French) and verified for us that Laplace’s estimator can be viewed as similar to capture-recapture or dual system estimation. “This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a Census Bureau review more limited in scope than that given to official Census Bureau publications. This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress.” i ii PART I characteristic, a sample, i.e., a subset or portion of the SELECTED MOMENTS population, may be selected to yield satisfactory information IN THE DEVELOPMENT OF regarding the particular population characteristic. The PROBABILITY SAMPLING: population characteristic is often a quantitative one. In such Theory & Practice cases, a statistic is computed using information collected from the small and more manageable sample, and its value Practically everyday we pick up the newspaper or tune in is used to estimate the unknown value of the population to a news broadcast and are bombarded with data in the form characteristic. Although we desire a sample that will of numbers, graphs, and tables. We see the results of a study provide a "good" estimate of the unknown value of the done by the Gallup Poll. Forest personnel can tell the number population characteristic, it is certainly conceivable that the of deer inhabiting a certain land area. Reliable estimates of sample information obtained could lead to a very incorrect world grain production can be made before harvest by use of estimate. satellite data. Nielson’s Ratings can tell the approximate number of people who watched television in a given week and Sampling and Nonsampling Errors the proportion who watched a particular program. Much of Error is the difference between the known value of the this information and more is made possible by an area of estimator from the sample and the (true but unknown) value statistics referred to as probability sampling or simply of the population characteristic. Error can occur due to sampling. sampling reasons and/or nonsampling reasons. Sampling error is the error that is caused by measuring only the Uses of Sampling sampling units instead of all of the population units. Sampling methods are used throughout the world by a variety Nonsampling error is the error that is caused by reasons of individuals, groups, and organizations, as well as by local, other than sampling. Examples of nonsampling errors state, and national governments. They are used successfully include failure to get responses from all of the sample units, in many fields including agriculture, business, defense, failure of the measuring device to operate properly, and economics, education, energy, environment, finance, health, failure to correctly process the sample data. industry, labor, natural resource management, demographics, and transportation. Some specific applications include taking It is well known that the magnitude of nonsampling opinion polls, election polls, and polls for rating TV programs; errors can far exceed the magnitude of sampling error in a surveying animal populations (particularly fish, deer, etc.) and given sample. Unfortunately while much has been written farms; taking a sample of buildings; taking air samples to about measuring and controlling sampling error, relatively monitor air quality; sampling to monitor traffic activity; little is known about the quantification and estimation of the sampling to estimate energy consumption; sampling to monitor magnitude of nonsampling errors. Current practice seeks to a nation’s economy; taking samples before marketing a new minimize both sampling and nonsampling errors. Small product; taking a sample for auditing or inventory purposes; samples where resources are used to implement high quality taking soil samples to measure radioactivity levels; sampling data collection methods which control and minimize to monitor employment; sampling to monitor education nonsampling errors along with efficient statistical tech- progress; and taking samples of products produced at a niques (sampling and estimation) that seek to minimize manufacturing plant to monitor output quality. sampling error is an attractive combination for success. "All scientific observation, whether statistical or not, is Probability Sampling based on sampling," says Stephan (1948).(1) "The earliest Probability sampling makes use of the laws of probability examples of sampling procedures are to be found in certain in the selection of the sample and in the construction of very ordinary human activities. The common practice of efficient estimators. With probability sampling, every taking a small part or portion for tasting or testing to deter- population unit has a known positive chance of being mine the characteristics of the whole precedes recorded selected for the sample. Probability sampling provides a history and is one of the roots from which sampling methodol- means for saying how good one believes an estimate is ogy stems..." relative to all the possible estimates from all of the possible samples. That is, probability allows us to extend results Population and Sample from the sample to the entire population. The current approach to sampling assumes a given finite collection of units, called a population. It is often the case A Census or a Sample that certain characteristics of the population are needed but When limited resources such as time and costs dictated that unknown. When examination of each and every unit in the a complete census was not possible, sampling has been an population is undesirable to know a particular population alternative. Historically, however, the application of iii sampling techniques has had its ups and downs, largely owing could begin by choosing districts, towns, parts of cities, to common misconceptions about sampling. streets, etc., to be followed by systematic, rather than probabilistic, choice of units (houses, families, individuals); The heart of these misconceptions seems to be a belief (2) there should be substantial sample sizes at all levels of that if one wants to know something about a given population, such a selection process; (3) the sample should be spread it is better to contact the entire population (a census) rather out in a variety of ways, primarily geographically, but in than only a sample of the population. As Kish (1979) (2) has other ways as well. For example, if a sample had a defi- pointed out, censuses, if done correctly, have the potential ciency of cattle farmers, he would add more of them. (5) advantage of providing precise, detailed, and credible information on all population units. On the other hand, 1896: Petersen presents a sampling methodology for samples have the advantage of providing richer, more com- estimating the size of a finite population. The Petersen plex, accurate, inexpensive, and timely information for a estimator provides the heuristic basis of most estimators of sample which can be extended to the entire population. wildlife population size; and from humble beginnings, a very large capture-recapture scientific literature has Indeed the joint judicious use of both sampling and developed. (6) census taking offers the best opportunity for greatest benefit. 1897: At a conference of Scandinavian statisticians held Sampling: Its Development in Stockholm, a conference resolution gives guarded Following are selected major moments in the theoretical and support for the representative method being promoted practical development of probability sampling methodology. by A.N. Kiaer. (5) 1802: P.S. Laplace uses sampling to estimate the popula- 1903: Randomization is proposed for use in sample tion of France as of September 22, 1802. Laplace persuaded selection. Lucien March, a French statistician, who, in the the French government to take a sample of the small adminis- discussion to Kiaer’s paper at the 1903 Berlin International trative districts known as communes and to count the total Statistical Institute meeting, was the first to introduce (with population y in the sample communes on September 22, 1802. caution) concepts related to the use of probability (i.e., From the known total number of registered births (birth randomization) in the selection of the sample.(5) registration was required) during the preceding year in these communes x and in the whole country 1906: Bowley presents a central limit theorem for random sampling. Arthur Lyon Bowley presents a paper y X, the ratio estimate X of the population of France could which seeks to give an empirical verification to a type of x be calculated.