Newsletter Issue 13 July 2001
Total Page:16
File Type:pdf, Size:1020Kb
Newsletter Issue 13 July 2001 SRMS Program for the Joint Selected Moments in the Statistical Meetings in Development of Probability Atlanta Sampling: by Jim Lepkowski, SRMS Program Chair Theory & Practice by Tommy Wright, U.S. Census Bureau The ASA is advertising the Joint Statistical Meetings coming in August as “Hotlanta.” For those of us who enjoy hot weather, there will be time during the meetings to get Population and Sample away and enjoy the warm (very warm!) weather. For those who do not enjoy it, we have developed a program that “All scientific observation, whether statistical or not, should be able to keep you indoors and busy for all five is based on sampling,” says Stephan (1948).(1) “The days of the meetings. earliest examples of sampling procedures are to be found in certain very ordinary human activities. The The 2001 Joint Statistical Meetings are August 5-9, Sunday common practice of taking a small part or portion for through midday Thursday. Papers on a wide range of tasting or testing to determine the characteristics of survey topics will be presented across 13 time slots on those the whole precedes recorded history and is one of the days, beginning at 2 PM on Sunday. After 2 PM and 4 PM roots from which sampling methodology stems...” sessions on Sunday, there are invited and contributed paper sessions at 8:30, 10:30, and 2:00 Monday, Tuesday, The current approach to sampling assumes a given finite (Continued on p. 7) collection of units, called a population. When examination of each and every unit in the population to know a I NSIDE T HIS I SSUE particular population characteristic is undesirable or impractical, a sample, i.e., a subset or portion of the 1 SRMS Program for the Joint Statistical Meetings in population, may be selected to yield satisfactory Atlanta information regarding the particular population 1 Selected Moments in the Development of characteristic. The population characteristic, called a Probability Sampling: Theory & Practice parameter, is often a quantitative one. In such cases, a 8 Section News statistic is computed using information collected from the small and more manageable sample, and its value is used 8 Executive Committee Reports to estimate the unknown value of the parameter. Although 11 Focus On . we desire a sample that will provide a “good” estimate of the unknown value of the parameter, it is certainly 13 Past Conferences conceivable that the sample information obtained could lead to a very poor estimate. 13 Upcoming Conferences 14 Awards Probability Sampling 15 Announcements Probability sampling makes use of the laws of probability 16 Significant Digits in the selection of the sample and in the construction of efficient estimators. With probability sampling, every 17 SRMS Charter population unit has a known positive chance of being 19 Executive Committee Members (Continued on p. 2) SRMS Newsletter Page 1 (Continued from page 1) 1890: Herman Hollerith applies his punch card machines to selected for the sample. Probability sampling provides a the “speedy” processing and tabulation of the 1890 U.S. means of measuring how good one believes an estimate is Census. This initial use of technology eventually spread to relative to all the possible estimates from all of the possible the use of computers and other technology, for the samples. collection, processing, analysis, and dissemination of data from complex sample surveys and has assisted the (6) A Census or a Sample development of further theory (Bellhouse, 2000). When limited resources such as time and costs dictate that a 1895: A.N. Kiaer calls for sampling based on the complete census is not possible, sampling is an alternative. “representative method.” At the Berne meeting of the In certain cases, such as determining the accuracy of a International Statistical Institute (ISI), Kiaer (first Director missile, sampling is the only way! Historically, however, of the Norwegian Central Bureau of Statistics) puts forward the application of sampling techniques has had its ups and the idea that a partial investigation (i.e., a sample) could downs, largely owing to common misconceptions about provide useful information based on what he called the sampling. “representative method.” His representative method aimed to produce a sample which was a miniature of the As Kish (1979)(2) has pointed out, censuses, if done population and can be described as follows: (1) in social correctly, have the potential advantage of providing precise, and economic surveys, one could begin by choosing detailed, and credible information on all population units. districts, towns, parts of cities, streets, etc., to be followed On the other hand, samples have the advantage of providing by systematic, rather than probabilistic, choice of units richer, more complex, accurate, inexpensive, and timely (houses, families, individuals); (2) there should be information about the entire population. substantial sample sizes at all levels of such a selection process; (3) the sample should be spread out in a variety of Sampling: Its Development ways, primarily geographically, but in other ways as well. For example, if a sample had a deficiency of cattle farmers, (5) Following are selected major moments in the theoretical and he would add more of them. practical development of probability sampling methodology. 1896: Petersen presents a sampling methodology for estimating the size of a finite population (of fish). The 1802: P.S. Laplace uses sampling to estimate the Petersen estimator provides the basis for capture-recapture population of France as of September 22, 1802. Laplace estimation, widely used in the estimation of a wildlife persuaded the French government to take a sample of the population size; and from humble beginnings, a very large (7) small administrative districts known as communes and to capture-recapture scientific literature has developed. count the total population (y) in the sample communes on September 22, 1802. From the known total number of 1897: At a conference of Scandinavian statisticians held in registered births (birth registration was required) during the Stockholm, a conference resolution gives guarded support preceding year in these communes (x) and in the whole for the representative method being promoted by A.N. (5) y Kiaer. country (X), the ratio estimate X of the population of x France could be calculated. Laplace also derived several 1903: Randomization is proposed for use in sample theoretical properties of the estimator. Laplace (1786) selection. Lucien March, a French statistician in the demonstrated this method earlier in estimating the 1782 discussion to Kiaer’s paper at the 1903 Berlin International population of France.(3) Assuming a closed population (i.e., Statistical Institute meeting, was the first to introduce (with caution) concepts related to the use of probability (i.e., no births, deaths, nor movement across population (5) boundaries during the preceding year), this ratio estimator is randomization) in the selection of the sample. similar to the Petersen estimator (see later). A similar method had been used for estimating the population of 1906: Bowley presents a central limit theorem for random England as early as 1662 by John Graunt.(4) sampling. Arthur Lyon Bowley presents a paper which seeks to give an empirical verification to a type of central 19th Century: There is very limited use of sampling. For limit theorem for simple random sampling by observing that the distribution of 40 sample means was approximately bell- government statistical agencies, the generally accepted (5) method of coverage was a complete enumeration. Very shaped (i.e., normal). limited sampling was done.(5) SRMS Newsletter Page 2 1912: Bowley uses a systematically chosen sample of 1937: W. Edwards Deming invites Neyman to come to houses to study poverty in Reading, England. Bowley often Washington, DC to give a series of lectures on probability checked the representativeness of his samples by comparing sampling at the Department of Agriculture Graduate his sample results to known population counts of variables School.(5) The notes from these lectures were published and on which these counts were available. For two cases in became a major way for Neyman’s ideas to gain wide which he found a discrepancy between his sample and the acceptance in the United States. official statistics, on further checking he discovered that the official statistics were in error.(5) 1938: U.S. Census Bureau uses national sample to estimate unemployment. In the mid-1930’s, the United States was in 1925: Based on the work of a commission to study the the grip of the Great Depression, and there was urgent need application of the representative method, the International for current information on the unemployed. But estimates of Statistical Institute’s meeting in Rome adopts a resolution the number of employed varied by many millions of persons which gives acceptance to certain sampling methods both and the next decennial census would not occur until 1940. A by random and purposive (nonrandom) selection.(5) Census of Unemployment was undertaken as a nationwide voluntary registration of the unemployed and partially 1926: Bowley provides a theoretical monograph on unemployed. Lack of confidence in the ability to control the random and purposive selection. As a major discussant of accuracy of the unemployment registration (through the the resolution adopted on the representative method at the post office) led to the idea of an enumerative check 1925 International Statistical