Estimation of Quantiles and the Interquartile Range in Complex Surveys Carol Ann Francisco Iowa State University
Total Page:16
File Type:pdf, Size:1020Kb
Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations 1987 Estimation of quantiles and the interquartile range in complex surveys Carol Ann Francisco Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Statistics and Probability Commons Recommended Citation Francisco, Carol Ann, "Estimation of quantiles and the interquartile range in complex surveys " (1987). Retrospective Theses and Dissertations. 11680. https://lib.dr.iastate.edu/rtd/11680 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. INFORMATION TO USERS While the most advanced technology has been used to photograph and reproduce this manuscript, the quality of the reproduction is heavily dependent upon the quality of the material submitted. For example: • Manuscript pages may have indistinct print. In such cases, the best available copy has been filmed. • Manuscripts may not always be complete. In such cases, a note will indicate that it is not possible to obtain missing pages. • Copyrighted material may have been removed from the manuscript. In such cases, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, and charts) are photographed by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each oversize page is also filmed as one exposure and is available, for an additional charge, as a standard 35mm slide or as a 17"x 23" black and white photographic print. Most photographs reproduce acceptably on positive microfilm or microfiche but lack the clarity on xerographic copies made from the microfilm. For an additional charge, 35mm slides of 6"x 9" black and white photographic prints are available for any photographs or illustrations that cannot be reproduced satisfactorily by xerography. Order Number 8721882 Estimation of quantiles and the interquartile range in complex surveys Francisco, Carol Ann, Ph.D. Iowa State University, 1987 UMI SOON.ZeebRd. Ann Arbor, MI 48106 Estimation of quantiles and the interquartile range in complex surveys by Carol Ann Francisco A Dissertation Submitted to the Graduate Faculty in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY Major: Statistics Approved: Signature was redacted for privacy. In Chaise of^Majot Work Signature was redacted for privacy. For the Major Department Signature was redacted for privacy. For the Grad e College Iowa State University Ames, Iowa 1987 il TABLE OF CONTENTS Page I. INTRODUCTION 1 II. PREVIOUS WORK IN QUANTILE ESTIMATION 8 A. Simple Random Sampling 8 1. Infinite Population Sampling 8 2. Finite Population Sampling 27 B. Stratified Random Sampling 46 C. Stratified Cluster Sampling 60 III. ESTIMATION OF QUANTILES AND THE INTERQUARTILE RANGE IN LARGE-SCALE SURVEYS 71 A. Properties of the Sample Mean 75 B. Properties of the Empirical Distribution Function 102 C. Confidence Intervals for Quantlles and the Interquartile Range 106 1. Confidence Intervals Based on Sample Quantlles 107 2. Quantile Confidence Intervals Based on Test Inversion 138 D. Extensions to More Complex Designs 156 IV. IMPLEMENTATION OF THE PROPOSED ESTIMATION PROCEDURES 158 A. Conputer Algorithm 159 B. Monte Carlo Simulations 167 1. Percentage of Urban Land Population 167 2. Soil Loss Population 171 3. Generated Lognormal Population 175 V. SUMMARY AND CONCLUSIONS 183 ill Page VI. REFERENCES 187 VII. ACKNOWLEDOIENTS 193 I. IHTBODUCTION Researchers are often interested in estimating cumulative distribution functions from survey data. For example, Sedransk and Sedransk (1979) examined the feasibility of using estimated cumulative distribution functions as one means of making comparisons among radi ation therapy facilities in a large-scale national survey of cancer patient medical records. Also of interest in many analytical surveys are estimators of functions of the cumulative distribution function such as the median and other quantiles, along with estimators of their standard errors and confidence intervals. Quantiles are of particular Interest when a variable is known to have a markedly skewed distribu tion. For example, in a survey of urban households, a point or interval estimate of median household Income may be desired. Studies which assume simple random sampling without replacement from an absolutely continuous distribution form the basis of most of the existing literature on quantile estimation. Estimators which have been proposed for this situation include the sample median and various linear combinations of order statistics (Karrell and Davis 1982, Kaigh and Lachenbruch 1982). Estimators of the standard error of the sample median which have been proposed Include a small sample variance estimator by Marltz and Jarrett (1978), a method which is essentially equivalent to the bootstrap "Method 1" estimator described by Efron (1979), and a generalized least squares estimator of the large sample standard error of the sample median (Sheather and Marltz 1983). 2 Procedures for constructing confidence intervals for the median under simple random sampling from a continuous distribution date back to the late 1930s (Nair 1940, Savur 1937, Thonçson 1936) and since then have been thoroughly investigated (David 1981). Sampling designs which are more complex than simple random sampling are generally used in survey sampling. Some information about the population of interest is usually available and often is Incorporated into the survey design and into the estimation process to Increase the precision of estimates. Extension of results derived under an assumption of simple random sampling from an absolutely continuous distribution to the complex sampling designs used in finite population sampling has met with limited success. Woodruff (1952) and Jonrup (1975) have proposed using a weighted sample median to estimate the population median, where the weight assigned to each observation is proportional to the inverse of its selection probability. Using the approach taken by Maritz and Jarrett (1978), Gross (1980) derived a small-sample estimator of the variance of the weighted sample median estimator for stratified sampling without replacement from a finite population. Current methods for constructing confidence intervals for quantiles vary depending upon the approach taken relative to making inferences from samples to the finite population. Broadly speaking, the finite population can be considered to be a fixed set of elements, or it can be considered to come from an infinite population, called a superpopula tion. 3 Under the classical or fixed population approach (Cassel, Sarndal, and Wretman 1977), each population value is regarded as a fixed but unknown quantity. The sample is viewed as random, and the units within the sample are considered fixed quantities. Inference from the sample to the finite population is based primarily upon the stochastic element introduced by the sampling plan. The superpopulation approach considers the observed values in a survey to be outcomes of an underlying process or superpopulation model. IMlike the classical approach, the superpopulation approach takes the view that associated with each population unit is a random variable which has a given stochastic structure. The superpopulation model plays a vital role in inference with this approach. From the observations in a sample, inferences are made about the superpopulation model. The superpopulation model then is used to make predictions concerning the population units not included in the sample and, ultimately, the population parameter of interest. A number of authors have investigated model-free procedures for constructing exact 100(1 - a) percent confidence intervals for quantiles in finite populations (0 < a < 1) . Inferences from the sang)le to the finite population are based upon confidence Intervals which take into account the sampling scheme. Thompson (1936), Wilks (1962), and Konijn (1973) have given design-based confidence intervals for the sample median when simple random sampling from a finite population is assumed. Meyer (1972) and Sedransk and Meyer (1978) investigated three exact confidence interval procedures for quantiles 4 when sampling is from a stratified population. Stratified random sang)ling when the population has been divided into two strata either randomly or in an ordered fashion (based on the known distribution of an increasing concomitant variable) was assumed. Confidence intervals for two of the proposed procedures are formed from pairs of order statistcs; a third procedure is derived from the empirical distribution function. For more than two strata, the confidence Interval procedures proposed by Meyer (1972) and Sedransk and Meyer (1978) become very complex and require substantial amounts of computation to implement. Theoretical results are tedious but can be extended to more than two strata. Meyer (1972) has given results for the case of three strata. Blesseos (1976) has extended Meyer's procedures for determining the confidence coefficient of confidence Intervals based on order statistics from the combined sample to the case where the finite population has up to five ordered or