A Small Area Procedure for Estimating Population Counts
Emily J. Berg
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations
2010

A Small Area Procedure for Estimating Population Counts
Emily J. Berg
Iowa State University

Part of the Statistics and Probability Commons

Recommended Citation
Berg, Emily J., "A Small Area Procedure for Estimating Population Counts" (2010). Graduate Theses and Dissertations. 11215. https://lib.dr.iastate.edu/etd/11215

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository.

A small area procedure for estimating population counts

by

Emily Berg

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Major: Statistics

Program of Study Committee:
Sarah Nusser, Major Professor
Wayne A. Fuller
Mark Kaiser
Jae Kwang Kim
Dan Nettleton
Dan Nordman

Iowa State University
Ames, Iowa
2010

ACKNOWLEDGEMENTS

I would like to acknowledge and thank my major professor, Professor Fuller.

Abstract

Many large scale surveys are designed to achieve acceptable reliability for large domains. Direct estimators for more detailed levels of aggregation are often judged to be unreliable due to small sample sizes. Estimation for small domains, often defined by geographic and demographic characteristics, is known as small area estimation. A common approach to small area estimation is to derive predictors under a specified mixed model for the direct estimators. A procedure of this type is developed for small areas defined by the cells of a two-way table.
Construction of small domain estimators using the Canadian Labour Force Survey (LFS) motivates the proposed model and estimation procedures. The LFS is designed to produce estimates of employment characteristics for certain pre-specified geographic and demographic domains. Direct estimators for specific occupations in small provinces are not published due to large estimated coefficients of variation. A preliminary study conducted in cooperation with Statistics Canada investigated estimation procedures for small areas defined by the cross-classification of occupations and provinces using data from a previous Census as auxiliary information. For consistency with published estimates, predictors are desired that preserve the direct estimators of the margins of the two-way table.

One method in the Statistics Canada study is based on a nonlinear mixed model for the direct estimators of the proportions. An initial predictor is defined to be a convex combination of the direct estimator and an estimator obtained by raking the Census totals to the direct estimators of the marginal totals. The estimators resulting from the raking operation are called the SPREE estimators and are expected to have smaller variances than the direct estimators. The weight assigned to the direct estimator depends on the relative magnitudes of an estimator of a random model component and an estimator of the sampling variance. The final predictors are defined by raking the initial predictors to the direct estimators of the marginal totals. Estimation of the mean squared error (MSE) of the predictors was not fully developed.

This dissertation addresses several issues raised by the procedure discussed above. First, the method above uses SPREE to estimate a fixed expected value. SPREE is unbiased if the Census interactions persist unchanged through time and is efficient if the direct estimators of the cell totals are realizations of independent Poisson random variables.
A generalization of SPREE that is more efficient under a specified covariance structure is explored. A simulation study shows that predictors constructed under the specified covariance structure can have smaller MSE's than predictors calculated with the direct estimators of the variances. An estimator of the MSE of the initial convex combination of the direct estimator and the estimator of the fixed expected value is derived using Taylor linearizations. The LFS procedure uses a final raking operation to benchmark the predictors. A bootstrap procedure is investigated as a way to account for the effects of raking on the MSE's of the predictors.

The procedures are applied to the Canadian Labour Force Survey, but the issues discussed are of general interest because they arise in many small area applications.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1. Introduction
  1.1 Occupations in the Canadian Labour Force Survey
  1.2 Challenges
  1.3 Outline

CHAPTER 2. Literature on Small Area Estimation of Counts and Proportions
  2.1 SPREE and Generalizations
  2.2 Generalized Linear Mixed Models
  2.3 Exponential Quadratic Variance Function Models
  2.4 Semi-parametric Models
  2.5 Extensions and Reviews
  2.6 Benchmarking
  2.7 MSE Estimation
  2.8 Estimation of Unknown Sampling Variances

CHAPTER 3. A Model for Vectors of Small Area Proportions
  3.1 Fixed Expected Value
  3.2 Small Area Effects
  3.3 Sampling Errors
  3.4 Minimum Mean Squared Error Linear Predictors
  3.5 Totals
  3.6 Multiple Two-way Tables
  3.7 Summary

CHAPTER 4. Procedure
  4.1 Estimators of Model Parameters
    4.1.1 Estimator of ck and Sampling Variances
    4.1.2 Estimator of λo
    4.1.3 Estimator of
  4.2 Predictors of True Proportions and Totals
    4.2.1 Initial Predictors of Proportions
    4.2.2 Beale Predictors of Proportions
    4.2.3 Benchmarked Predictors

CHAPTER 5. MSE Estimators
  5.1 MSE Estimators based on Taylor Linearization
    5.1.1 MSE Estimators for Proportions
    5.1.2 MSE Estimators for Totals
    5.1.3 Estimating the MSE After Raking
  5.2 Bootstrap Estimator of MSE
    5.2.1 Bootstrap Distributions
    5.2.2 Definitions of Bootstrap MSE Estimators
    5.2.3 Computation of Bootstrap MSE Estimators
    5.2.4 Moments of Bootstrap Distributions of Proportions and Totals
    5.2.5 Bootstrap Distributions of Direct Estimators of Sampling Variances
    5.2.6 Details on Implementation
    5.2.7 Simplifications to Bootstrap MSE Estimator when Working Model for Sampling Variances Holds
  5.3 Benefits and Drawbacks of Taylor and Bootstrap MSE Estimators

CHAPTER 6. Simulation
  6.1 Simulation Models
    6.1.1 Parameters for the Simulation
  6.2 Details of Computing and Estimation
  6.3 Efficiencies of Predictors
    6.3.1 Effect of Beale Ratio Estimator of γik on MSE of Predictor
    6.3.2 Effect of Raking on MSE of Predictor
  6.4 Comparison of Predictors Calculated with Direct Estimators of Sampling Variances to Predictors Calculated with Model Estimators of Sampling Variances
  6.5 Comparison of Bootstrap and Taylor MSE Estimators
    6.5.1 SRS, 0.003
    6.5.2 SRS, 0.02
    6.5.3 Two Stage, 0.003
    6.5.4 Two Stage, 0.02
    6.5.5 Estimators of Leading Terms
    6.5.6 Effect of Estimating Parameters
    6.5.7 Raking Effect
    6.5.8 Summary
  6.6 Empirical Properties of Normal Theory Prediction Intervals
    6.6.1 Empirical Coverages of Normal Theory 95% Confidence Intervals
    6.6.2 Interval Widths
    6.6.3 Summary
  6.7 Augmented Model Predictors

CHAPTER 7. Canadian Labour Force Survey
  7.1 National Occupational Classification
  7.2 Census and LFS Data
  7.3 Estimation and Prediction for Canadian LFS
  7.4 Sampling Variances
  7.5 Estimates and Predictions for A1
    7.5.1 Model Estimates for A1
    7.5.2 Predictions for A1
    7.5.3 Comparison of MSE's and CV's of Predictors to MSE's and CV's of Direct Estimators for A1
    7.5.4 Model Assessment for A1
  7.6 Estimates and Predictions for E0
    7.6.1 Model Estimates for E0
    7.6.2 Predictions for E0
    7.6.3 Comparison of MSE's and CV's of Predictors to MSE's and CV's of Direct Estimators for E0
    7.6.4 Model Assessment for E0
  7.7 Summary

CHAPTER 8. Summary and Future Study
  8.1 Summary
  8.2 Future Study

APPENDIX 1 Linear approximation to the operation that converts totals to proportions
APPENDIX 2 Justification of the Linear Approximation for λb
APPENDIX 3 Initial Estimator of λo
APPENDIX 4 Distortion of Covariances in Bootstrap Data Generating Procedure
APPENDIX 5 Distortion of Covariances in Bootstrap Data Generating Procedure

BIBLIOGRAPHY
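
The three-step procedure summarized in the abstract (rake the Census table to the direct marginal estimators, form a convex combination with the direct estimators, then rake again) can be illustrated with a minimal sketch. Everything numeric here is hypothetical: the 3 x 2 tables, the variance values, and the weight formula gamma = sigma2_u / (sigma2_u + sigma2_e) are assumptions chosen for illustration and are not taken from the dissertation. The sketch also works directly with totals, whereas the abstract describes a model for proportions.

```python
import numpy as np

def rake(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: alternately rescale rows and
    columns of a positive table until both margins match the targets."""
    fitted = np.asarray(table, dtype=float).copy()
    for _ in range(max_iter):
        fitted *= (row_targets / fitted.sum(axis=1))[:, None]
        fitted *= (col_targets / fitted.sum(axis=0))[None, :]
        # Columns match exactly after the last step; check the rows.
        if np.allclose(fitted.sum(axis=1), row_targets, rtol=0, atol=tol):
            break
    return fitted

# Hypothetical Census counts and current direct estimates (rows x columns).
census = np.array([[50.0, 30.0], [20.0, 40.0], [10.0, 50.0]])
direct = np.array([[60.0, 25.0], [15.0, 50.0], [12.0, 38.0]])
row_targets = direct.sum(axis=1)   # direct estimators of row margins
col_targets = direct.sum(axis=0)   # direct estimators of column margins

# Step 1: SPREE-type estimator, Census table raked to the direct margins.
spree = rake(census, row_targets, col_targets)

# Step 2: cell-specific convex combination. The weight formula is an
# assumed, commonly used shrinkage form, not the dissertation's estimator.
sigma2_u = 4.0                                             # assumed model variance
sigma2_e = np.array([[9.0, 4.0], [16.0, 1.0], [4.0, 25.0]])  # assumed sampling variances
gamma = sigma2_u / (sigma2_u + sigma2_e)
initial = gamma * direct + (1.0 - gamma) * spree

# Step 3: benchmark by raking the initial predictors to the direct margins.
final = rake(initial, row_targets, col_targets)
```

Because the weights differ from cell to cell, the initial predictors generally do not reproduce the direct margins, which is why the final raking step changes them. Raking multiplies each cell only by a row factor and a column factor, so the SPREE-type table keeps the cross-product ratios (interactions) of the Census table.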