INTRODUCTION TO SMALL AREA ESTIMATION TECHNIQUES
A Practical Guide for National Statistics Offices
MAY 2020
ASIAN DEVELOPMENT BANK

Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO)
© 2020 Asian Development Bank
6 ADB Avenue, Mandaluyong City, 1550 Metro Manila, Philippines
Tel +63 2 8632 4444; Fax +63 2 8636 2444
www.adb.org

Some rights reserved. Published in 2020.

ISBN 978-92-9262-222-0 (print); 978-92-9262-223-7 (electronic); 978-92-9262-224-4 (ebook)
Publication Stock No. TIM200160-2
DOI: http://dx.doi.org/10.22617/TIM200160-2

The views expressed in this publication are those of the authors and do not necessarily reflect the views and policies of the Asian Development Bank (ADB) or its Board of Governors or the governments they represent.

ADB does not guarantee the accuracy of the data included in this publication and accepts no responsibility for any consequence of their use. The mention of specific companies or products of manufacturers does not imply that they are endorsed or recommended by ADB in preference to others of a similar nature that are not mentioned.

By making any designation of or reference to a particular territory or geographic area, or by using the term “country” in this document, ADB does not intend to make any judgments as to the legal or other status of any territory or area.

This work is available under the Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO) https://creativecommons.org/licenses/by/3.0/igo/. By using the content of this publication, you agree to be bound by the terms of this license. For attribution, translations, adaptations, and permissions, please read the provisions and terms of use at https://www.adb.org/terms-use#openaccess.

This CC license does not apply to non-ADB copyright materials in this publication.
If the material is attributed to another source, please contact the copyright owner or publisher of that source for permission to reproduce it. ADB cannot be held liable for any claims that arise as a result of your use of the material.

Please contact [email protected] if you have questions or comments with respect to content, or if you wish to obtain copyright permission for your intended use that does not fall within these terms, or for permission to use the ADB logo.

Corrigenda to ADB publications may be found at http://www.adb.org/publications/corrigenda.

Notes:
In this publication, “$” refers to United States dollars.
ADB recognizes “Korea” as the Republic of Korea.

Cover design by Rhommel Rico.

CONTENTS

Tables, Figures, and Boxes
Foreword
Acknowledgments
Abbreviations
CHAPTER I: INTRODUCTION
  1.1 What Is Small Area Estimation and Why Do We Need It?
CHAPTER II: DEVELOPING A SMALL AREA ESTIMATION PLAN
  2.1 Goal or Purpose of Small Area Estimation
  2.2 Variable of Interest
  2.3 Level of Disaggregation and Data Requirements
  2.4 Approach to Small Area Estimation and Choosing a Specific Technique or Model
  2.5 Quality Assessment of the Small Area Estimates
  2.6 Dissemination Strategy for Presentation of the Small Area Estimates
CHAPTER III: DATA MANAGEMENT USING R
  3.1 Overview of R and RStudio
  3.2 Fundamentals of R
  3.3 Data Manipulation Using dplyr and tidyr
  3.4 Linear Regression in R
  3.5 R Packages for Small Area Estimation
CHAPTER IV: APPROACHES IN SMALL AREA ESTIMATION
  4.1 Direct Survey Estimation
  4.2 Small Area Estimation Using Auxiliary Information
  4.3 Small Area Estimation Using Regression-Based Models
CHAPTER V: VISUALIZING SMALL AREA ESTIMATES USING R
CHAPTER VI: CONCLUSION
APPENDIXES
  1. Description of R Packages Identified and Used in the Illustrations
  2. Data Files Identified and Used in the Illustrations
  3. Model Building
REFERENCES

TABLES, FIGURES, AND BOXES

TABLES
3.1 Basic Arithmetic Operators in R
3.2 Logical and Relational Operators in R
3.3 R Packages for Importing and Exporting Data Files from Different Applications
3.4 R Commands for Examining Data Set
3.5 R Commands for Basic Statistics
4.1 Population and Magnitude of Poor Population for Each Municipality in Province X
4.2 Structure of Data Set for Synthetic Estimation
4.3 Summary of Different Small Area Estimation Methods

FIGURES
1.1 Availability of Disaggregated Data from the Sustainable Development Goals among Asian Development Bank–United Nations Economic and Social Commission for Asia and the Pacific Member Economies
1.2 Recommended Sample Size for Different Levels of Geographic Disaggregation
3.1 Graphical User Interface for Downloading R
3.2 Installing RStudio
3.3 Opening the Fourth Panel of RStudio
3.4 Four Main Panels of RStudio
3.5 Shortcut Tools of the Editor Window in RStudio
3.6 Console Window in RStudio
3.7 Files Tab in RStudio
3.8 Plots Tab in RStudio
3.9 Packages Tab in RStudio
3.10 Help Tab in RStudio
3.11 Environment Tab in RStudio
3.12 History Tab in RStudio
3.13 Install Packages Window
3.14 Installing Multiple Packages in RStudio
3.15 List of Packages in RStudio
3.16 Loading the Package in RStudio
3.17 Accessing Help Tab in RStudio
3.18 Running Help Command in RStudio
3.19 Navigating the Folder for Setting the Working Directory
3.20 Setting the Working Directory
3.21 Importing Data Sets in Environment Tab
4.1 Illustration of Weight Reallocation from Neighboring Subdomains
4.2 Small Area Estimation Process

BOXES
1.1 Small Area Estimation of Poverty in the Philippines and Thailand
2.1 Difference Between Accuracy and Precision in Survey Sampling

FOREWORD

From 2000 to 2015, the Millennium Development Goals (MDGs) influenced global development strategies by setting concrete, time-bound, and measurable targets. By 2015, substantial progress had been made toward the MDGs in poverty reduction and other areas of socioeconomic development. In education and health, for instance, both the number of out-of-school children of primary school age and the mortality rate for children under 5 years of age had declined since 1990.

Although MDG data enabled intercountry comparisons across various social and economic metrics, the absence of granular data meant that they could not show how disparities within each country changed over time. As a result, there was scarce empirical evidence on which segments of a country’s population advanced or lagged behind in relation to the MDGs, and too little data to inform the design of appropriate programs for vulnerable groups.

To address this concern, the 2030 Agenda for Sustainable Development pledged that “no one will be left behind” and called for more granular data by measuring specific Sustainable Development Goal (SDG) indicators for various population groups (e.g., by income level, ethnicity, geographic area, and other characteristics relevant to the national context).

Many techniques can generate granular SDG data, and each has its own data requirements and yields estimates of differing accuracy. For survey-based estimates, granularity requires that the survey sample include enough observations from each population subgroup. However, most national statistics offices (NSOs) in developing nations are resource-constrained and may not be able to conduct surveys large enough to produce reliable estimates for every subgroup. In such cases, small area estimation methods can produce more reliable granular estimates by “borrowing strength” from other data sources with more comprehensive coverage, thereby effectively increasing the survey sample size.
This guide explains, step by step, how to implement basic small area estimation methods and highlights important considerations when applying each technique. Brief discussions of the underlying theory and statistical principles are complemented by practical examples that reinforce the reader’s learning. Because R is increasingly popular among development statisticians and researchers, software implementation in R is demonstrated throughout the guide.

This guide is intended for NSO staff who compile the granular statistics needed for SDG monitoring. Users are expected to be familiar with the basic concepts of regression modeling.

We hope this guide will enrich the portfolio of analytical tools available to NSOs and contribute to the wider availability of detailed frameworks for disaggregating SDG data.

Yasuyuki Sawada
Chief Economist and Director General
Economic Research and Regional Cooperation Department
Asian Development Bank

ACKNOWLEDGMENTS

Preparation of Introduction to Small Area Estimation Techniques: A Practical Guide for National Statistics Offices was undertaken by the Statistics and Data Innovation Unit of the Economic Research and Regional Cooperation Department at the Asian Development Bank (ADB) and supported by Knowledge and Support Technical Assistance (KSTA) 9356: Data for Development. Arturo Martinez, Jr. led the publication of this guide under the overall direction of Kaushal Joshi and with technical support from Joseph Bulan, Criselda De Dios, and Iva Sebastian.

ADB acknowledges the valuable contribution of Zita Albacea, who prepared the first draft of this guide, and project team members Mildred Addawe, Joseph Bulan, Ron Lester Durante, Jan Arvin Lapuz, Marymell Martillan, Arturo Martinez, Jr., and Katrina Miradora, who finalized the guide.
We also thank Jose Ramon Albert, Erniel Barrios, and Joseph Ryan Lansangan for technical advice, detailed reviews, and relevant documents that served as references in preparing this guide.