Small Area Estimation I
Total Page:16
File Type:pdf, Size:1020Kb
Small area estimation I Monica Pratesi and Caterina Giusti Department of Economics and Management, University of Pisa 1st EMOS Spring School Trier, Pisa, Manchester, Luxembourg, 23-27 March 2015 M. Pratesi, C. Giusti Small area estimation I Structure of the Presentation 1 Rising interest in measuring progress at a local level 2 An overview of SAE methods 3 Expansion and GREG estimators 4 Empirical Best Linear Unbiased Predictor 5 M-Quantile Approach M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Part I A short introduction to the small area estimation problem M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Demand for statistics at local level During the last decade there has been a rising demand for (objective and subjective) progress and well-being indicators These measures play a central role for policy makers, to plan and to verify the effectiveness of their policies Some examples of indicators: Poverty indicators: At risk of poverty rate, Household equivalized income, etc. Labour market indicators: Unemployment rate, Satisfaction with the job, etc. Health indicators: Average lifespan, Share of population with dangerous behaviors (obesity, smoking, etc.) To be informative and effective, these indicators should be chosen at the appropriate level of disaggregation Indicators can be disaggregated along various dimensions, including geographic areas, demographic groups, income/consumption groups and social groups M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Demand for statistics at local level To gather data to compute the indicators of interest for a given population there are two main solutions: Censuses Sample Surveys Sample surveys have been recognized as cost-effectiveness means of obtaining information on wide-ranging topics of interest at frequent intervals over time Indeed, available data to measure progress and well-being in Europe come mainly from sample surveys, such as the Survey on Income and Living Conditions (EU-SILC) M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Demand for statistics at local level Data coming from sample surveys can be used to compute accurate direct estimates only for large domains Direct estimation uses only domain-specific data: 1 suffer from low precision when the domain sample size is small 2 it is not applicable with zero sample sizes For example, EU-SILC data can be used to produce accurate direct estimates only at the NUTS* 2 level (that is, regional level in Italy) * The geocode NUTS refers to the EU Nomenclature of Territorial Units for Statistics M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Introduction to Small Area Estimation To satisfy the increasing demand of statistical estimates on progress and well-being referring to smaller domains (LAU 1 and LAU 2 levels, that is provinces and municipalities in Italy) using sample survey data, there is the need to resort to small area methodologies Small area methodologies try to fill the gap between official statistics and local request of data What is a small area? What is small area estimation? Small Area Estimation (SAE) is concerned with the development of statistical procedures for producing efficient (precise) estimates for small areas, that is for domains with small or zero sample sizes. Domains are defined by the cross-classification of geographical districts by social/economic/demographic characteristics. The target is the estimation of a parameter (average/percentile/proportion/rate) and the estimation of the corresponding prediction error M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Introduction to Small Area Estimation: Target parameters The estimates of interest in the small areas are usually means or totals. Examples: Mean individual gross income in a set of areas Total crop production in a set of areas However, also other estimates can be of interest. Examples: The quantiles of the household (equivalised) income The proportion of households with income below the poverty line M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Introduction to Small Area Estimation: Example Target population: households who live in an Italian Region Variable of interest: Income or other poverty measures Survey sample: EUSILC (European Union Statistics on Income and Living Conditions), designed to obtain reliable estimate at Regional level in Italy planned design domains: Regions unplanned design domains: e.g. Provinces, Municipalities EUSILC sample size in Tuscany: 1751 households Pisa province 158 households ! need SAE Grosseto province 70 households ! need SAE M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Introduction to Small Area Estimation: Example US sample sizes with an equal probability of selection method sample of 10,000 persons State 1994 Population (thousands) Sample size California 31,431 1207 Texas 18,378 706 New York 18,169 698 . DC 570 22 Wyoming 476 18 Suppose to measure customer satisfaction for a government service: California 24.86% ! leads to a confidence interval of 22.4%-27.3% (reliable) Wyoming 33.33% ! leads to a confidence interval of 10.9%-55.7% (unreliable ! need SAE) M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods SAE: data requirements Survey data: available for the target variable y and for the auxiliary variable x, related to y Census/Administrative data: available for x but not for y SAE in 3 steps 1 Use survey data to estimate models that link y to x 2 Combine the estimated model parameters with x for out of sample units, to form predictions 3 Use these predictions to estimate the target parameters M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods SAE: classifications There are several classifications of SAE methods Some of these classifications cross each other Direct vs Indirect methods Direct methods use only domain-specific data Indirect methods borrow information from all the data M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods SAE: classifications Design-based vs Model-based methods Design-based (Model-assisted) methods Direct estimation Can allow for the use of models (model-assisted) Inference is under the randomization distribution Model-based methods Borrow strength by using a model Estimation using frequentist or Bayesian approaches Inference under the model is conditional on the selected sample M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods SAE: classifications Area-level vs Unit level methods Area-level models relate small area direct estimators to area specific covariates. Such models are necessary if unit (or element) level data are not available. Unit level models that relate the unit values of a study variable to unit-specific covariates. M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods How a SAE Unit Level Model Works M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods A classification of the Small Area Estimation Methods M. Pratesi, C. Giusti Small area estimation I Rising interest in measuring progress at a local level An overview of SAE methods Focus of the lesson We will focus on model-assisted and model-based methods For the model-based estimators we will consider two alternative approaches to SAE: small area estimation using multilevel (mixed/random effects) models ! unit and area-level approaches small area estimation based on M-quantile models ! unit level approach We will present some examples based on true data We will present R codes to obtained the estimates of the examples M. Pratesi, C. Giusti Small area estimation I Expansion and GREG estimators Empirical Best Linear Unbiased Predictor M-Quantile Part II Classic and new approaches to model-based Small Area Estimation M. Pratesi, C. Giusti Small area estimation I Expansion and GREG estimators Empirical Best Linear Unbiased Predictor M-Quantile Definitions Design-based estimation: the main focus is on the design unbiasedness. Estimators are unbiased with respect to the randomization that generates survey data Finite population Ω = 1;:::; N y: variable of interest, with yi value of the i-th unit of the finite population P ¯ Statistics of interest: e.g. total, Y = Ω yi or mean, Y = Y =N Sample s = 1;:::; n p(s): probability of selecting the sample s from population Ω. p(s) depends on know design variables such as stratum indicator and size measures of clusters M. Pratesi, C. Giusti Small area estimation I Expansion and GREG estimators Empirical Best Linear Unbiased Predictor M-Quantile Definitions Bias of an estimator θ^ is defined as E[θ^ − θ] Variance of an estimator θ^ is defined as E[(θ^ − E[θ^])2] Mean Squared Error of an estimator θ^ is: E[(θ^− θ)2] = V [θ^] + B[θ^]2 Design bias: Bias(Y^) = Ep[Y^] − Y 2 Design variance: V (Y^) = Ep[(Y^ − y) ] Design-based properties P 1 Design-unbiasedness: Ep[Y^] = p(s)Y^s = Y 2 Design-consistency: Y^ ! Y in probability M. Pratesi, C. Giusti Small area estimation I Expansion and GREG estimators Empirical Best Linear Unbiased Predictor M-Quantile Estimation of Means: Expansion Estimator Data fyi g; i 2 s Expansion estimator for the mean: P w y ¯^ i2s i i Y = P i2s wi −1 wi = πi , the basic design weight πi is the probability of selecting the unit i in sample s Remark: weights wi are independent from yi M.