Sampling and Evaluation
Total Page:16
File Type:pdf, Size:1020Kb
Sampling and Evaluation A Guide to Sampling for Program Impact Evaluation Peter M. Lance Aiko Hattori Suggested citation: Lance, P. and A. Hattori. (2016). Sampling and evaluation: A guide to sampling for program impact evaluation. Chapel Hill, North Carolina: MEASURE Evaluation, University of North Carolina. Sampling and Evaluation A Guide to Sampling for Program Impact Evaluation Peter M. Lance, PhD, MEASURE Evaluation Aiko Hattori, PhD, MEASURE Evaluation ISBN: 978-1-943364-94-7 MEASURE Evaluation This publication was produced with the support of the United States University of North Carolina at Chapel Agency for International Development (USAID) under the terms of Hill MEASURE Evaluation cooperative agreement AID-OAA-L-14-00004. 400 Meadowmont Village Circle, 3rd MEASURE Evaluation is implemented by the Carolina Population Center, University of North Carolina at Chapel Hill in partnership with Floor ICF International; John Snow, Inc.; Management Sciences for Health; Chapel Hill, NC 27517 USA Palladium; and Tulane University. Views expressed are not necessarily Phone: +1 919-445-9350 those of USAID or the United States government. MS-16-112 [email protected] www.measureevaluation.org Dedicated to Anthony G. Turner iii Contents Acknowledgments v 1 Introduction 1 2 Basics of Sample Selection 3 2.1 Basic Selection and Sampling Weights . 5 2.2 Common Sample Selection Extensions and Complications . 58 2.2.1 Multistage Selection . 58 2.2.2 Stratification . 62 2.2.3 The Design Effect, Re-visited . 64 2.2.4 Hard to Find Subpopulations . 64 2.2.5 Large Clusters and Size Sampling . 67 2.3 Complications to Weights . 69 2.3.1 Non-Response Adjustment . 69 2.3.2 Post-Stratification Adjustment . 74 2.3.3 Weight \Normalization" . 76 2.3.4 Weight \Trimming" . 81 2.3.5 Multiple and Overlapping Frames . 82 2.4 Spillover . 86 3 Sample Size Guesstimation 98 3.1 The Classic Sample Size Estimator . 100 3.1.1 Precision . 102 3.1.2 Basic Hypothesis Testing . 114 3.1.3 Translating Sample Size into Primary Sampling Units . 150 3.1.4 On Guesstimation . 156 3.1.5 Multiple Sampling Goals . 157 3.1.6 Finite Populations . 160 3.2 Randomized Controlled Trials . 168 3.3 Selection on Observables . 183 3.3.1 Regression . 185 iv CONTENTS 3.3.2 Matching . 251 3.4 Within Models . 257 3.5 Instrumental Variables . 280 3.5.1 The Basics . 280 3.5.2 Sample Size Estimation . 284 3.5.3 Randomized Controlled Trials and Partial Experiments . 293 4 Conclusion 297 References 299 v Acknowledgments The authors of this manuscript have benefited from the comments, insights, and support of our colleagues, friends, and families. We are grateful to Paul Brodish and Lisa Parker for their extensive review of the manuscript at various stages. We also would like to acknowledge Natasha Mack, Debbie McGill, Elizabeth T. Robinson, and Kathy Doherty for their guidance with editing of the manuscript. 1 Chapter 1 Introduction Program evaluation, or impact evaluation, is a way to get an accurate understanding of the extent to which a health program causes changes in the outcomes it aims to improve. Program impact studies are designed to tell us the extent to which a population's exposure to or participation in a program altered an outcome, compared to what would have happened in the absence of the program. Understanding whether a program produces intended changes allows society to focus scarce resources on those programs that most efficiently and effectively improve people's welfare and health. The usual objective in program impact evaluation is to learn about how a population of in- terest is affected by the program. Programs are typically implemented in geographic areas where populations are large and beyond our resources to observe in their entirety. Therefore, we have to sample. Sampling is the process of selecting a set of observations from a population to estimate a chosen parameter | program impact, for example | for that population. This manual explores the challenges of sampling for program impact evaluations | how to obtain a sample that is reliable for estimating impact of a program and how to obtain a sample that accurately reflects the population of interest. There are two core challenges in sampling. First, one must select a sample that either intrinsically reflects the mix of types of individuals in the population, or one that weights (mathematically adjusts) the sample so that the mix of individuals reflects the mix in the population of interest. Impact studies usually require sampling both program participants and nonparticipants. The second challenge is to select a sample of sufficient size to learn whatever it is that one wishes to know about impact. Larger samples contain more information and, therefore, allow us to learn more (for instance, in general, larger samples allow us to detect smaller and smaller levels of program impact). The manual is divided into two sections: (1) basic sample selection and weighting and (2) sample size estimation. We anticipate that readers might get the most utility and comprehensive under- standing from reading entire chapters rather than trying to cherry-pick portions of the discussions within them, as one might with a reference manual. This manual is more like a textbook. Further, the manual is aimed at practitioners | in particular, those who design and implement 2 CHAPTER 1. INTRODUCTION samples for impact evaluation at their institution. Our discussions assume more than a basic under- standing of sampling and some mathematical skill in applying sampling theory. That said, we are less interested in theory than in its practical application to solve sampling problems encountered in the field. We hope this manual will be a comprehensive and practical resource for that task. 3 Chapter 2 Basics of Sample Selection The motivation for this manual is to discuss issues, topics, and concerns important in the context of sampling for surveys to support impact evaluations (that is, surveys intended to provide estimates of program impact). The basic motivation for a survey for impact evaluation is quite simple: to learn something about program impact within some population. This chapter covers the basics of sample selection. This is a vital topic for impact evaluation. Without understanding it, one cannot know whether a proposed sample selection plan will yield a sample whose composition suitable for achieving the goals of an impact evaluation, particularly in terms of whether it will provide impact for the population of interest. At the same time, considera- tions motivated by the particulars of the selection method, such as the implications of the selection method for impact estimate precision, will inform other areas of the sampling process for impact evaluation surveys, such as sample size estimation. Since this is a chapter in a manual on sampling and evaluation, we are technically only interested in this topic in the context of impact evaluation. Nonetheless, sample selection is a topic that transcends surveys designed to support impact evaluation and indeed is important to virtually all population survey research. Moreover, the basic story is the same for surveys designed narrowly to support impact evaluations and those intended for the more general population survey setting. Because the topics in this chapter, while critical to sampling for impact evaluations, are rather universal, we provide a very general overview not particularly focused on impact evaluation per se. This is a deliberate decision in the interest of clarity; trying to tie this too much or too narrowly to impact evaluation would add complexity to the chapter for no real gain. In particular, we focus in our theoretical discussion and empirical simulation examples on estimation of a simple average (as opposed to a more complex impact evaluation parameter) because that is the simplest, clearest lens through which to understand the implications of sample selection choices. Nonetheless, the lessons from the simple setting of estimating an average apply as well to the context of more complicated estimation goals (it can simply be harder to show it in those cases). 4 CHAPTER 2. BASICS OF SAMPLE SELECTION Figure 2.1. A population of 5,000 households Figure 2.2. A histogram of daily expenditures 2.1. BASIC SELECTION AND SAMPLING WEIGHTS 5 2.1 Basic Selection and Sampling Weights We begin with our most basic goal: to learn something about a population. That something we refer to as a parameter. A parameter could be the rate, average, total, median, etc. of some indicator across a population. For instance, we might wish to learn something about the modern contraceptive prevalence rate across a population, such as all women ages 15{49 in Nigeria. The parameter of interest could also be the difference in two underlying parameters between two subpopulations. This is usually the focus in program impact evaluation, with the two populations (somehow) being participants and non-participants. Whatever the case, the point is that the parameter is some larger truth that we do not know about a population. For now, we describe the parameter of interest simply as Y The overline notation \ " is usually indicative of some kind of an average in statistics and econo- metrics. Although the parameter could be all kinds of things (a median or other quantile, something involving a higher moment, etc.) we use this notation because, in our approach to impact evalua- tion, we generally are thinking about some kind of an average. In particular, we tend to focus on the challenge of estimating average program impact.1 For instance, we typically think of ourselves as trying to estimate a population parameter such as the average treatment effect, average affect of treatment on the treated, etc. Generally, we can think about sampling goals in terms of two questions: • What do we want to learn about Y ? For instance, what parameter do we want to learn about and, more to the point, what do we want to learn about it? • For whom do we want to learn this? The first question will be the focus of the next chapter, which deals with sample size estimation.