Sampling and Evaluation


Sampling and Evaluation: A Guide to Sampling for Program Impact Evaluation

Peter M. Lance, PhD, MEASURE Evaluation
Aiko Hattori, PhD, MEASURE Evaluation

Suggested citation: Lance, P. and A. Hattori. (2016). Sampling and evaluation: A guide to sampling for program impact evaluation. Chapel Hill, North Carolina: MEASURE Evaluation, University of North Carolina.

ISBN: 978-1-943364-94-7

This publication was produced with the support of the United States Agency for International Development (USAID) under the terms of MEASURE Evaluation cooperative agreement AID-OAA-L-14-00004. MEASURE Evaluation is implemented by the Carolina Population Center, University of North Carolina at Chapel Hill in partnership with ICF International; John Snow, Inc.; Management Sciences for Health; Palladium; and Tulane University. Views expressed are not necessarily those of USAID or the United States government. MS-16-112

MEASURE Evaluation
University of North Carolina at Chapel Hill
400 Meadowmont Village Circle, 3rd Floor
Chapel Hill, NC 27517 USA
Phone: +1 919-445-9350
[email protected]
www.measureevaluation.org

Dedicated to Anthony G. Turner

Contents

Acknowledgments
1 Introduction
2 Basics of Sample Selection
  2.1 Basic Selection and Sampling Weights
  2.2 Common Sample Selection Extensions and Complications
    2.2.1 Multistage Selection
    2.2.2 Stratification
    2.2.3 The Design Effect, Re-visited
    2.2.4 Hard to Find Subpopulations
    2.2.5 Large Clusters and Size Sampling
  2.3 Complications to Weights
    2.3.1 Non-Response Adjustment
    2.3.2 Post-Stratification Adjustment
    2.3.3 Weight "Normalization"
    2.3.4 Weight "Trimming"
    2.3.5 Multiple and Overlapping Frames
  2.4 Spillover
3 Sample Size Guesstimation
  3.1 The Classic Sample Size Estimator
    3.1.1 Precision
    3.1.2 Basic Hypothesis Testing
    3.1.3 Translating Sample Size into Primary Sampling Units
    3.1.4 On Guesstimation
    3.1.5 Multiple Sampling Goals
    3.1.6 Finite Populations
  3.2 Randomized Controlled Trials
  3.3 Selection on Observables
    3.3.1 Regression
    3.3.2 Matching
  3.4 Within Models
  3.5 Instrumental Variables
    3.5.1 The Basics
    3.5.2 Sample Size Estimation
    3.5.3 Randomized Controlled Trials and Partial Experiments
4 Conclusion
References

Acknowledgments

The authors of this manuscript have benefited from the comments, insights, and support of our colleagues, friends, and families. We are grateful to Paul Brodish and Lisa Parker for their extensive review of the manuscript at various stages. We also would like to acknowledge Natasha Mack, Debbie McGill, Elizabeth T. Robinson, and Kathy Doherty for their guidance with editing of the manuscript.

Chapter 1: Introduction

Program evaluation, or impact evaluation, is a way to get an accurate understanding of the extent to which a health program causes changes in the outcomes it aims to improve. Program impact studies are designed to tell us the extent to which a population's exposure to or participation in a program altered an outcome, compared to what would have happened in the absence of the program. Understanding whether a program produces intended changes allows society to focus scarce resources on those programs that most efficiently and effectively improve people's welfare and health.

The usual objective in program impact evaluation is to learn about how a population of interest is affected by the program. Programs are typically implemented in geographic areas where populations are large and beyond our resources to observe in their entirety. Therefore, we have to sample. Sampling is the process of selecting a set of observations from a population to estimate a chosen parameter (program impact, for example) for that population.
This manual explores the challenges of sampling for program impact evaluations: how to obtain a sample that is reliable for estimating the impact of a program, and how to obtain a sample that accurately reflects the population of interest. There are two core challenges in sampling. First, one must select a sample that either intrinsically reflects the mix of types of individuals in the population, or one that weights (mathematically adjusts) the sample so that the mix of individuals reflects the mix in the population of interest. Impact studies usually require sampling both program participants and nonparticipants. The second challenge is to select a sample of sufficient size to learn whatever it is that one wishes to know about impact. Larger samples contain more information and, therefore, allow us to learn more (for instance, in general, larger samples allow us to detect smaller and smaller levels of program impact).

The manual is divided into two sections: (1) basic sample selection and weighting and (2) sample size estimation. We anticipate that readers might get the most utility and comprehensive understanding from reading entire chapters rather than trying to cherry-pick portions of the discussions within them, as one might with a reference manual. This manual is more like a textbook.

Further, the manual is aimed at practitioners, in particular those who design and implement samples for impact evaluation at their institution. Our discussions assume more than a basic understanding of sampling and some mathematical skill in applying sampling theory. That said, we are less interested in theory than in its practical application to solve sampling problems encountered in the field. We hope this manual will be a comprehensive and practical resource for that task.
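The first challenge, weighting, can be made concrete with a small simulation. The sketch below is ours, not the manual's: every number in it (population size, participant share, selection probabilities, outcome values) is an invented assumption. Participants are deliberately oversampled, and each sampled unit carries a weight equal to the inverse of its selection probability, so that the weighted mean reflects the population mix even though the raw sample does not.

```python
import random

random.seed(42)

# Hypothetical population: 20% program participants, who happen to have
# higher average outcomes (all values invented for illustration).
population = [
    {"participant": i < 2000,
     "outcome": (5.0 if i < 2000 else 3.0) + random.gauss(0, 1)}
    for i in range(10000)
]

# Deliberately oversample participants: selection probability 0.20 for
# participants versus 0.05 for nonparticipants.
sample = []
for unit in population:
    p = 0.20 if unit["participant"] else 0.05
    if random.random() < p:
        sample.append({**unit, "weight": 1.0 / p})  # weight = 1 / Pr(selected)

# The unweighted sample mean over-represents participants; the weighted
# mean adjusts the mix back toward the population.
unweighted = sum(u["outcome"] for u in sample) / len(sample)
weighted = (sum(u["weight"] * u["outcome"] for u in sample)
            / sum(u["weight"] for u in sample))
true_mean = sum(u["outcome"] for u in population) / len(population)

print(f"true={true_mean:.2f}  unweighted={unweighted:.2f}  weighted={weighted:.2f}")
```

With these invented numbers the population mean is roughly 3.4; the unweighted sample mean drifts toward the participant mean of 5, while the weighted mean stays close to the truth.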
Chapter 2: Basics of Sample Selection

The motivation for this manual is to discuss issues, topics, and concerns important in the context of sampling for surveys to support impact evaluations (that is, surveys intended to provide estimates of program impact). The basic motivation for a survey for impact evaluation is quite simple: to learn something about program impact within some population.

This chapter covers the basics of sample selection. This is a vital topic for impact evaluation. Without understanding it, one cannot know whether a proposed sample selection plan will yield a sample whose composition is suitable for achieving the goals of an impact evaluation, particularly in terms of whether it will provide impact estimates for the population of interest. At the same time, considerations motivated by the particulars of the selection method, such as the implications of the selection method for impact estimate precision, will inform other areas of the sampling process for impact evaluation surveys, such as sample size estimation.

Since this is a chapter in a manual on sampling and evaluation, we are technically only interested in this topic in the context of impact evaluation. Nonetheless, sample selection is a topic that transcends surveys designed to support impact evaluation and indeed is important to virtually all population survey research. Moreover, the basic story is the same for surveys designed narrowly to support impact evaluations and those intended for the more general population survey setting. Because the topics in this chapter, while critical to sampling for impact evaluations, are rather universal, we provide a very general overview not particularly focused on impact evaluation per se. This is a deliberate decision in the interest of clarity; trying to tie this too much or too narrowly to impact evaluation would add complexity to the chapter for no real gain.
In particular, we focus in our theoretical discussion and empirical simulation examples on estimation of a simple average (as opposed to a more complex impact evaluation parameter) because that is the simplest, clearest lens through which to understand the implications of sample selection choices. Nonetheless, the lessons from the simple setting of estimating an average apply as well to the context of more complicated estimation goals (it can simply be harder to show it in those cases).

[Figure 2.1. A population of 5,000 households]
[Figure 2.2. A histogram of daily expenditures]

2.1 Basic Selection and Sampling Weights

We begin with our most basic goal: to learn something about a population. That something we refer to as a parameter. A parameter could be the rate, average, total, median, etc. of some indicator across a population. For instance, we might wish to learn something about the modern contraceptive prevalence rate across a population, such as all women ages 15–49 in Nigeria. The parameter of interest could also be the difference in two underlying parameters between two subpopulations. This is usually the focus in program impact evaluation, with the two populations (somehow) being participants and non-participants. Whatever the case, the point is that the parameter is some larger truth that we do not know about a population. For now, we describe the parameter of interest simply as Ȳ. The overline notation ("¯") is usually indicative of some kind of an average in statistics and econometrics. Although the parameter could be all kinds of things (a median or other quantile, something involving a higher moment, etc.), we use this notation because, in our approach to impact evaluation, we generally are thinking about some kind of an average.
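The simplest case the chapter works with, estimating a population average Ȳ, can be sketched as follows. The population here (5,000 households with skewed daily expenditures, loosely echoing Figures 2.1 and 2.2) is simulated under our own distributional assumption (lognormal); under simple random sampling every household has the same selection probability, so the plain sample mean is an unbiased estimator of Ȳ.

```python
import random
import statistics

random.seed(1)

# Simulated population of 5,000 households with skewed daily expenditures
# (the lognormal shape is our assumption for illustration).
population = [random.lognormvariate(1.0, 0.5) for _ in range(5000)]
true_avg = statistics.mean(population)  # the parameter we want to learn

# Simple random sample without replacement: each household has the same
# selection probability n/N, so the unweighted sample mean is unbiased.
n = 200
sample = random.sample(population, n)
estimate = statistics.mean(sample)

print(f"population average: {true_avg:.2f}")
print(f"SRS estimate (n={n}): {estimate:.2f}")
```

Repeating the draw many times would scatter the estimates around the population average, with a spread that shrinks as n grows; that spread is what the sample size discussion in the next chapter is about.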
In particular, we tend to focus on the challenge of estimating average program impact. For instance, we typically think of ourselves as trying to estimate a population parameter such as the average treatment effect, average effect of treatment on the treated, etc.

Generally, we can think about sampling goals in terms of two questions:

• What do we want to learn about Ȳ? For instance, what parameter do we want to learn about and, more to the point, what do we want to learn about it?
• For whom do we want to learn this?

The first question will be the focus of the next chapter, which deals with sample size estimation.
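The difference-in-parameters case described above can be sketched the same way. Everything below is a toy example of our own construction: six sampled units with invented outcomes and weights, split into participants and nonparticipants, with a weighted mean computed separately in each group and then differenced.

```python
# Toy weighted sample (all values invented): participants were oversampled,
# so they carry smaller weights than nonparticipants.
sample = [
    {"participant": True,  "y": 6.0, "w": 10.0},
    {"participant": True,  "y": 5.0, "w": 10.0},
    {"participant": True,  "y": 7.0, "w": 10.0},
    {"participant": False, "y": 3.0, "w": 50.0},
    {"participant": False, "y": 4.0, "w": 50.0},
    {"participant": False, "y": 2.0, "w": 50.0},
]

def weighted_mean(records):
    """Weight-adjusted mean: sum(w * y) / sum(w)."""
    return sum(r["w"] * r["y"] for r in records) / sum(r["w"] for r in records)

participants = [r for r in sample if r["participant"]]
nonparticipants = [r for r in sample if not r["participant"]]

# The parameter of interest: difference between two subpopulation averages.
diff = weighted_mean(participants) - weighted_mean(nonparticipants)
print(f"estimated difference in means: {diff:.1f}")  # 6.0 - 3.0 = 3.0
```

Because the weights here are constant within each group they cancel inside each group mean; they matter as soon as weights vary within a group, which is the usual situation in practice.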