Sampling Methods for Web and E-Mail Surveys
Total Page:16
File Type:pdf, Size:1020Kb
11 Sampling Methods for Web and E-mail Surveys Ronald D. Fricker, Jr ABSTRACT by postal mail and telephone, which in This chapter is a comprehensive overview of the aggregate we refer to as ‘traditional’ sampling methods for web and e-mail (‘Internet- surveys. based’) surveys. It first reviews the various sampling The chapter begins with a general overview methods – both probability and non-probability – of sampling. Since there are many fine and then examines their applicability to Internet- textbooks on the mechanics and mathematics based surveys. Issues related to Internet-based survey sampling are discussed, including difficul- of sampling, we restrict our discussion to ties assembling sampling frames for probability the main ideas that are necessary to ground sampling, coverage issues, and nonresponse and our discussion on sampling for Internet-based selection bias. The implications of the various survey surveys. Readers already well versed in the mode choices on statistical inference and analyses fundamentals of survey sampling may wish are summarized. to proceed directly to the section on Sampling Methods for Internet-based Surveys. INTRODUCTION In the context of conducting surveys or WHY SAMPLE? collecting data, sampling is the selection of a subset of a larger population to survey. Surveys are conducted to gather information This chapter focuses on sampling methods about a population. Sometimes the survey is for web and e-mail surveys, which taken conducted as a census, where the goal is to together we call ‘Internet-based’ surveys. survey every unit in the population. However, In our discussion we will frequently com- it is frequently impractical or impossible to pare sampling methods for Internet-based survey an entire population, perhaps owing surveys to various types of non-Internet- to either cost constraints or some other based surveys, such as those conducted practical constraint, such as that it may not 196 THE SAGE HANDBOOK OF ONLINE RESEARCH METHODS be possible to identify all the members of the The advantages of lower cost and less population. effort are obvious: keeping all else constant, An alternative to conducting a census is reducing the number of surveys should cost to select a sample from the population and less and take less effort to field and analyze. survey only those sampled units. As shown However, that a survey based on a sample in Figure 11.1, the idea is to draw a sample rather than a census can give better response from the population and use data collected rates and greater accuracy is less obvious. from the sample to infer information about Yet, greater survey accuracy can result when the entire population. To conduct statistical the sampling error is more than offset by inference (i.e., to be able to make quantitative a decrease in nonresponse and other biases, statements about the unobserved population perhaps due to increased response rates. That statistic), the sample must be drawn in such a is, for a fixed level of effort (or funding), a fashion that one can both calculate appropriate sample allows the surveying organization to sample statistics and estimate their standard put more effort into maximizing responses errors. To do this, as will be discussed in from those surveyed, perhaps via more effort this chapter, one must use a probability-based invested in survey design and pre-testing, sampling methodology. or perhaps via more detailed non-response A survey administered to a sample can follow-up. have a number of advantages over a census, What does all of this have to do with including: Internet-based surveys? Before the Internet, large surveys were generally expensive to • lower cost administer and hence survey professionals • less effort to administer gave careful thought to how to best conduct • better response rates a survey in order to maximize information • greater accuracy. accuracy while minimizing costs. However, Population sample Unobserved population inference Sample statistic statistic Figure 11.1 An illustration of sampling. When it is impossible or infeasible to observe a population statistic directly, data from a sample appropriately drawn from the population can be used to infer information about the population SAMPLING METHODS FOR WEB AND E-MAIL SURVEYS 197 as illustrated in Figure 11.2, the Internet Conducting surveys, as in all forms of data now provides easy access to a plethora collection, requires making compromises. of inexpensive survey software, as well as Specifically, there are almost always trade- to millions of potential survey respondents, offs to be made between the amount of data and it has lowered other costs and barriers that can be collected and the accuracy of to surveying. While this is good news for the data collected. Hence, it is critical for survey researchers, these same factors have researchers to have a firm grasp of the trade- also facilitated a proliferation of bad survey offs they implicitly or explicitly make when research practice. choosing a sampling method for collecting For example, in an Internet-based survey their data. the marginal cost of collecting additional data can be virtually zero. At first blush, this seems to be an attractive argument in favor AN OVERVIEW OF SAMPLING of attempting to conduct censuses, or for sim- ply surveying large numbers of individuals There are many ways to draw samples without regard to how the individuals are from a population – and there are also recruited into the sample. And, in fact, these many ways that sampling can go awry. approaches are being used more frequently We intuitively think of a good sample as with Internet-based surveys, without much one that is representative of the population thought being given to alternative sampling from which the sample has been drawn. By strategies or to the potential impact such ‘representative’ we do not necessarily mean choices have on the accuracy of the survey the sample matches the population in terms results. The result is a proliferation of poorly of observable characteristics, but rather that conducted ‘censuses’ and surveys based on the results from the data we collect from large convenience samples that are likely to the sample are consistent with the results we yield less accurate information than a well- would have obtained if we had collected data conducted survey of a smaller sample. on the entire population. Figure 11.2 Banners for various Internet survey software (accessed January 2007) 198 THE SAGE HANDBOOK OF ONLINE RESEARCH METHODS Of course, the phrase ‘consistent with’ The survey sample then consists of those is vague and, if this was an exposition of members of the sampling frame that were the mathematics of sampling, would require chosen to be surveyed, and coverage error is a precise definition. However, we will not the difference between the frame population cover the details of survey sampling here.1 and the population of inference. Rather, in this section we will describe the The two most common approaches to various sampling methods and discuss the reducing coverage error are: main issues in characterizing the accuracy of a survey, with a particular focus on • terminology and definitions, in order that obtaining as complete a sampling frame as pos- sible (or employing a frameless sampling strategy we can put the subsequent discussion about in which most or all of the target population has Internet-based surveys in an appropriate a positive chance of being sampled); context. • post-stratifying to weight the survey sample to match the population of inference on some observed key characteristics. Sources of error in surveys The primary purpose of a survey is to gather Sampling error arises when a sample of the information about a population. However, target population is surveyed. It results from even when a survey is conducted as a census, the fact that different samples will generate the results can be affected by several sources different survey data. Roughly speaking, of error. A good survey design seeks to reduce assuming a random sample, sampling error is all types of error – not only the sampling reduced by increasing the sample size. error arising from surveying a sample of the Nonresponse errors occur when data is population. Table 11.1 below lists the four not collected on either entire respondents general categories of survey error as presented (unit nonresponse) or individual survey ques- and defined in Groves (1989) as part of his tions (item nonresponse). Groves (1989) calls ‘Total Survey Error’ approach. nonresponse ‘an error of nonobservation’.The Errors of coverage occur when some part response rate, which is the ratio of the number of the population cannot be included in the of survey respondents to the number sampled, sample. To be precise, Groves specifies three is often taken as a measure of how well different populations: the survey results can be generalized. Higher response rates are taken to imply a lower 1 The population of inference is the population likelihood of nonresponse bias. that the researcher ultimately intends to draw Measurement error arises when the survey conclusions about. response differs from the ‘true’ response. 2 The target population is the population of inference less various groups that the researcher For example, respondents may not answer has chosen to disregard. sensitive questions honestly for a variety 3 The frame population is that portion of the target of reasons, or respondents may misinterpret population which the survey materials or devices or make errors in answering questions. delimit, identify, and subsequently allow access to Measurement error is reduced in a variety of (Wright and Tsao, 1983). ways, including careful testing and revision of Table 11.1 Sources of survey error according to Groves (1989) Type of error Definition Coverage ‘…the failure to give any chance of sample selection to some persons in the population’. Sampling ‘…heterogeneity on the survey measure among persons in the population’.