Methods of Selecting Samples in Multiple Surveys to Reduce Respondent Burden
Total Page:16
File Type:pdf, Size:1020Kb
METHODS OF SELECTING SAMPLES IN MULTIPLE SURVEYS TO REDUCE RESPONDENT BURDEN Charles R. Perry, Jameson C. Burt and William C. Iwig, National Agricultural Statistics Service Charles Perry, USDA/NASS, Room 305, 3251 Old Lee Highway, Fairfax, VA 22030 KEY WORDS: Respondent burden, multiple component, two methods of sampling are proposed surveys, stratified design that reduce this type of respondent burden. Historically, NASS has attempted to reduce Summary respondent burden and also reduce variance. In 1979, Tortora and Crank considered sampling with probability inversely proportional to burden. Noting The National Agricultural Statistics Service (NASS) a simultaneous gain in variance with a reduction in surveys the United States population of farm burden, NASS chose not to sample with probability operators numerous times each year. The list inversely proportional to burden. NASS has reduced components of these surveys are conducted using burden on the area frame component of its surveys. independent designs, each stratified differently. By There, a farmer sampled on one survey might be chance, NASS samples some farm operators in exempt from another survey, or farmers not key multiple surveys, producing a respondent burden to that survey might be sampled less intensely. concern. Two methods are proposed that reduce Statistical agencies in other countries have also this type of respondent burden. The first method approached respondent burden. For example, the uses linear integer programming to minimize the Netherlands Central Bureau of Statistics does some expected respondent burden. The second method co-ordinated or collocated sampling, ingeniously samples by any current sampling scheme, then, conditioning samples for one survey on previous within classes of similar farm operations, it surveys (de Ree, 1983). minimizes the number of times that NASS samples a farm operation for several surveys. Formal Description of Methods I and II The second method reduces the number of times that a respondent is contacted twice or more within a survey year by about 70 percent. The first method Method I is formally described by four basic tasks. will reduce this type of burden even further. (a) Cross-classify the population by the stratifi- cations used in the individual surveys. This Introduction produces the coarsest stratification of the pop- ulation that is a substratification of each indi- The National Agricultural Statistics Service (NASS) vidual stratification. surveys the United States population of farm operators numerous times each year. Some surveys (b) Proportionally allocate each of the individual are conducted quarterly, others are conducted stratified samples to the substrata. Use monthly and still others are conducted annually. random assignment between substrata where Each major survey uses a list dominant multiple necessary. frame design and an area frame component that (c) Apply integer linear programming within each accounts for that part of the population not on substratum to assign the samples to the labels the list frame. The list frame components of these of units belonging to the substratum so that surveys constitute a set of independent surveys, the respondent burden is minimized. each using a stratified simple random sample design with different strata definitions. With the current (d) Randomize the labels to the units of procedures some individual farm operators are substratum. The final assignment within each sampled for numerous surveys while other farm substratum is a simple random sample with operators with similar design characteristics are respect to each of the proportionally allocated hardly sampled at all. Within the list frame samples. Method II is formally described by four basic tasks. the original stratified sample allocations over the substratum. Originally, these allocations are (a) Using an equal probability of selection tech- random to each substratum, constrained only so nique within a stratum, select independent strat- that the substratum sample sizes sum to their ified samples for each survey. Notice that the stratum sample size. Second, for the first method equal probability of selection criterion permits the respondent burden is minimized over each efficient zonal sampling techniques on each sur- substrata by the linear programming step. vey within strata. Currently, within strata Regarding variance reduction, this means that if the samples are selected systematically with re- original sample was selected using simple random cords essentially in random order. sampling within each stratum, then the first method (b) Substratify the population by cross-classifying reduces respondent burden without any offsetting the individual farm units according to the increase in variance, since proportional allocation stratifications used in the individual surveys. is at least as efficient as simple random sampling. However, the first method would be less efficient (c) Randomly reassign within each substratum for variance than zonal sampling unrestricted by the samples associated with units having burden. But the second method, by reallocating excess respondent burden to units having less some zonal sampling units to reduce respondent respondent burden. burden, may only slightly increase variance over no (d) Iterate the reassignment process until it reallocation and then only when zonal sampling is minimizes the number of times that NASS effective. samples a farm operator for several surveys in the substratum. A Simple Simulation of Method I For both methods, define respondent burden by an index that represents the comparative burden on Method I reduces respondent burden in the following each individual sampling unit in the population. simulation of two surveys. Survey I samples n = 20 Each survey considered is assigned a burden value. from N = 110. Survey II also samples n = 20 from When a sampling unit is selected for multiple N = 110, though each of its strata has either a larger surveys, the burden index may be additive or or smaller population size (N:1 = 40 and N:2 = 70) some other functional form dependent on the than the corresponding strata of survey I (N1: = 30 individual survey burden values. Consequently, and N2: = 80). Here, the first subscript corresponds each sampling configuration is assigned a unique to the first survey, with its strata 1 and 2. Simi- respondent burden index. larly, the second subscript corresponds to the second survey. For example, N21 = 30 corresponds to the For any reasonable respondent burden index, the size of the population in stratum 2 of survey I and first method minimizes the expected respondent (1) in stratum 1 of survey II, whilen ¯21 = 3:75 corre- burden. This follows easily from the following sponds to the proportional allocation of survey I’s observations, where it is assumed that for each (1) stratum 2 sample, n2: = 10, to the population in of the original surveys an equal probability of both stratum 2 of survey I and stratum 1 of survey selection mechanism (epsm) is used within strata. II. First, from the independence of the original sample designs, it follows that for each individual unit the expected burden from the original stratified samples Survey I Survey II Stratum 1 Stratum 2 is equal to the expected respondent burden using N11 = 10 N12 = 20 N1¢ = 30 (1) (1) (1) proportional allocation followed by epsm sampling Stratum 1 n¹ = 3:33 n¹ = 6:67 n = 10 11 12 1¢ (2) (2) within substrata. Since the respondent burden n¹ = 2:5 n¹ = 2:85 12 over any population is the sum of the respondent 11 N21 = 30 N22 = 50 N2¢ = 80 (1) (1) (1) burden on the individuals of the population, the Stratum 2 n¹ = 3:75 n¹ = 6:25 n = 10 21 22 2¢ (2) (2) equality holds for the entire population or any n¹ = 7:5 n¹ = 7:15 21 22 subpopulation including the substrata. That is, (2) (2) N = 40 n = 10 N = 70 n = 10 the expected respondent burden over any arbitrary ¢1 ¢1 ¢2 ¢2 substratum for the proportionally allocated samples is equal to the expected respondent burden of With two surveys, at most we will sample a estimates. These are the same restrictions that respondent twice. For the above two surveys, apply to the permanent random number techniques without any proportional allocation, we simulated discussed by Ohlsson (1993). two independent stratified simple random samples 3 million times. These simulated samples produced, Method I on average, 3.6 double hits for the whole population Using this notation for Method I, we next describe a of 110 potential respondents, and four percent of sequence of simple data manipulation steps that can the simulations produced 7 or more double hits. be used operationally to perform tasks (a) through With the proportional allocations indicated in the (d) on page 1 for each of the K surveys. diagram for Method I, the population exceeds the total sample for both surveys in each substratum, Suppose that each unit, ui, of the population U has so no sampling unit needs to be selected for been stratified for each of the K surveys. Further both surveys. The high respondent burdens of suppose that this information has been entered into independent sampling are reduced to 0 double hits a file containing N records, so that the ith record with Method I! contains the stratification information for unit i. To be definitive, assume that the variable S(k) denotes the stratum classification code for survey k and Operational Description that S(k : i) denotes the value of the stratum classification code for unit ui. Basic Notation For each survey k (k = 1; 2;:::;K) perform the fol- lowing sequence of operations. N Let U = fuigi=1 be a finite population of size N. Suppose that U is surveyed on K occasions and that on each occasion a different independent stratified (a) Sort the data file by the variables S(k);:::; design is used. For these K stratified designs, denote S(K);S(1);:::;S(k ¡ 1). This will the survey occasion by k = 1; 2;:::K and let us use hierarchically arrange the records of the the following notation.