Accounting for Cluster Sampling in Constructing Enumerative Sequential Sampling Plans
Total Page:16
File Type:pdf, Size:1020Kb
SAMPLING AND BIOSTATISTICS Accounting for Cluster Sampling in Constructing Enumerative Sequential Sampling Plans 1 2 A. J. HAMILTON AND G. HEPWORTH J. Econ. Entomol. 97(3): 1132Ð1136(2004) ABSTRACT GreenÕs sequential sampling plan is widely used in applied entomology. GreenÕs equa- tion can be used to construct sampling stop charts, and a crop can then be surveyed using a simple random sampling (SRS) approach. In practice, however, crops are rarely surveyed according to SRS. Rather, some type of hierarchical design is usually used, such as cluster sampling, where sampling units form distinct groups. This article explains how to make adjustments to sampling plans that intend to use cluster sampling, a commonly used hierarchical design, rather than SRS. The methodologies are illustrated using diamondback moth, Plutella xylostella (L.), a pest of Brassica crops, as an example. Downloaded from KEY WORDS cluster sampling, design effect, enumerative sampling, Plutella xylostella, sequential sampling http://jee.oxfordjournals.org/ WHEN CONSIDERING THE DISTRIBUTION of individuals in a it is more practicable to implement. Therefore, the population, the relationship between the population sampling design needs to be taken into account when variance and mean typically obeys a power law (Tay- constructing a sequential sampling plan. Most pub- lor 1961). According to TaylorÕs Power Law (TPL), lished enumerative plans do not do this, and thus the variance-mean relation can be described by the implicitly assume that the Þeld will be sampled ac- following equation: cording to SRS (e.g., Allsopp et al. 1992, Smith and Hepworth 1992, Cho et al. 1995). 2 ϭ b s ax [1] In this article, we show how to account for cluster where s2 and x are the sample variance and sample sampling (Fig. 1), a type of multistage design, when mean respectively, a is a scaling factor that is depen- determining minimum sample sizes for a sequential by guest on May 3, 2016 dent on sampling method and habitat, and the expo- sampling plan. In cluster sampling, elementsÑthe in- nent b is a measure of aggregation. For simplicity of dividual units from which data are collectedÑare sam- explanation, we always refer to the sample variance pled in clusters, each representing a primary (sam- (s2) and the sample mean (x or m) rather than the pling) unit (Cochran 1977, pp. 233 and 274). Elements population variance (2) and the population mean are also sometimes called subunits, small-units, or sec- (). If b is not signiÞcantly different from unity, then ondary sampling units because they are divisions of the population is assumed to have a random distribu- the primary sampling unit (Cochran 1977, pp. 233 and tion. When b Ͼ 1 the population is aggregated, and 238). Note that here we are concerned with the sit- when b Ͻ 1 the population has a regular distribution. uation where the number of elements per cluster is For a particular population, the parameters a and b can already determined, as is often the case in practice. If be estimated by regression of ln s2 on ln x: this were not the case, then the methods outlined by Hutchison (1994, pp. 215Ð217) and Binns et al. (2000, ln s2 ϭ ln aˆ ϩ bˆ ln x [2] pp. 183Ð191) could be used to determine the number of elements per cluster. where aˆ and bˆ are the estimates of a and b respectively. Diamondback moth, Plutella xylostella (L.), is the TaylorÕs (1961) aˆ and bˆ values are used in GreenÕs (1970) equation to construct a sequential sampling species used to illustrate the methodology. In addition plan. A sampling plan derived from GreenÕs (1970) to investigating cluster sampling issues, we take the formula makes the assumption that the Þeld will be opportunity to consider the spatial distribution of this sampled according to SRS. Most integrated pest man- species, as described by TaylorÕs (1961) b parameter, agement (IPM) sampling programs, however, use and to present a sampling plan for use in Australian some form of multistage systematic sampling, because broccoli crops. 1 Department of Primary Industries (KnoxÞeld), Private Bag 15, Materials and Methods Ferntree Gully Delivery Centre, Victoria 3156, Australia (e-mail: [email protected]). Data Collection. All data were collected from broc- 2 Department of Mathematics and Statistics, The University of Mel- coli, Brassica oleracea variety botrytis L., Þelds accord- bourne, Victoria, 3010, Australia. ing to the following procedure. Two transects were 0022-0493/04/1132Ð1136$04.00/0 ᭧ 2004 Entomological Society of America June 2004 HAMILTON AND HEPWORTH:ACCOUNTING FOR CLUSTER SAMPLING 1133 in the estimates of TaylorÕs a and b parameters (Ͻ1.2 and Ͻ0.09%, respectively). TaylorÕs a and b parameters were estimated by re- 2 gressing ln su against ln xˆ . Because there was error 2 associated with estimating both su and x, BartlettÕs regression method was used (Bartlett 1949). The re- gression was done using a BASIC program (Legg 1986), and the data were sorted according to the independent variable. To make inferences about the spatial distribution of the species, we tested the null hypothesis that the slope was not signiÞcantly differ- ent from one by using the t-test procedure of Bartlett (1949). There was one singleton observation (i.e., an observation where larvae were only observed in one element), and this was excluded from all analyses, because the values of s2 and x are restricted for sin- gleton observations (s2 ϭ x), which leads to pseudo- randomness (Taylor 1984). Sample Size Adjustment. The regression parame- Fig. 1. Diagrammatic representation of cluster sampling 2 Downloaded from ters obtained from a TPL plot based on the su esti- protocol used in broccoli Þelds. mates could be used to develop a sequential sampling plan using GreenÕs (1970) method. However, such a plan assumes that SRS (or at least SRegS) would be marked in a Þeld according to a standard V-shaped used, and this is clearly not going to be so for plans that pattern that extended from one corner of the Þeld, to include clusters of elements (i.e., like the sampling the midpoint on the other side, and back to the ad- procedure described above that was used to collect http://jee.oxfordjournals.org/ jacent corner (Hoy et al. 1983, Torres and Hoy 2002). the data for this study). A plan based on these TaylorÕs Along each transect, there were Þve equidistantly a and b values would give an underestimate of the spaced sampling points. At each point, the total num- minimum sample size. Consequently, the stop-lines ber of larvae (all instars) on four nearby plants (ele- for the sequential plan would also be incorrect. We ments) was recorded. Each of these groups of four adjusted for this by using the design effect (deff) (Kish plants formed a cluster, and there were 10 clusters, 1995, pp. 162 and 257). The deff is the ratio of the resulting in 40 elements in total. variance obtained from a complex sample to the vari- All crops were planted in beds that consisted of two ance obtained from a SRS of the same number of units staggered rows of plants. The four plants chosen for (or elements). For the cluster sampling used in the by guest on May 3, 2016 sampling were from the bed nearest to the sampling broccoli Þelds, point. From this bed, the nearest two plants in the row ϭ 2 2 closest to the sampling point were sampled as well as deff MSc/su [3] the two closest plants in the next row. Thus, at each cluster, equal numbers of elements from both sides of This formula for the deff is sometimes written with a the bed were sampled. factor, 1 Ϫ f, in the denominator, where f is the sam- Surveys were conducted on properties in the Cran- pling fraction (sample size/population size). How- bourne and Werribee vegetable growing regions on ever, in this study, and those to which the results will MelbourneÕs outer eastern and western fringes, re- be applied, f is negligible, and so the factor (being spectively. In total, 23 crops were surveyed from the close to 1) can be omitted. Werribee region and seven from around Cranbourne. Note that the deff formula as deÞned by Kish (1995, Each crop was surveyed at approximately weekly in- pp. 162 and 257) uses the actual between cluster vari- 2 2 tervals from 1 wk after transplant through to harvest ance rather than MSc. Although the MSc is not a (usually Ϸ7 to 8 wk). Growers maintained normal readily interpretable quantity, unlike the between spray practices (typically calendar spraying) through- cluster variance, it makes the computation of the deff out the period of data collection. more simple, because it can readily be calculated using Variance Estimation and Regression for Taylor’s a nested analysis of variance (ANOVA). and b Parameters. The simplest way to estimate the In this way, the deff can be calculated separately for variance among all elements in the population (2) each observation. However, it is likely to vary sub- would have been to use the variance associated with stantially across observations, and in most cases we the 40 elements in the sample. This unadjusted sample want to calculate an overall deff, for use in future variance (s2) is, however, a slightly biased estimate of sampling plans. Thus, it is necessary to perform a 2, because the elements were drawn from contiguous nested ANOVA covering all observations to calculate 2 2 groups (i.e., clusters) rather than at random. We used MSc and su. If the data are unbalanced (e.g., variable the method of Cochran (1977, p. 239) to obtain an numbers of elements per cluster across sampling oc- 2 (almost) unbiased variance estimate, su.