
Two-Stage Cluster Sampling: General Guidance for Use in Public Health Assessments

From the North Carolina Center for Public Health Preparedness in cooperation with North Carolina Public Health Regional Surveillance Team 5

What’s Inside: How the Numbers of Clusters and Interviews Affect the Data; Choosing the Right Number of Clusters and Interviews; Stratification and Subgroups

Want a basic introduction to sampling? See “A Guide to Sampling for Community Health Assessments and Other Projects,” available at: http://nccphp.sph.unc.edu/PHRST5

Introduction to Cluster Sampling

Cluster sampling involves dividing the specific population of interest into geographically distinct groups or clusters, such as neighborhoods or families. Because the information is readily available, many people use census blocks or block groups for their clusters.

A random sample of clusters is obtained, and then members of the selected clusters are surveyed (either randomly or as a census). Contrast this with stratified sampling, in which the population is divided into distinct groups (e.g., states or ethnicities) and then random samples are obtained from each group.

A commonly used two-stage cluster sampling scheme, the “30 x 7” method, was developed by the World Health Organization with the aim of calculating the prevalence of immunized children within +/- 10 percentage points. That is, if the true prevalence was 40%, one would expect an estimate between 30% and 50% when using the 30 x 7 method.

This design has been adopted for other purposes such as rapid needs assessments with no (or only slight) modification. This sampling scheme is thought to be sufficient for most sampling of community health factors.

30 x 7 means that you randomly select 30 census blocks from a list of all the census blocks in your county and then randomly select seven interview sites per block. The 30 x 7 method is an example of what is known as a two-stage cluster sample. In the first stage, blocks are randomly selected, while in the second stage, interview locations are randomly selected within each census block. Census blocks are the primary sampling units, while the random interview locations are your secondary sampling units. Census blocks may be selected in stage one through a method known as “probability proportionate to population size,” which means that a census block with more households is more likely to be included than one with fewer households.
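The two stages of a 30 x 7 design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the county of 150 census blocks and its household counts are made up, the function names are hypothetical, and stage one uses systematic selection, which is one common way (not necessarily the only one) to draw blocks with probability proportionate to size.

```python
import random


def select_blocks_pps(households_per_block, n_clusters, rng):
    """Stage one: systematic selection with probability proportionate to size.

    households_per_block maps a block id to its household count. A block
    larger than the sampling interval can be selected more than once.
    """
    total = sum(households_per_block.values())
    # Selection points sit at (start + i * total) / n_clusters households.
    # Comparing in integer units avoids floating-point rounding.
    start = rng.randrange(total)
    selected = []
    cumulative = 0
    for block_id, households in households_per_block.items():
        cumulative += households
        while (len(selected) < n_clusters
               and start + len(selected) * total < cumulative * n_clusters):
            selected.append(block_id)
    return selected


def select_households(block_id, households_per_block, n_interviews, rng):
    """Stage two: simple random sample of household indices within one block."""
    return sorted(rng.sample(range(households_per_block[block_id]), n_interviews))


rng = random.Random(210)  # fixed seed so the sketch is repeatable
# Hypothetical county: 150 census blocks with 20 to 200 households each.
county = {f"block_{i:03d}": rng.randint(20, 200) for i in range(150)}

stage_one = select_blocks_pps(county, n_clusters=30, rng=rng)
stage_two = [(b, select_households(b, county, 7, rng)) for b in stage_one]

print(len(stage_one), "blocks,", sum(len(h) for _, h in stage_two), "interviews")
```

In a real assessment the household counts would come from census data, and the second stage would pick physical interview locations rather than list indices.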

How the Numbers of Clusters and Interviews Affect the Data

Two locations drawn randomly from within the same census block are likely to be more similar than two locations from different census blocks (remember, the goal of census blocks is to be as uniform as possible with respect to population characteristics, economic status, and living conditions). So, two locations within the same census block do not each contribute completely independent information; this is known as the “intra-cluster correlation” or ICC. In statistical terms, this correlation always increases the variance of your estimate, which reduces the precision. As a result, it’s always better to randomly select more clusters than to randomly select more points within any particular cluster. In other words: selecting an additional cluster provides more information than selecting additional points within a cluster.

The variance of an estimate is a measure of its dispersion over all possible samples. For any sampling situation, there are an extremely large number of possible samples, each one of which would produce an estimate of the value of interest. The variance gives an indication of the likely “spread” of the estimates, or the range of values that might result from all these different samples. A lower variance or larger sample size results in greater precision.

Precision also gives you an idea of the width of the confidence interval around the estimated prevalence. With an estimated prevalence of 35% and a 95% confidence interval of 25% to 45%, a correct interpretation would be that you are 95% confident that the true (unknown) prevalence lies somewhere between 25% and 45%, but it’s just as likely to be near 25% or 45% as to be near 35%. Of course, you never know the true prevalence; otherwise, there would be no reason to do the study!
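To make the link between ICC, variance, and confidence interval width concrete, the sketch below uses the textbook design effect approximation, DEFF = 1 + (m - 1) x ICC, where m is the number of interviews per cluster. This formula is an assumption brought in for illustration; the newsletter does not state which variance formula it relies on, and the approximation presumes equal cluster sizes. The ICC values are illustrative only.

```python
import math


def design_effect(interviews_per_cluster, icc):
    """Variance inflation relative to a simple random sample of the same size."""
    return 1 + (interviews_per_cluster - 1) * icc


def ci_half_width(prevalence, n_clusters, interviews_per_cluster, icc, z=1.96):
    """Approximate half-width of a 95% confidence interval for a proportion."""
    n = n_clusters * interviews_per_cluster
    deff = design_effect(interviews_per_cluster, icc)
    variance = deff * prevalence * (1 - prevalence) / n
    return z * math.sqrt(variance)


# How the interval around a 40% prevalence widens as the ICC grows (30 x 7 design).
for icc in (0.0, 0.05, 0.10, 0.20):  # illustrative values, not from the newsletter
    half = ci_half_width(0.40, 30, 7, icc)
    print(f"ICC={icc:.2f}  DEFF={design_effect(7, icc):.2f}  "
          f"40% +/- {100 * half:.1f} points")

# Same 210 interviews, arranged as more clusters with fewer interviews each.
baseline = ci_half_width(0.40, 30, 7, 0.10)
more_clusters = ci_half_width(0.40, 35, 6, 0.10)
print(f"30 x 7: +/- {100 * baseline:.1f}   35 x 6: +/- {100 * more_clusters:.1f}")
```

Under this approximation, spreading the same number of interviews over more clusters always narrows the interval when the ICC is positive, which is the point made in the text.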

Choosing the Right Number of Clusters and Interviews

Since selecting more clusters rather than more points within any cluster improves precision, using a “40 x 5” method would likely yield estimates with more precision than the 30 x 7 method, even though it involves fewer total interviews (200 compared to 210). So shouldn’t you always choose more clusters and fewer interviews?

If survey costs or time are important, such as during a rapid needs assessment, additional clusters may be more costly or time-consuming than additional locations within a cluster, so the reduction in sample size from 210 to 200 might not necessarily lead to lower costs or improved timeliness. For example, interviewer travel time to 40 census blocks may be much greater than travel time to 30 census blocks, which may cost more and delay reporting important results to authorities. Additionally, you may not have more than 30 census blocks in a county or an area affected by a natural disaster or an outbreak.

On the other hand, a 20 x 10 or 15 x 14 method may save time or money, but would result in a substantially less precise estimate. If it’s not possible to include more than 15 or 20 census blocks in the first stage of your sample, you may need to increase the number of interview locations in the second stage to as many as 18 or more in order to achieve the same statistical precision as with a 30 x 7 design. The actual number of interview locations depends on the ICC. The higher this correlation, the larger the total sample size you need. In fact, if the ICC is too high, it’s not always possible to achieve the same level of precision by sampling 15 or 20 blocks as compared to 30 blocks, no matter how many locations are sampled within each block. If the ICC were as high as 0.20, you would need to sample 96 locations in 20 blocks (for a sample size of 1,920) to achieve the same precision as a 30 x 7 design!

So consider the balance of timeliness and precision in choosing your study design. If you have the time and money, choosing a two-stage cluster sampling design with more census blocks and fewer interviews per block will give you the most precise estimates of the variables you are trying to measure in your community.
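The trade-off between clusters and interviews can be explored with a rough calculation. The sketch below again assumes the design effect approximation DEFF = 1 + (m - 1) x ICC with equal cluster sizes, which is not necessarily the calculation behind the figures quoted above; the function names are hypothetical. It compares designs by their relative variance (DEFF divided by total sample size) and searches for the smallest number of interviews per cluster that matches a 30 x 7 design.

```python
def relative_variance(n_clusters, interviews_per_cluster, icc):
    """Design effect divided by total sample size (smaller is more precise)."""
    deff = 1 + (interviews_per_cluster - 1) * icc
    return deff / (n_clusters * interviews_per_cluster)


def interviews_needed(n_clusters, icc, ref=(30, 7), max_interviews=5000):
    """Smallest interviews-per-cluster matching the reference design, or None."""
    # Small tolerance so that floating-point noise does not reject an exact match.
    target = relative_variance(ref[0], ref[1], icc) * (1 + 1e-9)
    for m in range(1, max_interviews + 1):
        if relative_variance(n_clusters, m, icc) <= target:
            return m
    # No m up to the cap reaches the target; the variance cannot drop
    # below icc / n_clusters however many interviews are added.
    return None


designs = [(40, 5), (30, 7), (20, 10), (15, 14)]
for icc in (0.05, 0.10, 0.20):  # illustrative ICC values
    summary = ", ".join(
        f"{k}x{m}: {relative_variance(k, m, icc):.4f}" for k, m in designs
    )
    print(f"ICC={icc:.2f}  {summary}")
    for k in (20, 15):
        print(f"    interviews per block needed with {k} blocks:",
              interviews_needed(k, icc))
```

Under this approximation the variance cannot fall below ICC divided by the number of clusters, which is why some combinations return None: no number of interviews per block can compensate for too few blocks. The approximation gives a somewhat smaller figure than the 96 locations quoted above for 20 blocks at an ICC of 0.20; the required number changes steeply with the ICC in that range, so the two are best read as the same order of magnitude.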

Stratification and Subgroups

Stratification is the process of sorting individuals into homogeneous groups prior to sampling. Groups, or strata, should be mutually exclusive (no member should belong to more than one group) and exhaustive (all members should belong to some group). After individuals are divided into groups, sampling can be done within each group. An example of mutually exclusive and exhaustive groups would be males and females.

Typically, proportionate allocation should be used when conducting stratified sampling. If 60% of your community members are female and 40% are male, then your sample should be 60% female and 40% male. In this way, stratification ensures that all groups are sufficiently represented.

Since stratification almost always increases the precision of your estimates and narrows the confidence intervals around your estimates, it may be desirable to stratify. This is particularly true if the stratification factor is at least moderately related to your outcome variable. An example might be the stratification of properties by designated flood zone in a rapid needs assessment following a hurricane. The statistical precision gained from stratification such as this may result in needing fewer census block clusters in your study than you would with an unstratified design.

While it is statistically valid and maybe even statistically desirable to stratify, analysis of stratified data collected using a two-stage cluster sampling method can be complex. Your analysis must account for both stratification and clustering when computing your estimates and standard errors. While this is not too difficult to do in sophisticated statistical computing software (usually SAS or SUDAAN), it would require someone with special expertise in this subject area and the software.

The 30 x 7 cluster sampling method was not designed to collect stratified data, but rather to provide overall estimates for a designated assessment area. If a stratified design is used, it is always possible to obtain estimates for each individual stratum (e.g., if the design were stratified by flood plain, you would obtain an overall estimate, but could also obtain estimates within each flood plain). Of course, these stratum-specific estimates will always be less precise than the overall estimate.

If you wish to calculate separate estimates of equal precision for specific population subgroups using the 30 x 7 or a similar two-stage cluster design, you must obtain sufficient samples from each subgroup to achieve that level of precision; taking separate 30 x 7 samples from each subgroup may even be necessary.

Of course, it is always possible to obtain estimates within specific subgroups of your population (as long as members of that subgroup were actually sampled), even if the original sample design was not stratified according to these subgroups. For example, it will be possible to estimate the outcome proportion within census blocks that are predominantly Hispanic even if the design was not stratified by ethnicity, as long as some of the census blocks sampled are predominantly Hispanic. Once again, estimates within any subgroup will be less precise than the overall estimate.

It is also possible to increase the precision of your overall estimate using “post-stratification,” in which a stratified analysis is conducted even though the sampling design used to obtain the data was not necessarily stratified. As with stratified sampling, post-stratification will be especially useful if the stratification factor is at least moderately associated with the outcome of interest. Post-stratification can be easily accomplished by including the stratification factor as a covariate in a regression model using a statistical software package such as SAS.
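The arithmetic behind proportionate allocation and a simple post-stratified estimate can be sketched as follows. Several things here are assumptions for illustration: the function names, the flood-zone survey counts, and the population shares are all hypothetical, and the post-stratified estimate is computed by weighting stratum proportions by known population shares, which is a different route from the regression-covariate approach mentioned above. The sketch produces point estimates only; standard errors would still need to account for clustering.

```python
def proportionate_allocation(population_by_stratum, sample_size):
    """Split sample_size across strata in proportion to population size.

    Uses largest-remainder rounding so the allocations sum to sample_size.
    """
    total = sum(population_by_stratum.values())
    # Exact shares are numerators / total; keep them as integers.
    numerators = {s: sample_size * pop for s, pop in population_by_stratum.items()}
    allocation = {s: num // total for s, num in numerators.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(numerators, key=lambda s: numerators[s] % total, reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation


def post_stratified_proportion(results_by_stratum, population_share):
    """Weight each stratum's sample proportion by its known population share."""
    estimate = 0.0
    for stratum, (positives, sampled) in results_by_stratum.items():
        if sampled == 0:
            raise ValueError(f"no sampled members in stratum {stratum!r}")
        estimate += population_share[stratum] * positives / sampled
    return estimate


# The 60% female / 40% male example, applied to 210 interviews.
print(proportionate_allocation({"female": 6000, "male": 4000}, 210))

# Hypothetical results: (households reporting damage, households interviewed).
results = {"flood_zone": (45, 60), "outside_flood_zone": (60, 150)}
shares = {"flood_zone": 0.20, "outside_flood_zone": 0.80}  # assumed known from census data
print(round(post_stratified_proportion(results, shares), 3))
```

In this made-up example the flood zone is over-represented in the sample (60 of 210 interviews against a 20% population share), so the weighted estimate differs from the raw proportion of 105 out of 210.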
The North Carolina Center for Public Health Preparedness (NCCPHP) offers a variety of training activities and technical support to local and state public health agencies. NCCPHP is funded by the Centers for Disease Control and Prevention under grant/cooperative agreement number U90/424255 to improve the capacity of the public health workforce to prepare for and respond to terrorism and other emerging public health threats. The contents of this newsletter are the responsibility of the authors and do not necessarily represent the official views of CDC. To learn more about NCCPHP’s training and technical assistance, visit http://nccphp.sph.unc.edu.

The North Carolina Center for Public Health Preparedness
North Carolina Institute for Public Health
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
400 Roberson Street, Campus Box 8165
Chapel Hill, NC 27599-8165
Phone: 919-843-5561 Fax: 919-843-5563
Email: [email protected]

Public Health Regional Surveillance Team 5 is one of seven “PHRSTs” created by the Office of Public Health Preparedness and Response in the North Carolina Division of Public Health to provide support to local health agencies serving all 100 counties. The host counties for these regional offices are Buncombe, Mecklenburg, Guilford, Durham, Cumberland, Pitt, and New Hanover. Each team includes an epidemiologist, an industrial hygienist, a nurse consultant, and an administrative specialist.

Public Health Regional Surveillance Team 5

Team Leader: Steven Ramsey Phone: 336-641-8192 E-mail: [email protected]

Reference

1. Henderson RH, Sundaresan T. Cluster sampling to assess immunization coverage: a review of experience with a simplified sampling method. Bulletin of the World Health Organization. 1982;60(2):253-260. Available at: http://whqlibdoc.who.int/bulletin/1982/Vol60-No2/bulletin_1982_60(2)_253-260.pdf