Why Large Design Effects Can Occur in Complex Sample Designs: Examples from the Nhanes 1999-2000 Survey

ASA Section on Survey Research Methods WHY LARGE DESIGN EFFECTS CAN OCCUR IN COMPLEX SAMPLE DESIGNS: EXAMPLES FROM THE NHANES 1999-2000 SURVEY David A. Lacher, Lester R. Curtin, Jeffery P. Hughes Centers for Disease Control and Prevention National Center for Health Statistics 3311 Toledo Road, Room 4215, Hyattsville, MD 20782 KEY WORDS: Design effect, complex sample, variance estimation, NHANES, laboratory testing Introduction laboratory tests were much higher, ranging up to 40. For complex sample surveys, the design effect (DEFF) for a point estimate is defined as the Previously, the large design effects were ratio of the “true” design-based sampling examined in terms of the sample design variance to the hypothetical sampling variance characteristics. Specifically, the within- and assuming the same point estimate based on a between-PSU variation was examined (4). Most simple random sample of the same size (1,2). of the large design effects occurred when the Design effects are used for a variety of purposes within-PSU component was relatively small and including comparing alternative survey designs, the between-PSU component was relatively determining the effective sample size for large. Typically, the NHANES geographic analysis, and adjusting confidence intervals for heterogeneity is not captured in the hypothetical estimates based on complex survey designs. simple random sample variance assumption, so Large or small design effects may also indicate the DEFF is inflated even though the total data problems including potential sample design sample error is small. problems or potential data quality problems. For example, an outlier, such as an extreme data Investigation of large design effects did reveal value, an influential sample weight or cluster, one particular data quality problem. This was can lead to an unusually large DEFF. For exemplified by serum sodium for NHANES laboratory measures, a large DEFF may occur 1999-2000. Serum sodium levels do not vary when there is a change in method detected by a greatly in a healthy population. After observing a drift in the quality control for the test. In the large DEFF for the mean concentration of analysis of the National Health and Nutrition sodium, the quality control data were re- Examination Survey (NHANES) survey data, examined. It was determined that there was a large design effects have been observed for slight drift in the serum sodium method. The several laboratory measurements. The paper difference was not clinically important and no presents results from one of a series of adjustment to the data was needed, but the investigations to explain the large mean design natural variation in sodium is so small that the effects seen in NHANES 1999-2000. estimated standard errors were inflated. The National Health and Nutrition Examination The examination of the sodium data motivated Survey 1999-2000 is the latest in a series of the research summarized in this paper. multi-purpose nationally representative health Specifically, this paper examines the effect of surveys conducted by the National Center for laboratory test method bias (accuracy) and Health Statistics of the Centers for Disease precision (reproducibility) on the design effects Control and Prevention. NHANES data come of the mean. The effects are evaluated using from interviews, examinations, and laboratory simulated examples based on the NHANES tests on biological specimens. The sample design 1999-2000 sample design. In addition, the is a multistage, area cluster design with relationship of the design effects of the mean of differential probabilities of selection for laboratory tests is compared to the relative designated demographic groups. Design effects standard errors (RSE). of the means of variables have been examined for the NHANES 1999-2000 survey (3). Background Although design effects of the estimated means for most variables in NHANES 1999-2000 were The conceptual basis for the DEFF originated less than 3, design effects for means of some with Cornfield (5). The use of DEFF for basic 3841 ASA Section on Survey Research Methods descriptive survey estimates has been component consisted of general biochemical, popularized by Kish (1,2). The concept of DEFF nutritional, immunological, environmental, has been extended to analytic methods, for microbiological, and hematological tests. example, in regression estimators (6). Generalized DEFF in multivariate models were Methods developed by Rao and Scott (7). Most standard textbooks on sample survey design and analysis NHANES Laboratory Data Collection and now discuss DEFF (8,9). Model based Analysis motivation of the typical form of the DEFF, summarizing the effect of differential weighting Laboratory data were collected from NHANES and a one-stage cluster design, has been provided 1999-2000 data files (12). All available by Gabler (10) and Spenser (11). laboratory data were used without deletion of outliers. Laboratory methods are described in the The design effect for a complex sample design is laboratory procedure manual for NHANES (13). defined as Vardesign/Varsrs, where Vardesign is the Statistical analysis was performed using SAS for variance of the estimate from the sampling Windows software (SAS Institute, Cary, NC) design and Varsrs is the variance assuming a and SUDDAN software (RTI, Research Triangle simple random sample (srs) at the same point Park, NC). Some tests were non-Gaussian as estimate with same number of observation units. judged by skewness and kurtosis. These tests The design effect of the mean: were log-transformed to obtain the geometric mean and standard error of the geometric mean. 2 deff(ŷmean)=V(ŷmean)/[(1-(n/N))(ssrs /n)] For each test, the sample size, arithmetic or geometric mean, standard error of the mean where V(ŷmean) is the variance of the estimate of (SEM), relative standard error of the mean, and 2 the mean, n/N is the sample fraction, and ssrs is the design effect of the mean was calculated. The the variance assuming a simple random sample relative standard error is defined as the standard (8). error of the mean divided by the estimate of the mean. The MEC examined sample weights NHANES 1999-2000 was a cross-sectional (WTMEC2YR) were used which takes into survey that collected data on the civilian account differential probabilities of selection noninstitutionalized U.S. population through from the sample design, from nonresponse, and questionnaires and medical examinations from oversampling of subgroups (14). The including laboratory tests. NHANES 1999-2000 weights also reflected the complex sample used a stratified, multistage probability design to design of NHANES. Standard errors were collect a nationally representative sample. calculated using the SUDDAN delete 1 jackknife Beginning in 1999, NHANES became a (JK1) method (15). The JK1 method partitioned continuous survey. The procedures followed to the sample into 52 sampling units and 52 select the sample and conduct the interviews and replicates deleting one unit at a time. examinations for NHANES 1999-2000 were similar to those for previous NHANES surveys. Simulation of Laboratory Test Bias and NHANES 1999–2000 was designed to over- Precision on Mean DEFF sample Mexican Americans, non-Hispanic blacks, adolescents, elderly persons 60 years and The effect of bias and precision of laboratory older, pregnant females, and beginning in the tests on the design effect of the mean was year 2000 low income non-Hispanic whites to simulated. The NHANES 1999-2000 JK1 improve estimates for these groups. The weights and sample sizes for 27 stands (survey NHANES 1999-2000 survey consisted of 12160 locations representing 26 PSUs) were used as the identified sample persons of all ages, of which sampling design for the simulation. Simulated 9965 (81.9%) were interviewed and 9282 test data were formed using a Gaussian (76.3%) were interviewed and examined. distribution with a mean of 100 and a standard deviation of 1 for 22 of 27 stands. For 5 other The laboratory component of NHANES 1999- consecutive stands, a method bias of 0, 1, 2, 3, 4, 2000 consisted of tests performed on blood or 5 was added to the mean. For each bias, the samples, urine, nasal and vaginal swabs and method precision was simulated using standard environmental samples. The laboratory deviations of 0.5, 1, 2, 3, 4, or 5. The design 3842 ASA Section on Survey Research Methods effect of the mean was calculated for each sample in NHANES because they are over- simulation. sampled or under-sampled which leads to inflated variances. Also, design effects are Results increased in NHANES compared to simple random samples due to positive correlations The descriptive statistics and design effect of the among sample persons in the same area cluster. mean for 55 laboratory tests for NHANES 1999- In addition, large design effects can occur with 2000 are seen in Table 1. The sample size, mean, large between-PSU and small within-PSU standard error of the mean, and relative standard components of sampling error which is error were calculated for each test. The relative exacerbated by the relatively small number of standard error of the tests ranged from 0.12% for PSUs (26) found in NHANES 1999-2000. serum sodium to 8.05% for blood total mercury. The design effects of the mean ranged from 1.56 Laboratory tests are subject to systematic error for lymphocyte percent to 39.52 for the mean (bias) and random error (precision) (16). In corpuscular hemoglobin concentration. The addition, biological variation between persons design effect of the mean was compared to the and within a person are components of variation relative standard error for each test (Figure 1). In for laboratory tests. Furthermore, laboratory tests general, there was an indirect relationship are subject to pre-analytical variation (improper between the relative standard error and the collection of specimens) or post-analytical errors design effect of the mean. (reporting of results). In NHANES 1999-2000, large mean design effects for laboratory tests The simulation of the effects of bias and ranging up to 40 were seen (Table 1).

Why Large Design Effects Can Occur in Complex Sample Designs: Examples from the Nhanes 1999-2000 Survey

A Primer for Using and Understanding Weights with National Datasets

Design Effects in School Surveys

Computing Effect Sizes for Clustered Randomized Trials

IBM SPSS Complex Samples Business Analytics

The Design Effect: Do We Know All About It?

Design Effect

Estimation of Design Effects for ESS Round II

Selection Bias in Web Surveys and the Use of Propensity Scores

Version 4.1 Effect Size and Standard Error Formulas

UC Berkeley UC Berkeley Electronic Theses and Dissertations

Example of the Impact of Weights and Design Effects on Contingency Tables and Chi-Square Analysis

Sample Size Calculation and Development of Sampling Plan