Example of the Impact of Weights and Design Effects on Contingency Tables and Chi-Square Analysis

Journal of Modern Applied Statistical Methods Copyright © 2003 JMASM, Inc. November, 2003, Vol. 2, No. 2, 425-432 1538 – 9472/03/$30.00 Example Of The Impact Of Weights And Design Effects On Contingency Tables And Chi-Square Analysis David A. Walker Denise Y. Young Educational Research and Assessment Institutional Research Northern Illinois University University of Dallas Many national data sets used in educational research are not based on simple random sampling schemes, but instead are constructed using complex sampling designs characterized by multi-stage cluster sampling and over-sampling of some groups. Incorrect results are obtained from statistical analysis if adjustments are not made for the sampling design. This study demonstrates how the use of weights and design effects impact the results of contingency tables and chi-square analysis of data from complex sampling designs. Key words: Design effect, chi-square, weighting Introduction Methodology Many large-scale data sets used in educational In large-scale data collection, survey research research are constructed using complex designs applies varied sample design techniques. For characterized by multi-stage cluster sampling example, in a single-stage simple random and over-sampling of some groups. Common sample with replacement (SRS), each subject in statistical software packages such as SAS and the study has an equal probability of being SPSS yield incorrect results from such designs selected. Thus, each subject chosen in the unless weights and design effects are used in the sample represents an equivalent total of subjects analysis (Broene & Rust, 2000; Thomas & in the population. More befitting, however, is Heck, 2001). The objective of this study is to that data collection via survey analysis often demonstrate how the use of weights and design involves the implementation of complex survey effects impact the results of contingency tables design (CSD) sampling, such as disproportional and chi-square analysis of data from complex stratified sampling or cluster sampling, where sampling designs. subjects in the sample are selected based on different probabilities. Each subject chosen in the sample represents a different number of This article is based upon work supported by the subjects in the population (McMillan & Association for Institutional Research (AIR), Schumacher, 1997). National Center for Education Statistics Complex designs often engender a (NCES), and the National Science Foundation particular subgroup, due to oversampling or (NSF) through fellowship grants awarded to the selection with a higher probability, and authors to participate in the 2001 AIR Summer consequently the sample does not reflect Data Policy Institute on Databases of NCES and accurate proportional representation in the NSF. Correspondence for this article should be population of interest. Thus, this may afford sent to: David Walker, Northern Illinois more weight to a certain subgroup in the sample University, ETRA Department, 208 Gabel, than would be existent in the population. As DeKalb, IL 60115, (815)-753-7886. E-mail him Thomas and Heck (2001) cautioned, “When at: [email protected]. using data from complex samples, the equal weighting of observations, which is appropriate with data collected through simple random samples, will bias the model’s parameter 425 426 THE IMPACT OF WEIGHTS AND DESIGN EFFECTS estimates if there are certain subpopulations that Raw Expansion Weight (Wj) = n (1) have been oversampled” (p. 521). ∑ wj = N The National Center for Education j=1 Statistics (NCES) conducts various national surveys that apply complex designs to projects Weighted Mean (⎯x) = n (2) such as the Beginning Postsecondary Students ∑ wj xj / ∑ wj study (BPS), the National Educational j=1 Longitudinal Study of 1988 (NELS: 88), or the National Study of Postsecondary Faculty Mean Weight (⎯w) = n (3) (NSOPF). Some statistical software programs, ∑ w / n for instance SPSS or SAS, presuppose that data j j=1 were accumulated through SRS. These statistical programs tend not to use as a default setting Relative Weight = w /⎯w (4) sample weights with data amassed through j complex designs, but instead use raw expansion weights as a measure of acceptable sample size Notes: n = sample size, j=1 = subject response, (Cohen, 1997; Muthen & Satorra, 1995). wj = raw weight, xj = variable value, N = However, the complex sampling designs utilized population size in the collection of NCES survey data allocates larger comparative importance to some sampled Furthermore, the lack of sample elements than to others. To illustrate, a complex weighting with complex designs causes design identified by the NCES may have a inaccurate estimates of population parameters. sample selection where 1 subject out of 40 is The existence of variance estimates, which chosen, which indicates that the selection underestimate the true variance of the probability is 1/40. The sample weight of 40, population, induce problems of imprecise which is inversely proportional to the selection confidence intervals, larger than expected probability, indicates that in this particular case degrees of freedom, and an enhancement of 1 sample subject equals 40 subjects in the Type I errors (Carlson, Johnson, & Cohen, 1993; population. Lee, Forthofer, & Lorimor, 1989). Because of the use of complex designs, Design effect (DEFF) indicates how sample weighting for disparate subject sampling design influences the computation of representation is employed to bring the sample the statistics under study and accommodates for variance in congruity with the population the miscalculation of sampling error. As noted variance, which supports proper statistical previously, since statistical software programs inferences. The NCES incorporates as part of its often produce results based on the assumption data sets raw expansion weights to be applied that SRS was implemented, DEFF is used to with the data of study to ensure that the issues of adjust for these inaccurate variances. DEFF, as sample selection by unequal probability defined by Kish (1965), is the ratio of the sampling and biased estimates have been variance of a statistic from a CSD to the addressed. Relative weights can be computed variance of a statistic from a SRS. from these raw expansion weights. 2 DEFF = _SE CSD_ (5) Because the NCES accrues an 2 abundance of its data for analysis via CSD, the SE SRS following formulae present how weights function. The raw expansion weight is the The size of DEFF is affined to weight that many statistical software programs conditions such as the variables of interest or the use as a default setting and should be avoided attributes of the clusters used in the design (i.e., when working with the majority of NCES data. the extent of in-cluster homogeneity). A DEFF Instead, the relative weight should be used when greater than 1.0 connotes that the sampling conducting statistical analyses with NCES design decreases precision of estimate compared complex designs. to SRS, and a DEFF less than 1.0 confirms that WALKER & YOUNG 427 the sampling design increases precision of analysis, which resulted in 6,948 students. estimate compared to SRS (Kalton, 1983; Although there were 14,915 students in the 1994 Muthen & Satorra, 1995). As Thomas and Heck follow-up of NELS: 88, only 12,509 had high (2001) stated, “If standard errors are school transcript data (F3TRSCWT > 0) from underestimated by not taking the complex which F2RHMA_C was obtained. Of these, sample design into account, there exists a greater 6,948 participated in post-secondary education likelihood of finding erroneously ‘significant’ by the time of the third follow-up in 1994. parameters in the model that the a priori Missing values were not a problem with established alpha value indicates” (p. 529). RMATH. Of the 14,915 students in the 1994 follow-up of NELS: 88, 6,943 had a legitimate Procedures missing value because they had not participated Three variables were selected from the in postsecondary education (i.e., not of interest public-use database of the National Education for this paper), 16 had missing values, and 7,956 Longitudinal Study of 1988 to demonstrate the had a value (yes or no) for postsecondary impact of weights and design effects on remedial math. contingency tables and chi-square analysis. A There were some missing values for two-stage cluster sample design was used in high school transcript data, but the transcript NELS: 88, whereby approximately 1,000 eighth- weight (F3TRSCWT) provided in NELS: 88 grade schools were sampled from a universe of takes into account missing transcript data. The approximately 40,000 public and private eighth- Carnegie units of high school math grade schools (first stage) and 24 eighth-grade (F2RHMA_C) came from high school transcript students were randomly selected from each of data. There were 14,915 students in the 1994 the participating schools (second stage). follow-up of NELS: 88; however, only 12,509 An additional 2 to 3 Asian and Hispanic had high school transcript data. That is why students were selected from each school, which NCES provides a separate weight (F3TRSCWT) resulted in a total sample of approximately that is to be used specifically with variables 25,000 eighth-grade students in 1988. Follow-up from high school transcript data. studies were conducted on subsamples of this This weight has already been adjusted cohort in 1990, 1992, 1994, and 2000. by NCES for missing high school transcript Additional details on the sampling methodology observations. Of the 7,956 students with a value for NELS: 88 are contained in a technical report for RMATH, 1,008 did not have high school from the U.S. Department of Education (1996). transcript data. These 1,008 students were not The three variables used in this example included in the analysis presented here (7,956- are F2RHMA_C (total Carnegie Units in 1,008 = 6948 students for analysis in this paper). mathematics taken in high school), RMATH After selecting the 7,956 students with a value (flag for whether one or more courses in for RMATH, only those observations with remedial math were taken since leaving high F3TRSCWT>0 were selected.

Example of the Impact of Weights and Design Effects on Contingency Tables and Chi-Square Analysis

A Primer for Using and Understanding Weights with National Datasets

Design Effects in School Surveys

Computing Effect Sizes for Clustered Randomized Trials

IBM SPSS Complex Samples Business Analytics

The Design Effect: Do We Know All About It?

Design Effect

Estimation of Design Effects for ESS Round II

Selection Bias in Web Surveys and the Use of Propensity Scores

Version 4.1 Effect Size and Standard Error Formulas

UC Berkeley UC Berkeley Electronic Theses and Dissertations

Sample Size Calculation and Development of Sampling Plan

Conditional Design Effects for SEM Estimates