Evaluating the Comparability of Scores from Achievement Test Variations
Phoebe C. Winter, Editor

Copyright © 2010 by the Council of Chief State School Officers, Washington, DC. All rights reserved.

THE COUNCIL OF CHIEF STATE SCHOOL OFFICERS

The Council of Chief State School Officers (CCSSO) is a nonpartisan, nationwide, nonprofit organization of public officials who head departments of elementary and secondary education in the states, the District of Columbia, the Department of Defense Education Activity, and five U.S. extra-state jurisdictions. CCSSO provides leadership, advocacy, and technical assistance on major educational issues. The Council seeks member consensus on major educational issues and expresses their views to civic and professional organizations, federal agencies, Congress, and the public.

Based on research funded by an Enhanced Assessment Grant from the U.S. Department of Education, awarded to the North Carolina Department of Public Instruction, in partnership with the Council of Chief State School Officers and its SCASS TILSA and SCASS CAS. Publication of this document shall not be construed as endorsement of the views expressed in it by the U.S. Department of Education, the North Carolina Department of Public Instruction, or the Council of Chief State School Officers.

COUNCIL OF CHIEF STATE SCHOOL OFFICERS
Steven Paine (West Virginia), President
Gene Wilhoit, Executive Director
Phoebe C. Winter, Editor

Council of Chief State School Officers
One Massachusetts Avenue, NW, Suite 700
Washington, DC 20001-1431
Phone (202) 336-7000
Fax (202) 408-8072
www.ccsso.org

ISBN: 1-884037-28-3
Acknowledgments

This research project benefited from the support and advice of members of two of CCSSO's State Collaboratives on Assessment and Student Standards: the Technical Issues in Large-scale Assessment consortium, led by Doug Rindone, and the Comprehensive Assessment Systems for Title I consortium, led by Carole White. The editor particularly thanks the state department staff members of the project research team for their work and thoughtful participation in the project:

Mildred Bazemore, North Carolina Department of Public Instruction
Jane Dalton, North Carolina Department of Public Instruction
Tammy Howard, North Carolina Department of Public Instruction
Laura Kramer, Mississippi Department of Education
Joseph Martineau, Michigan Department of Education
Nadine McBride, North Carolina Department of Public Instruction
Marcia Perry, Virginia Department of Education
Liru Zhang, Delaware Department of Education

Contents

Section 1: Introduction
Chapter 1: Comparability and Test Variations

Section 2: Studies of Comparability Methods
Chapter 2: Summary of the Online Comparability Studies for One State's End-of-course Program
Chapter 3: Evaluating the Comparability of English and Spanish Video Accommodations for English Language Learners
Chapter 4: Evaluating Linguistic Modifications: An Examination of the Comparability of a Plain English Mathematics Assessment
Chapter 5: Evaluating the Comparability of Scores from an Alternative Format
Chapter 6: Modified Tests for Modified Achievement Standards: Examining the Comparability of Scores to the General Test

Section 3: Literature Reviews Related to Comparability of Various Types of Test Variations
Chapter 7: Comparability of Paper-based and Computer-based Tests: A Review of the Methodology
Chapter 8: Validity Issues and Empirical Research on Translating Educational Achievement Tests
Chapter 9: Considerations for Developing and Implementing Translations of Standardized K–12 Assessments
Chapter 10: Impact of Language Complexity on the Assessment of ELL Students: A Focus on Linguistic Simplification of Assessment
Chapter 11: Alternative Formats: A Review of the Literature

Section 4: Summary
Chapter 12: Where Are We and Where Could We Go Next?

Section 1: Introduction

Chapter 1: Comparability and Test Variations

Phoebe C. Winter
Pacific Metrics

In 2006, a consortium of state departments of education, led by the North Carolina Department of Public Instruction and the Council of Chief State School Officers, was awarded a grant from the U.S. Department of Education[1] to investigate methods of determining the comparability of variations of states' assessments used to meet the requirements of the No Child Left Behind Act of 2001 (NCLB).
NCLB peer review guidelines include a requirement that states using variations of their assessments based on grade-level academic achievement standards must provide evidence that proficiency on these variations is comparable to proficiency on the general assessment[2] (U.S. Department of Education, July 2007, revised January 2009). The consortium of states focused on two types of K–12 achievement test variations: computer-based tests that are built to mirror a state's paper-based tests, and test formats that are designed to be more accessible to specific student populations than the general test is. In addition, the studies conducted included a study of the comparability of an alternate assessment based on modified achievement standards; the comparability issues related to those assessments differ from the ones discussed here.

Researchers are beginning to think and write about comparability in a slightly different way than they have traditionally. In the recent past, comparability of test scores has been thought of mostly in terms of scale scores. Here, comparability is discussed in terms of the score that is being interpreted, which may be a scale score but may also be a less fine-grained score, such as an achievement level score. Comparability has also been thought of in terms of linking and, in state testing programs, most often in terms of equating. Here, linking scores across test forms is discussed, but the issues are expanded to include the comparability of achievement level scores from tests of very different formats. The issues include, and in the case of alternative formats go beyond, those related to what is referred to as "extreme linking" (Dorans, Pommerich, and Holland, 2007, p. 356; Middleton and Dorans, May 2009, draft). The details of what is meant by score comparability are still being defined as the measurement community conducts research and explores theoretical bases for comparability.
The language used to discuss comparability is evolving as well. For example, Dorans and Walker (2007) differentiate between "interchangeability" and "comparability," reserving the first term for scores that have been equated, and Kolen and Brennan (2004) have used "comparability" as an outcome of linking (see, for example, p. 3). The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999) uses "comparable" when discussing whether scores can be considered interchangeable (p. 57). Because the research described here is concerned primarily with the use of the scores, "interchangeable" is used to mean that a score can be used in the same way, with the same construct-based inferences, as another score.

In one sense, different test formats meant to measure the same construct can be seen as alternate forms. For a computer-based test, the idea that it is an alternate form of a paper-based test is not much of a stretch from what measurement specialists are used to thinking about. To think of a portfolio-based version of a paper-and-pencil test as an alternate form of it is more of a stretch. Alternatively, test variations (other than computer-based versions of the paper-and-pencil test) can be thought of as accommodations, but that raises the question of whether accommodated tests can be thought of as alternate forms of the general, non-accommodated test.

[1] This work was supported by a Grant for Enhanced Assessment Instruments awarded by the U.S. Department of Education.
[2] We use general assessment or test to refer to the usual format of the test, typically administered as a paper-and-pencil test.
The idea of comparability, then, is just one part of larger issues: validity, the question of standardization and which parts of the test, from administration to inference, are standardized, and how valid inferences about student achievement can be made across a large, diverse