
Using Validity Arguments to Evaluate the Technical Quality of Local Assessment Systems

Chad M. Gotch Washington State University

Marianne Perie National Center for the Improvement of Educational Assessment

Paper presented at the annual meeting of the American Educational Research Association, April 2012, Vancouver, BC, Canada.


Abstract

Despite the centrality of validity concerns to the practice of educational assessment, little practical guidance exists for creating and evaluating validity arguments. This paper proposes a validity framework for evaluating the technical quality of locally-developed high school end-of-course examinations that are intended to supplement or supplant state-level examinations. The framework seeks to balance concerns for psychometric rigor with feasibility of implementation. Essential elements of contemporary validity theory have been adopted and translated into accessible terminology and a practical validation process. The State of Pennsylvania provides an exemplar assessment context for the implementation of the validity argument framework. Examples of the required validity evidence and claims to support the locally-developed assessment are provided.


Using Validity Arguments to Evaluate the Technical Quality of Local Assessment Systems

Validity is the cornerstone upon which public confidence in high-stakes testing programs is built. A sound validity evaluation is a necessary component of any kind of high-stakes assessment use in the schools; however, such an evaluation is often lacking for assessments developed at the local level (e.g., by a school district; Sperling & Kulikowich, 2009). This paper illustrates a practical application of forming and evaluating a validity argument as a basis for evaluating locally-developed assessments, and highlights the tensions between current validity theory and practical implementation issues, especially at the local level.

Several states have encouraged or even mandated that local school districts create their own assessment systems and use them in place of the state-developed test. These local assessment systems have served a range of purposes, but often have been tied to student graduation determinations, clearly a high-stakes use. Given such purposes, it follows that state assessment leaders and policy makers are concerned that local systems have appropriate degrees of technical quality. Typically, locally-developed tests need to undergo a technical quality evaluation to be certain that they are appropriately aligned with the state-mandated content and that the rigor of the performance levels remains the same as the state-level test (if applicable). Evaluating technical quality, however, can be a herculean task for a school district that lacks the benefit of a fleet of personnel with advanced educational measurement training.

We argue that evaluations of technical quality should be couched within a validity evaluation framework. While several quality academic writings on validity theory exist (e.g., American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999; Kane, 2006; Lissitz, 2009), few resources offer detailed guidance for creating and evaluating a validity argument. Furthermore, a similar dearth of resources leaves little guidance for state departments of education charged with overseeing a validity evaluation process across a confederacy of school districts. In this paper we discuss tangible issues regarding the implementation of a local assessment evaluation system and provide real-world insight for wrestling with these issues.

The purpose of this paper is to explore the demonstration of technical quality in a local assessment system by using a validity argument framework. Previous efforts have introduced validity argumentation to alternative assessment systems for students with disabilities (Marion & Pellegrino, 2006; Marion & Perie, 2009). This paper goes a step further by addressing the use of validity arguments in local assessment systems that are intended to supplement or supplant state systems.

Validity in the context of local assessment systems

Validity is defined as the “degree to which evidence and theory support the interpretations of the test scores entailed by proposed uses of the test” (AERA, APA, & NCME, 1999, p. 9). Messick (1989) discussed gathering validity evidence to evaluate intended test score interpretations and organized this evidence into five categories: test content, associations with other variables, internal structure, response processes, and consequences of testing. The forms of evidence advanced by Messick have become widely adopted across the educational measurement field and are reflected in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). Classification of these forms of evidence, and the explanation of how they contribute to an overarching concern for accurate representation of a construct, marked an important step in the evolution of validity theory, but little guidance was provided on how to assemble disassociated acts of gathering evidence into a coherent whole.


The argument-based framework for validity (e.g., Kane, 2002, 2006) addressed this need by framing validity in terms of a chain of logical statements. Toulmin, Rieke, and Janik’s (1979) model of reasoning conceptualizes the formation of an argument as the piecing together of a sequence of claims and reasons that support one’s position. The six elements of a single logical link in the chain of arguments are claim, grounds, warrants, backing, modal qualifiers, and possible rebuttals. The process moves from grounds (i.e., data or observations) to claims, with the other elements providing context and delimitations. Single arguments can be assembled into a larger overarching argument, as the claim from one argument becomes the grounds for another argument.
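To make this structure concrete, the following minimal sketch (our illustration, not part of Toulmin's or Kane's work; every name and string in it is hypothetical) represents a single link as a simple data structure and checks that the claim of one link serves as the grounds of the next, which is what allows single arguments to chain into an overarching argument.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArgumentLink:
    """One link in a Toulmin-style chain: grounds lead to a claim,
    licensed by a warrant, supported by backing, limited by a
    modal qualifier, and open to possible rebuttals."""
    grounds: str
    claim: str
    warrant: str
    backing: List[str] = field(default_factory=list)
    qualifier: str = "presumably"
    rebuttals: List[str] = field(default_factory=list)

def forms_chain(links: List[ArgumentLink]) -> bool:
    """True if each link's claim feeds the grounds of the next link."""
    return all(a.claim == b.grounds for a, b in zip(links, links[1:]))

# Illustrative two-link chain for a hypothetical local end-of-course exam.
scoring = ArgumentLink(
    grounds="Observed responses on the local exam",
    claim="Scores accurately summarize performance on the exam tasks",
    warrant="Scoring rules are applied consistently across raters and forms",
    backing=["Rater agreement study", "Scoring rubric and training materials"],
    rebuttals=["Scores depend on which rater scored the paper"],
)
generalization = ArgumentLink(
    grounds="Scores accurately summarize performance on the exam tasks",
    claim="Scores generalize to the full domain of the course content standards",
    warrant="Tasks adequately sample the breadth and depth of the standards",
    backing=["Test blueprint", "External alignment study"],
    rebuttals=["Important standards are unrepresented in the item pool"],
)

print(forms_chain([scoring, generalization]))  # True: the links connect into one chain
```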

Collectively, this chain provides contextualized explanations of test scores. Applied to the evaluation of a local assessment system, we must consider the claims and explanations we intend to make. What are the intended claims? Such considerations are shaped by the specific legal code driving the local assessment context. At the heart of all considerations, however, is the desire to make some claim about the quality and accuracy of scores (or other forms of results) coming from the assessments. Interpretive arguments in the model of reasoning outlined above allow for judgment of the extent to which the local system maintains appropriate levels of quality and accuracy.

The case of Pennsylvania

Pennsylvania is a state with a tradition of strong control residing in local school districts.

This tradition was maintained when state code was amended in 2010 to establish a new set of high school graduation requirements. While the state is developing new end-of-course assessments, the Keystone exams, to test students’ competencies in core curricular subjects, a provision was written into law that allowed students to demonstrate competency on locally-developed and independently-validated assessments. This provision was an extension of previous policy that allowed local assessment systems for non-state-mandated content areas. Now, local municipalities will be able to implement their own assessments in content areas considered core for graduation purposes.

The challenge of this new provision will be to ensure that inferences of student performance on local assessments are comparable to inferences drawn from the Keystone exams.

Investigation of the current model of local assessments revealed substantial issues surrounding standards of proficiency. In order to render fair decisions about students and instill public confidence in the state’s education system, it is imperative to establish an evaluation process for the local assessments. To achieve such an end, the state established a Local Assessment Validation Advisory Committee (LAVAC) composed of representatives from the Department of Education, the State Board of Education, the School Boards Association, and local district leaders. The charge of the committee is to develop the criteria for the local assessment validation process and for the selection of approved evaluators. This context provided an opportunity to explore the development and evaluation of validity arguments in a practical setting.

Development of an argument-based validity evaluation

Artifacts from previous ventures into local assessment systems by other states, the meetings of a state local assessment validation advisory committee, and cases of developing validity arguments for other assessments (e.g., TOEFL; Chapelle, Enright, & Jamieson, 2010) served as primary sources of information. Establishing criteria for technical quality in a local assessment system required constant consideration of feasibility while adhering to the mandates of a sound validity argument for judging student proficiencies. Given the limited resources common to local school districts, the validity argument resource framed for local assessment systems focused on front-end design considerations, placed in a comprehensible context that demonstrates how sound design leads to defensible conclusions.

To move from abstract pontificating about score inferences to concrete guidance in assembling a validity argument, we advance a framework for specifying interpretive arguments that reveal aspects of technical quality. The framework (Figure 1) is based on a simplification of the full model of reasoning advanced by Toulmin et al., and adopted by Kane for the educational measurement field. The full model was simplified to maintain feasibility of implementation in settings lacking human and intellectual capital, while retaining the core elements of the argument framework. Claims reflecting necessary precursors to valid score interpretations (e.g., The local assessment system maintains an adequate level of rigor) provide the structure. Each of the claims is further explicated by the statements in the evaluation criteria matrix (see Appendix A). The evaluation criteria are meant to help the districts determine what type of evidence is needed.

Figure 1 derives from the general principle underlying Kane’s (2006) work, which asks us to provide the data, warrant, and claim. That is, we start with data and determine how it provides evidence to support a claim. Local school districts, with the assistance of exemplars of supporting evidence (see Appendix B), are tasked with providing the evidence and backing for these claims. This framework allows system evaluators to render judgments based on the clarity, coherence, and plausibility of the interpretive arguments.

Figure 1. Argument framework for local assessment validation: data, connected to a claim by a warrant, with evidence beneath to back the claim and refute alternatives.


To assemble an overarching, cohesive argument about the quality of a local assessment, we worked backwards from the proposed interpretations to develop a series of claims and assumptions that must be true for the interpretation to be valid. We then provide evidence and data to support each assumption and claim to show that our proposed interpretations are valid.

This approach asks test developers to think of reasons why the intended inferences might not be supported. In essence, we state what we think the test does and then try to disprove it. Consider, for example, the claim that “an increase in student scores reflects a greater understanding of the content.” An alternative hypothesis could be that “an increase in student scores reflects greater teaching of test-taking strategies.” Essentially these contrasting explanations ask: if student scores are higher in the second year than in the first, does that mean that students know more, or that they and their teachers are more familiar with the tests? Collecting evidence both to refute the second claim and to support the first would strengthen the validity evaluation. We want to show that students actually know more, but we do so by trying to disprove that assumption and its alternatives. If we disprove the alternatives and cannot disprove our desired assumption, then we have evidence that the test does what we say it does. In practice, it is not possible to pursue every possible rival explanation, but the framework provides guidance for developing studies that refute other plausible explanations for a finding.
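As one concrete illustration of designing a study around this particular rival explanation, the sketch below assumes a hypothetical design in which both cohorts also take a small secure anchor set that teachers and students have not seen. All numbers are invented for illustration and are not Pennsylvania data.

```python
import numpy as np

# Hypothetical results: mean proportion correct for the year-1 and year-2 cohorts
# on (a) the operational form, whose items teachers and students have seen before,
# and (b) a small secure anchor set kept out of view between administrations.
operational = {"year1": 0.61, "year2": 0.70}
secure_anchor = {"year1": 0.58, "year2": 0.66}

gain_operational = operational["year2"] - operational["year1"]
gain_anchor = secure_anchor["year2"] - secure_anchor["year1"]

print(f"Gain on operational (familiar) form: {gain_operational:.2f}")
print(f"Gain on secure anchor items:         {gain_anchor:.2f}")

# If the gain appears on the secure anchor items as well as the familiar form,
# the rival explanation (test familiarity) is weakened and the intended claim
# (greater understanding of the content) is supported; if the gain is confined
# to the familiar form, the rival explanation remains plausible.
```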

Shown below (Table 1) are the proposed interpretations and claims for the Pennsylvania Local Assessment System. Two top-level interpretations were identified: one related to the quality of data provided by the local assessment, and one related to the relative rigor of the local assessment. The common sources of validity evidence were re-framed into language consistent with what was familiar and of concern to test users. The categories used in this framework were alignment, fairness, establishment of proficiency levels, and consistency. Each category contained two claims that lay the foundation for the proposed interpretations. Figure 2 demonstrates the flow of claims to proposed interpretations.

Table 1. Proposed interpretations and claims for the Pennsylvania Local Assessment System

Proposed interpretations
1. The local assessment provides data on students’ readiness for college or careers that is equally good or better than the Keystone Exams.
2. Proficiency scores on the local assessments are as rigorous as, or more rigorous than, proficiency scores on the Keystone exams and cover equivalent material.

Alignment claims
• The items on the local assessment represent the content standards to the same breadth and depth as the Keystone items.
• The content coverage of the local assessment is aligned with the Keystone assessment.

Fairness claims
• Test scores across all identifiable and relevant student groups have comparable interpretations with respect to the course content area.
• All identifiable and relevant student groups receive equitable treatment within the assessment system.

Establishing proficiency levels claims
• The local assessment system maintains an adequate level of rigor in the proficiency levels.
• Judgments of student proficiency are set using a researched and established methodology.

Consistency claims
• Student scores do not depend upon assignment to a particular scorer, test form, school, test-taking location, or test-taking year.
• Student scores are reliable indicators of achievement in the course content area.

The goal of each validity evaluation submission is to provide evidence for each of these claims. Thus, for each claim, the submitter should provide evidence in the form of testing documents (e.g., sample tasks, test blueprints, instructions, policies) or study results (internal or external) and an explanation of how that evidence supports the claim. Again, evidence refuting alternative hypotheses will strengthen the application.
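For instance, a district backing the consistency claims might attach simple internal analyses such as a scorer agreement index and an internal-consistency estimate. The sketch below is one possible illustration with invented data; the framework does not prescribe these particular statistics or functions.

```python
import numpy as np

# Hypothetical evidence for the consistency claims (all data are illustrative).

# 1. Scorer consistency: two raters' scores (0-3 rubric) on the same 12 responses.
rater_a = np.array([2, 3, 1, 0, 2, 2, 3, 1, 2, 0, 3, 2])
rater_b = np.array([2, 3, 1, 1, 2, 2, 3, 1, 2, 0, 3, 3])

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters."""
    categories = np.union1d(a, b)
    p_observed = np.mean(a == b)
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# 2. Score reliability: Cronbach's alpha for an examinees-by-items score matrix.
def cronbach_alpha(scores):
    """Internal-consistency estimate; scores is an (examinees x items) array."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

item_scores = np.array([
    [1, 1, 0, 1, 1], [0, 1, 0, 0, 1], [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0], [1, 0, 1, 1, 1], [1, 1, 1, 0, 1],
])

print(f"Cohen's kappa (scorer agreement): {cohens_kappa(rater_a, rater_b):.2f}")
print(f"Cronbach's alpha (reliability):   {cronbach_alpha(item_scores):.2f}")
```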


Figure 2. The flow from claims to proposed interpretations about local assessments in the Pennsylvania Local Assessment System. The eight claims listed in Table 1 (alignment, fairness, proficiency level, and consistency claims) flow to two proposed interpretations: that proficiency scores on the local assessments are of equal or greater rigor than proficiency scores on the Keystone exams and cover the eligible content at an equivalent breadth and depth, and that local assessments provide data on students’ college and career readiness that is equally good or better than the Keystone Exams.


To submit a validation of intended inferences to be drawn from a locally-developed assessment, municipalities may organize their evidence using a template such as the one provided in Table 2. The table contains three columns: the data, an explanation of how the data support the claim (or refute an alternative hypothesis), and the claim the evidence supports. Only the third column is completed; the district is responsible for completing the first two columns. We demonstrate this process for the alignment claim, The items on the local assessment represent the content standards to the same breadth and depth as the Keystone items. Evidence could include training materials for item writers and item reviewers as well as an external alignment study. Ideal evidence would demonstrate that the items are fully aligned with the assessment anchors and that all of the assessment anchors are covered by the local assessment.

Provision of this evidence would also counter the claim that certain anchors were omitted or reduced in importance compared to what was intended with the Keystones. An example of a completed template for alignment claims can be found in Appendix C. (In the full handbook, available at http://websites.pdesas.org/localassessmentvalidation/2011/10/13/337018/page.aspx, templates also exist for claims around fairness, establishing proficiency levels, and consistency.)

In addition, the actual instructions for item writers and reviewers would be included as well as the full report from the external alignment study.
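As one illustration of the kind of internal coverage check a district might run alongside an external alignment study, the sketch below assumes a hypothetical coding of items to assessment anchors and Depth of Knowledge (DOK) levels; the anchor labels and blueprint targets are invented, not the actual Keystone eligible content.

```python
from collections import Counter

# Hypothetical item coding from an internal alignment review: each item is tagged
# with the assessment anchor it measures and a DOK level.
target_anchors = {"A1": 8, "A2": 6, "A3": 6, "A4": 5}      # illustrative blueprint targets
item_codes = [("A1", 2), ("A1", 3), ("A1", 1), ("A2", 2), ("A2", 3),
              ("A3", 2), ("A3", 4), ("A4", 1), ("A4", 2), ("A1", 2)]

coverage = Counter(anchor for anchor, _ in item_codes)
uncovered = [a for a in target_anchors if coverage[a] == 0]
dok_levels = sorted({dok for _, dok in item_codes})

print("Items per anchor:", dict(coverage))
print("Anchors with no items:", uncovered or "none")
print("DOK levels represented:", dok_levels)

# A submission would pair output like this (from the full item pool) with the
# blueprint, item-writer instructions, and the external alignment report to back
# the claim that every anchor is covered at an appropriate breadth and depth.
```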

Table 2. An example template for providing evidence and backing for an alignment claim

Data or Evidence: [to be completed by the district]
Explanation of how it supports the Claim: [to be completed by the district]
Claim: The items on the local assessment represent the content standards to the same breadth and depth as the Keystone items.


Conclusion

The validity argument framework has not been attempted widely in statewide local assessment systems. A benefit of the validity argument framework is that it allows evaluations to be made in the absence of a strong, unified theory of an underlying construct (Chapelle, Enright, & Jamieson, 2010). Such a situation describes the context of much educational testing. In the case of local school districts implementing an assessment system, the validity argument framework provides clear guidance and leads local professionals toward thinking more deeply about the purposes of the assessment system and the assumptions upon which judgments about students depend. Adoption of an argument-based validity framework for the evaluation of local assessments may facilitate the cooperation of local and state policymakers to implement a process that can both meet legislative requirements and clarify the burden of evidence required of local practitioners.

From a theoretical perspective, the argument-based framework proposed for locally-developed assessments begins to answer the fundamental question of the “degree to which evidence and theory support the interpretations of the test scores entailed by proposed uses of the test” (AERA, APA, & NCME, 1999, p. 9). Thus, the framework embodies and puts into action the most contemporary notions of validity theory. By extending this framework to a real, high-stakes setting, it also fills a scholarly void by turning best theory into best practice. The proposed framework could introduce educational professionals to best practices in validation, resulting in assessment systems that instill public confidence.


References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29, 3-13.

Kane, M. T. (2002). Validating high-stakes testing programs. Educational Measurement: Issues and Practice, 21, 31-41.

Kane, M. T. (2006). Validation. In Educational measurement, American Council on Education/Praeger Series on Higher Education (4th ed., pp. 17-64). Westport, CT: Praeger Publishers.

Lissitz, R. W. (2009). The concept of validity: Revisions, new directions, and applications. Charlotte, NC: Information Age Publishing.

Marion, S. F., & Pellegrino, J. W. (2006). A validity framework for evaluating the technical quality of alternative assessments. Educational Measurement: Issues and Practice, 25, 47-57.

Marion, S., & Perie, M. (2009). A validity framework for evaluating the technical quality of alternative assessments. In W. D. Schaeffer & R. W. Lissitz (Eds.), Alternative assessment: Proceedings from the 8th annual MARCES conference (pp. 113-125). Baltimore, MD: Brookes Publishing.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Sperling, R. A., & Kulikowich, J. M. (2009). Local assessment validity study report. Retrieved from http://www.ed.psu.edu/educ/lavs/lavs-full-report

Toulmin, S. E., Rieke, R. D., & Janik, A. S. (1979). An introduction to reasoning. New York: Macmillan.


Appendix A: An evaluation criteria matrix for alignment claims in the Pennsylvania Local Assessment System

Evaluation criterion: Alignment

Superior
In addition to the evidence characterizing the satisfactory level:
• Evidence of depth of knowledge alignment from results of “think-aloud” protocols or other similar analyses
• Evidence from an external alignment study
• No gaps in coverage of the standards, all items/tasks are aligned to specific standards, and the depth of knowledge represented by the items/tasks matches the expectations for depth of knowledge in the standards

Satisfactory
• Documentation of adequate sampling of all content standards
• Evidence from an internal alignment study that used a two-way alignment process
• Few gaps in the coverage of the standards, all of the items/tasks are aligned to specific standards, and there is a range of depth of knowledge (including DOK 4) represented by the items/tasks
• Plans for periodic review of alignment

Insufficient
• Items represent content standards, but many standards are unaddressed
• The content standards are represented well, but the depth of knowledge required to correctly answer items is not in alignment with the standards


Appendix B: Exemplar evidence and backing for an alignment claim in the Pennsylvania Local Assessment System

Alignment Claim: The items on the local assessment represent the content standards to the same breadth and depth as the Keystone items.

Data or Evidence
Include specific evidence that shows that the items are aligned to the course content standards. This evidence could include:
• Test blueprint or specifications
• Item specifications
• Written instructions for item writers
• Written instructions for item reviewers
• Sample tasks of high DOK
• Alignment study done by district or school staff
  o Technical report explaining process and results
  o Matrix of items to course content standards
• External alignment study
  o Technical report explaining process and results
• Research studies examining the constructs tested by each item, such as a cognitive lab or think-aloud studies

Explanation of how it supports the Claim
Describe in words how the evidence submitted shows that the local assessment matches the course content standards (assessment anchors and eligible content) and/or the test blueprints for the Keystones. Be sure to discuss matching both the breadth and depth of knowledge of the target course content standards. You need to show that you have matched EVERY content standard with sufficient balance of representation and are testing a range of depth through Depth of Knowledge Level 4. If using a unique testing approach, showing the content measured might require additional evidence such as an external alignment study or research evidence of student knowledge tapped by the assessment.


Appendix C: A simple example of a submission to address an alignment claim in the Pennsylvania Local Assessment System

Note: This submission would be judged as satisfactory, assuming the attachments provided information that fully met the evaluation criteria.

Introduction: We created a local assessment for our students in Chemistry for several reasons. First, we wanted to maintain control over the integration of the assessment score into the grade. Second, we feel that a strong science assessment should include a performance component where students integrate their understanding of scientific procedures with their knowledge of a specific content area. Therefore, our assessment contains 25 multiple-choice items focused on content knowledge, 5 open-ended items that focus more on process, and 1 performance task that requires students to implement scientific procedure to investigate an issue in the chemistry content area. Students who choose this subject as one of their graduation requirements must score Proficient or above to graduate.

Alignment Claim: The items on the local assessment represent the content standards to the same breadth and depth as the Keystone items.

Data or Evidence: Test Blueprint (Exhibit A1)
Explanation of how it supports the Claim: The blueprint shows the distribution of items across assessment anchors with the target depth noted.

Data or Evidence: Instructions for item writers (Exhibit A3)
Explanation of how it supports the Claim: The instructions clearly show that item writers were to develop items such that our item pool contains items of a similar or greater breadth and depth as the Keystones, while maintaining a similar balance of representation.

Data or Evidence: Instructions for item reviewers (Exhibit A4)
Explanation of how it supports the Claim: The item reviewers were asked to independently rate the assessment anchor and depth of knowledge assessed by each item. These ratings were then compared to the original targets to be sure the final item pool contained items of similar or greater depth and breadth and at a similar ratio as the Keystones.

Data or Evidence: External alignment study, including the technical report with final results (Exhibit A5)
Explanation of how it supports the Claim: The two-way alignment study completed by an independent contractor shows that the local assessment is fully aligned with the eligible content and the balance of representation is similar to that of the Keystones.