DOCUMENT RESUME

ED 384 655    TM 023 948

AUTHOR: Mislevy, Robert J.
TITLE: A Framework for Studying Differences between Multiple-Choice and Free-Response Test Items.
INSTITUTION: Educational Testing Service, Princeton, N.J.
REPORT NO: ETS-RR-91-36
PUB DATE: May 91
NOTE: 59p.
PUB TYPE: Reports - Evaluative/Feasibility (142)
EDRS PRICE: MF01/PC03 Plus Postage.
DESCRIPTORS: Cognitive Psychology; *Competence; Elementary Secondary Education; *Inferences; *Multiple Choice Tests; *Networks; Test Construction; Test Reliability; *Test Theory; Test Validity
IDENTIFIERS: *Free Response Test Items

ABSTRACT: This paper lays out a framework for comparing the qualities and the quantities of information about student competence provided by multiple-choice and free-response test items. After discussing the origins of multiple-choice testing and recent influences for change, the paper outlines an "inference network" approach to test theory, in which students are characterized in terms of levels of understanding of key concepts in a learning area. It then describes how to build inference networks to address questions concerning the information about various criteria conveyed by alternative response formats. The types of questions that can be addressed in this framework include those that can be studied within the framework of standard test theory. Moreover, questions can be asked about generalized kinds of reliability and validity for inferences cast in terms of recent developments in cognitive psychology. (Contains 46 references, 4 tables, and 16 figures.) (Author)
RR-91-36

A FRAMEWORK FOR STUDYING DIFFERENCES BETWEEN MULTIPLE-CHOICE AND FREE-RESPONSE TEST ITEMS

Robert J. Mislevy

Educational Testing Service
Princeton, New Jersey

May 1991

To appear in R.E. Bennett & W.C. Ward (Eds.), Construction vs. Choice in Cognitive Measurement. Hillsdale, NJ: Erlbaum. The work was supported by the Program Research Planning Council of Educational Testing Service. Figures 2 and 3 were reproduced with the kind permission of Dr. Steen Andreassen.

Copyright © 1991. Educational Testing Service. All rights reserved.

Abstract

This paper lays out a framework for comparing the qualities and the quantities of information about student competence provided by multiple-choice and free-response test items. After discussing the origins of multiple-choice testing and recent influences for change, the paper outlines an "inference network" approach to test theory, in which students are characterized in terms of levels of understanding of key concepts in a learning area.
It then describes how to build inference networks to address questions concerning the information about various criteria conveyed by alternative response formats. The types of questions that can be addressed in this framework include those that can be studied within the framework of standard test theory. Moreover, questions can be asked about generalized kinds of reliability and validity for inferences cast in terms of recent developments in cognitive psychology.

A Framework for Studying Differences    Page 1

Introduction

Ever since Robert M. Yerkes tested a million World War I recruits with his Army Alpha Intelligence Test, multiple-choice items have dominated educational selection, placement, and assessment applications. Occasional criticism has marked their reign, from observers including no less than Banesh Hoffman, Ralph Nader, and (Educational Testing Service's own!) Norman Frederiksen. But the character of today's debates strikes at the very heart of the enterprise: the view of human ability that spawned multiple-choice tests no longer holds universal currency among psychologists and educators.

The ascendant view originates from a perspective more attuned to instruction than to selection or prediction. Learners increase their competence not simply by accumulating new facts and skills at rates determined by relatively immutable "aptitudes," but by reconfiguring knowledge structures, by automating procedures and chunking information to reduce memory loads, and by developing strategies and models that tell them when and how facts and skills are relevant.

Tests can be described as mere tools to gather information in order to guide educational decisions. But an educational decision-making framework cannot be conceived except around a view of human ability, which suggests educational options, determines what information is relevant, and specifies how an implementation is to be evaluated.
The pertinent questions about multiple-choice tests now are whether, when, and how these tools, developed and validated within the old paradigm, can serve useful roles within the new paradigm.

This paper discusses an analytical framework to evaluate the contribution of different types of test items. It encompasses views of ability from the new and the old paradigms. When applied with the old, it expresses the same questions that traditional test-theoretic investigations have asked. When applied with the new, it provides machinery to investigate analogous questions about efficiency and reliability, for example. We suggest how "inference networks" can be used to model the information from various types of observations, from traditional multiple-choice items to extended performances or portfolios, in terms of skills, strategies, and states of understanding. An application in the setting of medical diagnosis introduces the ideas, and an example based on Robert Siegler's (1981) balance beam tasks illustrates their application in a simple educational setting.

This analytic framework does not deal directly with an important issue in the debate over construction versus choice, namely, the feedback effects on instruction from the content and modality of testing (see N. Frederiksen, 1984); we focus primarily on what is learned about an individual from observations. The central role of feedback in the "why's" and "when's" of construction versus choice, however, is addressed in the background section.

Background

The Origins of Multiple-Choice Tests

The initial enthusiasm that greeted multiple-choice testing can be attributed to its neat conformance to both the psychology and the economics of the post-World War I period. The psychology was that of the IQ. To the degree that a test was reliable, ranking students in accordance with overall proficiency was thought to reflect the cardinal characteristic of human intelligence.
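The "inference network" approach mentioned in the introduction can be previewed with a minimal sketch: a student is characterized by a latent level of understanding, and each observed item response updates a posterior over those levels by Bayes' rule, in the spirit of the Siegler balance-beam example. The level names, item types, and probabilities below are illustrative assumptions for this sketch, not values from the paper.

```python
# Minimal sketch of an inference-network update: a latent level of
# understanding, plus conditional probabilities of a correct response
# given each level. All numbers here are illustrative assumptions.

# Prior over four hypothetical levels of understanding (Siegler-style rules).
prior = {"Rule I": 0.25, "Rule II": 0.25, "Rule III": 0.25, "Rule IV": 0.25}

# P(correct | level) for two hypothetical item types; "conflict" items
# are assumed to separate higher levels from lower ones.
p_correct = {
    "simple":   {"Rule I": 0.90, "Rule II": 0.90, "Rule III": 0.90, "Rule IV": 0.95},
    "conflict": {"Rule I": 0.10, "Rule II": 0.30, "Rule III": 0.50, "Rule IV": 0.90},
}

def update(posterior, item_type, correct):
    """One Bayes-rule step: multiply by the likelihood and renormalize."""
    likelihood = {
        level: (p_correct[item_type][level] if correct
                else 1.0 - p_correct[item_type][level])
        for level in posterior
    }
    unnorm = {level: posterior[level] * likelihood[level] for level in posterior}
    total = sum(unnorm.values())
    return {level: mass / total for level, mass in unnorm.items()}

# Observe three correct responses: one simple item, two conflict items.
posterior = dict(prior)
for item_type, correct in [("simple", True), ("conflict", True), ("conflict", True)]:
    posterior = update(posterior, item_type, correct)

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

The point of the sketch is that different observation formats enter the network only through their conditional probability tables: a free-response task that discriminates sharply among levels would simply carry a more peaked likelihood than a multiple-choice item tapping the same concepts.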
The economic character of the decisions that shaped the evolution of testing was nearly universal in education at the beginning of this century, and it dominates practice yet today: educators were confronted with selection or placement decisions for large numbers of students, and resources limited the information they could gather about each student, constrained the number of options they could offer, and precluded much tailoring of programs to individual students once a decision was made (Glaser, 1981).

Exposing a diverse group of students to a uniform educational treatment typically produces a distribution of outcomes, as measured by a test score at the end of the program (Bloom, 1976). Decision-makers sought to identify which students would be likely to succeed in a limited number of programs. An individual's degree of success depends on how his or her unique skills, knowledge, and interests match up with the equally multifaceted requirements of a program. The Army Alpha demonstrated that at costs substantially lower than personal interviews or performance samples, responses to multiple-choice test items could provide information about certain aspects of this matchup.

What is necessary in this approach is that each item tap some of the skills required for success. Even though a single item might require only a few of the relevant skills and offer little information in its own right, a tendency to provide correct answers over a large number of items supports some degree of prediction of success along this dimension (Green, 1978). A typical multiple-choice test would be used to guide infrequent, broadly cast decisions, such as college admittance, course placement, or military occupational assignment.

Stasis Assumptions

The multiple-choice educational testing enterprise evolved under two tacit assumptions of stasis:

1. A characteristic of examinees that is stable over time is being measured.
In the Army Alpha and many educational tests, that characteristic is posited to be "intelligence," calibrated in units of IQ. "Intelligence" gave way to "aptitude." This assumption is consonant with infrequent decision-making; the measure