ED384655.Pdf


DOCUMENT RESUME

ED 384 655    TM 023 948

AUTHOR       Mislevy, Robert J.
TITLE        A Framework for Studying Differences between Multiple-Choice and Free-Response Test Items
INSTITUTION  Educational Testing Service, Princeton, N.J.
REPORT NO    ETS-RR-91-36
PUB DATE     May 91
NOTE         59p.
PUB TYPE     Reports - Evaluative/Feasibility (142)
EDRS PRICE   MF01/PC03 Plus Postage.
DESCRIPTORS  Cognitive Psychology; *Competence; Elementary Secondary Education; *Inferences; *Multiple Choice Tests; *Networks; Test Construction; Test Reliability; *Test Theory; Test Validity
IDENTIFIERS  *Free Response Test Items

ABSTRACT
This paper lays out a framework for comparing the qualities and the quantities of information about student competence provided by multiple-choice and free-response test items. After discussing the origins of multiple-choice testing and recent influences for change, the paper outlines an "inference network" approach to test theory, in which students are characterized in terms of levels of understanding of key concepts in a learning area. It then describes how to build inference networks to address questions concerning the information about various criteria conveyed by alternative response formats. The types of questions that can be addressed in this framework include those that can be studied within the framework of standard test theory. Moreover, questions can be asked about generalized kinds of reliability and validity for inferences cast in terms of recent developments in cognitive psychology. (Contains 46 references, 4 tables, and 16 figures.) (Author)

RR-91-36

A Framework for Studying Differences Between Multiple-Choice and Free-Response Test Items

Robert J. Mislevy
Educational Testing Service
Princeton, New Jersey
May 1991

To appear in R.E. Bennett & W.C. Ward (Eds.), Construction vs. Choice in Cognitive Measurement. Hillsdale, NJ: Erlbaum. The work was supported by the Program Research Planning Council of Educational Testing Service. Figures 2 and 3 were reproduced with the kind permission of Dr. Steen Andreassen.

Copyright © 1991, Educational Testing Service. All rights reserved.
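The "inference network" approach named in the abstract can be previewed with a toy computation: a single latent level-of-understanding variable, one multiple-choice response, and one rated free response, with the posterior obtained by direct enumeration. This is a minimal sketch; every probability below is invented for illustration, and the paper's networks involve many more concepts, strategies, and observation types.

```python
# Toy "inference network": posterior over a student's level of understanding
# given one multiple-choice (MC) and one free-response (FR) observation.
# All probabilities are invented for illustration; the paper's networks are
# richer (multiple concepts, strategies, and response types).

PRIOR = {"low": 0.3, "medium": 0.4, "high": 0.3}

# P(MC correct | level): the guessing floor makes a lucky "correct" less diagnostic.
P_MC_CORRECT = {"low": 0.30, "medium": 0.60, "high": 0.90}

# P(FR score | level): a 0/1/2-rated free response, harder to pass by chance.
P_FR = {
    "low":    {0: 0.70, 1: 0.25, 2: 0.05},
    "medium": {0: 0.25, 1: 0.50, 2: 0.25},
    "high":   {0: 0.05, 1: 0.25, 2: 0.70},
}

def posterior(mc_correct: bool, fr_score: int) -> dict:
    """P(level | evidence) by direct enumeration over the three levels."""
    joint = {}
    for level, p in PRIOR.items():
        p_mc = P_MC_CORRECT[level] if mc_correct else 1 - P_MC_CORRECT[level]
        joint[level] = p * p_mc * P_FR[level][fr_score]
    z = sum(joint.values())
    return {level: p / z for level, p in joint.items()}

print(posterior(mc_correct=True, fr_score=0))
print(posterior(mc_correct=True, fr_score=2))
```

With these numbers, a correct multiple-choice answer paired with a low-rated free response leaves belief spread across the low and medium levels, while pairing it with a top-rated free response concentrates belief on the high level. Comparing such posteriors across response formats is the kind of question the framework is designed to pose.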
Introduction

Ever since Robert M. Yerkes tested a million World War I recruits with his Army Alpha Intelligence Test, multiple-choice items have dominated educational selection, placement, and assessment applications. Occasional criticism has marked their reign, from observers including no less than Banesh Hoffman, Ralph Nader, and (Educational Testing Service's own!) Norman Frederiksen. But the character of today's debates strikes at the very heart of the enterprise: the view of human ability that spawned multiple-choice tests no longer holds universal currency among psychologists and educators. The ascendant view originates from a perspective more attuned to instruction than to selection or prediction. Learners increase their competence not simply by accumulating new facts and skills at rates determined by relatively immutable "aptitudes," but by reconfiguring knowledge structures, by automating procedures and chunking information to reduce memory loads, and by developing strategies and models that tell them when and how facts and skills are relevant.

Tests can be described as mere tools to gather information, in order to guide educational decisions. But an educational decision-making framework cannot be conceived except around a view of human ability, which suggests educational options, determines what information is relevant, and specifies how an implementation is to be evaluated. The pertinent questions about multiple-choice tests now are whether, when, and how these tools, developed and validated within the old paradigm, can serve useful roles within the new paradigm.

This paper discusses an analytical framework to evaluate the contribution of different types of test items. It encompasses views of ability from the new and the old paradigms. When applied with the old, it expresses the same questions that traditional test-theoretic investigations have asked. When applied with the new, it provides machinery to investigate analogous questions, about efficiency and reliability, for example. We suggest how "inference networks" can be used to model the information from various types of observations, from traditional multiple-choice items to extended performances or portfolios, in terms of skills, strategies, and states of understanding. An application in the setting of medical diagnosis introduces the ideas, and an example based on Robert Siegler's (1981) balance beam tasks illustrates their application in a simple educational setting. This analytic framework does not deal directly with an important issue in the debate over construction versus choice, namely, the feedback effects on instruction from the content and modality of testing (see N. Frederiksen, 1984); we focus primarily on what is learned about an individual from observations. The central role of feedback in the "why's" and "when's" of construction versus choice, however, is addressed in the background section.

Background

The Origins of Multiple-Choice Tests

The initial enthusiasm that greeted multiple-choice testing can be attributed to its neat conformance to both the psychology and the economics of the post-World War I period. The psychology was that of the IQ. To the degree that a test was reliable, ranking students in accordance with overall proficiency was thought to reflect the cardinal characteristic of human intelligence. The economic character of the decisions that shaped the evolution of testing was nearly universal in education at the beginning of this century, and it dominates practice yet today: educators were confronted with selection or placement decisions for large numbers of students, and resources limited the information they could gather about each student, constrained the number of options they could offer, and precluded much tailoring of programs to individual students once a decision was made (Glaser, 1981).

Exposing a diverse group of students to a uniform educational treatment typically produces a distribution of outcomes, as measured by a test score at the end of the program (Bloom, 1976). Decision-makers sought to identify which students would be likely to succeed in a limited number of programs. An individual's degree of success depends on how his or her unique skills, knowledge, and interests match up with the equally multifaceted requirements of a program. The Army Alpha demonstrated that at costs substantially lower than personal interviews or performance samples, responses to multiple-choice test items could provide information about certain aspects of this matchup. What is necessary in this approach is that each item tap some of the skills required for success. Even though a single item might require only a few of the relevant skills and offer little information in its own right, a tendency to provide correct answers over a large number of items supports some degree of prediction of success along this dimension (Green, 1978). A typical multiple-choice test would be used to guide infrequent, broadly cast decisions, such as college admittance, course placement, or military occupational assignment.

Stasis Assumptions

The multiple-choice educational testing enterprise evolved under two tacit assumptions of stasis:

1. A characteristic of examinees that is stable over time is being measured. In the Army Alpha and many educational tests, that characteristic is posited to be "intelligence," calibrated in units of IQ. "Intelligence" gave way to "aptitude." This assumption is consonant with infrequent decision-making; the measure
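The aggregation argument in the excerpt (Green, 1978), that many individually weak items still support prediction, is exactly what standard test theory quantifies with the Spearman-Brown prophecy formula. A minimal sketch, with a per-item reliability chosen purely for illustration:

```python
# How weak single items add up: Spearman-Brown prophecy formula from
# standard test theory. Each item is assumed to measure the same trait
# with per-item reliability r1; the r1 value below is illustrative only.

def spearman_brown(r1: float, k: int) -> float:
    """Reliability of a test of k parallel items, each with reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

r1 = 0.05  # a single MC item carries little information on its own
for k in (1, 10, 40, 100):
    print(f"{k:>3} items -> reliability {spearman_brown(r1, k):.2f}")
# 100 such items already yield a composite reliability above 0.8.
```

The same logic underlies the trade-off raised later on this page between multiple-choice and free-response sections: many quick, individually weak observations can match or exceed the reliability of a few rich ones.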
Recommended publications
  • Randy E. Bennett, Matthias von Davier (Eds.): Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS
    Methodology of Educational Measurement and Assessment. Randy E. Bennett, Matthias von Davier (Eds.), Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS. Series editors: Bernard Veldkamp, Research Center for Examinations and Certification (RCEC), University of Twente, Enschede, The Netherlands; Matthias von Davier, National Board of Medical Examiners (NBME), Philadelphia, USA (this work was conducted while M. von Davier was employed with Educational Testing Service). This book series collates key contributions to a fast-developing field of education research. It is an international forum for theoretical and empirical studies exploring new and existing methods of collecting, analyzing, and reporting data from educational measurements and assessments. Covering a high-profile topic from multiple viewpoints, it aims to foster a broader understanding of fresh developments as innovative software tools and new concepts such as competency models and skills diagnosis continue to gain traction in educational institutions around the world. Methodology of Educational Measurement and Assessment offers readers reliable critical evaluations, reviews and comparisons of existing methodologies alongside authoritative analysis and commentary on new and emerging approaches. It will showcase empirical research on applications, examine issues such as reliability, validity, and comparability, and help keep readers up to speed on developments in statistical modeling approaches. The fully peer-reviewed publications in the series cover measurement and assessment at all levels of education and feature work by academics and education professionals from around the world. Providing an authoritative central clearing-house for research in a core sector in education, the series forms a major contribution to the international literature.
  • Creating Coupled-Multiple Response Test Items in Physics and Engineering for Use in Adaptive Formative Assessments
    Creating coupled-multiple response test items in physics and engineering for use in adaptive formative assessments. Laura Ríos*‡, Benjamin Lutz†, Eileen Rossman†, Christina Yee*, Dominic Trageser†, Maggie Nevrly†, Megan Philips†, Brian Self†. *Physics Department, California Polytechnic State University, San Luis Obispo, CA, USA. †Department of Mechanical Engineering, California Polytechnic State University, San Luis Obispo, CA, USA. ‡Corresponding author: [email protected]
    Abstract: This Research Work-in-Progress paper presents ongoing work in engineering and physics courses to create online formative assessments. Mechanics courses are full of non-intuitive, conceptually challenging principles that are difficult to understand. In order to help students with conceptual growth, particularly if we want to develop individualized online learning modules, we first must determine students' prior understanding. To do this, we are using coupled-multiple response (CMR) questions in introductory physics (a mechanics course), engineering statics, and engineering dynamics. CMR tests are assessments that use a nuanced rubric to examine underlying reasoning elements students may have for decisions on a multiple-choice test and, as such, are uniquely situated to pinpoint specific student alternate conceptions.
    ... technologies [5], [6], [7]. In this work-in-progress, we describe our research efforts which leverage existing online learning infrastructure (Concept Warehouse) [8]. The Concept Warehouse is an online instructional and assessment tool that helps students develop a deeper understanding of relevant concepts across a range of STEM courses and topics. We begin by briefly reviewing student difficulties in statics and dynamics, in addition to our motivation for developing specifically coupled-multiple response (CMR) test items to tailor online learning tools. CMR items leverage a combination of student answers and explanations to identify specific ways ...
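The blurb above describes CMR items as pairing an answer choice with selected reasoning elements. As a concrete sketch of that pairing, the hypothetical Python structure below couples the two and classifies a response by both; the class, field names, and diagnosis labels are invented for illustration and are not the Concept Warehouse's actual schema.

```python
# A hypothetical representation of a coupled-multiple response (CMR) item:
# one multiple-choice answer coupled with a check-all-that-apply list of
# reasoning elements. Names and diagnosis logic are illustrative only.
from dataclasses import dataclass, field

@dataclass
class CMRItem:
    stem: str
    choices: list           # answer options
    correct_choice: int     # index of the correct option
    reasoning_elements: list                              # statements students may select
    correct_reasoning: set = field(default_factory=set)   # indices of sound reasoning

    def diagnose(self, chosen: int, selected_reasoning: set) -> str:
        """Classify a response by answer correctness and reasoning quality."""
        answer_ok = chosen == self.correct_choice
        reasoning_ok = selected_reasoning == self.correct_reasoning
        if answer_ok and reasoning_ok:
            return "correct answer, sound reasoning"
        if answer_ok:
            return "correct answer, flawed reasoning (possible alternate conception)"
        if selected_reasoning & self.correct_reasoning:
            return "wrong answer, partially sound reasoning"
        return "wrong answer, flawed reasoning"

item = CMRItem(
    stem="Two boxes slide at constant velocity; compare the net forces on them.",
    choices=["Net force on A is larger", "Equal (both zero)", "Net force on B is larger"],
    correct_choice=1,
    reasoning_elements=["Constant velocity means zero acceleration",
                        "Heavier objects need a larger net force",
                        "Newton's second law: F_net = ma"],
    correct_reasoning={0, 2},
)
print(item.diagnose(chosen=1, selected_reasoning={1}))
# -> correct answer, flawed reasoning (possible alternate conception)
```

The point of the coupled structure is visible in the last line: the answer alone looks fine, and only the reasoning checklist exposes the alternate conception.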
  • AP® Art History Practice Exam
    Sample Responses from the AP® Art History Practice Exam: Sample Questions, Scoring Guidelines, Student Responses, and Commentaries on the Responses. Effective Fall 2015.
    About the College Board: The College Board is a mission-driven not-for-profit organization that connects students to college success and opportunity. Founded in 1900, the College Board was created to expand access to higher education. Today, the membership association is made up of over 6,000 of the world's leading educational institutions and is dedicated to promoting excellence and equity in education. Each year, the College Board helps more than seven million students prepare for a successful transition to college through programs and services in college readiness and college success, including the SAT® and the Advanced Placement Program®. The organization also serves the education community through research and advocacy on behalf of students, educators, and schools. For further information, visit www.collegeboard.org.
    AP® Equity and Access Policy: The College Board strongly encourages educators to make equitable access a guiding principle for their AP® programs by giving all willing and academically prepared students the opportunity to participate in AP. We encourage the elimination of barriers that restrict access to AP for students from ethnic, racial, and socioeconomic groups that have been traditionally underserved. Schools should make every effort to ensure their AP classes reflect the diversity of their student population. The College Board also believes that all students should have access to academically challenging course work before they enroll in AP classes, which can prepare them for AP success.
  • The Effect of Using Different Weights for Multiple-Choice and Free-Response Item Sections (Amy Hendrickson, Brian Patterson, Gerald Melican)
    The Effect of Using Different Weights for Multiple-Choice and Free-Response Item Sections. Amy Hendrickson, Brian Patterson, Gerald Melican, The College Board. National Council on Measurement in Education, March 27th, 2008, New York.
    Combining Scores to Form Composites:
    • Many high-stakes tests include both multiple-choice (MC) and free-response (FR) item types.
    • Linearly combining the item-type scores is equivalent to deciding how much each component will contribute to the composite.
    • Use of different weights potentially impacts the reliability and/or validity of the scores.
    Advanced Placement Program® (AP®) Exams:
    • 34 of 37 AP® Exams contain both MC and FR items.
    • Weights for AP composite scores are set by a test development committee; these weights range from 0.33 to 0.60 for the FR section.
    • The relative weights are translated into absolute weights, the multiplicative constants that are applied directly to the item scores.
    • Previous research exists concerning the effect of different weighting schemes on the reliability of AP exam scores, but not on the validity of the scores.
    Weighting, Reliability, and Validity:
    • Construct/face validity perspective: include both MC and FR items because they measure different and important constructs. De-emphasizing the FR section may disregard the intention of the test developers and policy makers who believe that, if anything, the FR section should have the higher weight to meet face validity.
    • Psychometric perspective: FR items are generally less reliable than MC items, so it is best to give proportionally higher weight to the (more reliable) MC section.
    • The decision of how to combine item-type scores represents a trade-off between the test measuring what it is meant and expected to measure and the test measuring consistently (Walker, 2005).
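The translation from relative section weights to the absolute multiplicative constants mentioned in the blurb can be shown in a few lines. The 0.5/0.5 weights and the section maxima below are invented for illustration and are not taken from any actual AP exam.

```python
# Translating relative section weights into the absolute multiplicative
# constants applied to raw section scores. All numeric values here are
# illustrative; actual AP weights and section maxima vary by exam.

def absolute_weights(w_mc, w_fr, max_mc, max_fr, max_composite):
    """Constants a_mc, a_fr such that a full-credit section contributes
    its intended relative share of the composite maximum."""
    a_mc = w_mc * max_composite / max_mc
    a_fr = w_fr * max_composite / max_fr
    return a_mc, a_fr

a_mc, a_fr = absolute_weights(w_mc=0.5, w_fr=0.5, max_mc=80, max_fr=27, max_composite=150)
# composite = a_mc * mc_raw + a_fr * fr_raw
print(round(a_mc, 4), round(a_fr, 4))            # 0.9375 2.7778
print(round(a_mc * 80 + a_fr * 27, 6))           # 150.0 at full credit
```

At full credit each section then contributes exactly its intended share of the composite maximum, which is all the relative-to-absolute translation does.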
  • The Beliefs of Advanced Placement Teachers Regarding Equity
    THE BELIEFS OF ADVANCED PLACEMENT TEACHERS REGARDING EQUITY AND ACCESS TO ADVANCED PLACEMENT COURSES: A MIXED-METHODS STUDY, by Mirynne O'Connor Igualada. A Dissertation Submitted to the Faculty of The College of Education in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, Florida Atlantic University, Boca Raton, Florida, December 2015. Copyright 2015 by Mirynne O'Connor Igualada.
    ACKNOWLEDGEMENTS: I cannot express enough gratitude to the many people who provided continued support and encouragement throughout this journey. Thank you to my mother, Maureen, as you have been an endless source of support and encouragement. You always have believed that I could achieve this degree, while also helping me to stay somewhat sane throughout the process. You provided my theoretical framework; had I not been raised in your household and under your spiritual guidance, I would not have felt the need to examine issues of inequity. To my husband, Eric, thank you for your love throughout this process. You may not have always wanted to hear about my research, but you always have been willing to support my hopes and dreams without question. I would not be where I am today without you. I love you. Thank you to my Angel Baby and my Rainbow Baby, Emalee Jane. I am not sure I would have found the strength and stamina to complete this degree had I not felt the need to make you both proud of my accomplishments. I would like to thank my dissertation chair, Dr. Dilys Schoorman, who has been by my side through every major event academically, personally, and professionally in my adult life.
  • Finding Relationships Between Multiple-Choice Math Tests and Their Stem-Equivalent Constructed Responses (Nayla Aad Chaoui, Claremont Graduate University)
    Claremont Colleges Scholarship @ Claremont, CGU Theses & Dissertations, CGU Student Scholarship, 2011. Finding Relationships Between Multiple-Choice Math Tests and Their Stem-Equivalent Constructed Responses. Nayla Aad Chaoui, Claremont Graduate University.
    Recommended Citation: Chaoui, Nayla Aad, "Finding Relationships Between Multiple-Choice Math Tests and Their Stem-Equivalent Constructed Responses" (2011). CGU Theses & Dissertations. Paper 21. http://scholarship.claremont.edu/cgu_etd/21 DOI: 10.5642/cguetd/21. This Open Access Dissertation is brought to you for free and open access by the CGU Student Scholarship at Scholarship @ Claremont. It has been accepted for inclusion in CGU Theses & Dissertations by an authorized administrator of Scholarship @ Claremont. For more information, please contact [email protected].
    A dissertation submitted to the Faculty of Claremont Graduate University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate Faculty of Education, Claremont Graduate University, 2011. APPROVAL OF THE DISSERTATION COMMITTEE: We, the undersigned, certify that we have read, reviewed, and critiqued the dissertation of Nayla Aad Chaoui and do hereby approve it as adequate in scope and quality for meriting the degree of Doctor of Philosophy. Dr. Mary Poplin, Chair, School of Educational Studies.
  • AP COMPARATIVE STUDY GUIDE by Ethel Wood
    AP COMPARATIVE STUDY GUIDE by Ethel Wood. Comparative government and politics provides an introduction to the wide, diverse world of governments and political practices that currently exist in modern times. Although the course focuses on specific countries, it also emphasizes an understanding of conceptual tools and methods that form a framework for comparing almost any governments that exist today. Additionally, it requires students to go beyond individual political systems to consider international forces that affect all people in the world, often in very different ways. Six countries form the core of the course: Great Britain, Russia, China, Mexico, Iran, and Nigeria. The countries are chosen to reflect regional variations, but more importantly, to illustrate how important concepts operate both similarly and differently in different types of political systems: "advanced" democracies, communist and post-communist countries, and newly industrialized and less developed nations. This book includes review materials for all six countries. WHAT IS COMPARATIVE POLITICS? Most people understand that the term government is a reference to the leadership and institutions that make policy decisions for the country. However, what exactly is politics? Politics is basically all about power. Who has the power to make the decisions? How did they get the power? What challenges do leaders face from others, both inside and outside the country's borders, in keeping the power? So, as we look at different countries, we are not only concerned about the ins and outs of how the government works. We will also look at how power is gained, managed, challenged, and maintained. College-level courses in comparative government and politics vary, but they all cover topics that enable meaningful comparisons across countries.
  • Preference for Multiple Choice and Constructed Response Exams for Engineering Students with and Without Learning Difficulties
    Preference for Multiple Choice and Constructed Response Exams for Engineering Students with and without Learning Difficulties. Panagiotis Photopoulos¹, Christos Tsonos², Ilias Stavrakas¹ and Dimos Triantis¹. ¹Department of Electrical and Electronic Engineering, University of West Attica, Athens, Greece. ²Department of Physics, University of Thessaly, Lamia, Greece. Keywords: Multiple Choice, Constructed Response, Essay Exams, Learning Difficulties, Power Relations.
    Abstract: Problem solving is a fundamental part of engineering education. The aim of this study is to compare engineering students' perceptions and attitudes towards problem-based multiple-choice and constructed response exams. Data were collected from 105 students, 18 of whom reported facing some learning difficulty. All the students had experience of four or more problem-based multiple-choice exams. Overall, students showed a preference towards multiple-choice exams, although they did not consider them to be fairer, easier, or less anxiety invoking. Students facing learning difficulties struggle with written exams independently of their format, and their preferences towards the two examination formats are influenced by the specificity of their learning difficulties. The degree to which each exam format allows students to show what they have learned and be rewarded for partial knowledge is also discussed. The replies to this question were influenced by students' preference towards each examination format.
    1 INTRODUCTION: Public universities are experiencing budgetary cuts manifested in increased flexible employment, overtime working and large classes (Parmenter et al. 2009; Watts 2017). ... consuming process and a computer-based evaluation of the answers is still problematic (Ventouras et al. 2010). Assessment embodies power relations between institutions, teachers and students (Tan, 2012; Holley ...
  • AP United States History Practice Exam and Notes
    Effective Fall 2017. AP United States History Practice Exam and Notes. This exam may not be posted on school or personal websites, nor electronically redistributed for any reason. This exam is provided by the College Board for AP Exam preparation. Teachers are permitted to download the materials and make copies to use with their students in a classroom setting only. To maintain the security of this exam, teachers should collect all materials after their administration and keep them in a secure location. Further distribution of these materials outside of the secure College Board site disadvantages teachers who rely on uncirculated questions for classroom testing. Any additional distribution is in violation of the College Board's copyright policies and may result in the termination of Practice Exam access for your school as well as the removal of access to other online services such as the AP Teacher Community and Online Score Reports.
    About the College Board: The College Board is a mission-driven not-for-profit organization that connects students to college success and opportunity. Founded in 1900, the College Board was created to expand access to higher education. Today, the membership association is made up of over 6,000 of the world's leading educational institutions and is dedicated to promoting excellence and equity in education. Each year, the College Board helps more than seven million students prepare for a successful transition to college through programs and services in college readiness and college success, including the SAT® and the Advanced Placement Program®. The organization also serves the education community through research and advocacy on behalf of students, educators, and schools.
  • Technology and Testing
    Technology and Testing. From early answer sheets filled in with number 2 pencils, to tests administered by mainframe computers, to assessments wholly constructed by computers, it is clear that technology is changing the field of educational and psychological measurement. The numerous and rapid advances have immediate impact on test creators, assessment professionals, and those who implement and analyze assessments. This comprehensive new volume brings together leading experts on the issues posed by technological applications in testing, with chapters on game-based assessment, testing with simulations, video assessment, computerized test development, large-scale test delivery, model choice, validity, and error issues. Including an overview of existing literature and groundbreaking research, each chapter considers the technological, practical, and ethical considerations of this rapidly changing area. Ideal for researchers and professionals in testing and assessment, Technology and Testing provides a critical and in-depth look at one of the most pressing topics in educational testing today. Fritz Drasgow is Professor of Psychology and Dean of the School of Labor and Employment Relations at the University of Illinois at Urbana-Champaign, USA.
    The NCME Applications of Educational Measurement and Assessment Book Series. Editorial Board: Michael J. Kolen, The University of Iowa (Editor); Robert L. Brennan, The University of Iowa; Wayne Camara, ACT; Edward H. Haertel, Stanford University; Suzanne Lane, University of Pittsburgh; Rebecca Zwick, Educational Testing Service. Technology and Testing: Improving Educational and Psychological Measurement, edited by Fritz Drasgow. Forthcoming: Meeting the Challenges to Measurement in an Era of Accountability, edited by Henry Braun; Testing the Professions: Credentialing Policies and Practice, edited by Susan Davis-Becker and Chad W.