TOWARDS ADAPTIVE CALL: Natural Language Processing for Diagnostic Language Assessment

Edited by Carol A. Chapelle, Yoo-Ree Chung, and Jing Xu
Iowa State University
2008

Selected Papers from the Fifth Annual Conference on Technology for Second Language Learning

ACKNOWLEDGEMENTS

The papers in this volume were presented at a conference held at Iowa State University on September 21 and 22 of 2007. We are grateful for a grant from the TOEFL® Grants and Awards Program at Educational Testing Service for making this conference possible. We thank the invited speakers for their enthusiastic participation: Robert Mislevy, Eunice E. Jang, Yong-Won Lee, Nathan Carr, Xiaoming Xi, and Mathias Schulze. Although Trude Heift was unable to attend, her pioneering work in this area was cited many times. We also thank the Departments of English and World Languages and Cultures as well as the Program in Linguistics at Iowa State University for their financial support.

Copyright for abstracts and papers written for the Fifth Annual Conference on Technology for Second Language Learning (TSLL) is retained by the individual author(s), who should be contacted for permission by those wishing to use the materials for purposes other than those in accordance with fair use provisions of U.S. Copyright Law.

2008 TESL/Applied Linguistics, Iowa State University

TABLE OF CONTENTS

Introduction
Towards Adaptive CALL (Carol A. Chapelle, Yoo-Ree Chung, and Jing Xu)

Part I. Adaptivity
A taxonomy of adaptive testing (Robert Mislevy, Carol A. Chapelle, Yoo-Ree Chung, and Jing Xu)
Using diagnostic information to adapt traditional textbook-based instruction (Joan Jamieson, Maja Grgurovic, and Tony Becker)
Towards cognitive response theory in diagnostic language assessment (Quan Zhang)

Part II. NLP Analysis in Language Assessment
Automated diagnostic writing tests: Why? How? (Elena Cotos and Nick Pendar)
Decisions about automated scoring: What they mean for our constructs (Nathan Carr)
What and how much evidence do we need? Critical considerations for using automated speech scoring systems (Xiaoming Xi)

Part III. Learner Data, Diagnosis, and Language Acquisition
A framework for cognitive diagnostic assessment (Eunice Eunhee Jang)
Study on the analysis of learner data for the effectiveness of an ESL CALL program (Jinhee Choo and Doe-Hyung Kim)
Modeling SLA processes using NLP (Mathias Schulze)
Lexical acquisition, awareness, and self-assessment through computer-mediated interaction: The effects of modality and dyad type (Melissa Baralt)

Part IV. Authenticity in Language Assessment
Minimal pairs in spoken corpora: Implications for pronunciation assessment and teaching (John Levis and Viviana Cortes)
The development of a web-based Spanish listening exam (Cristina Pardo-Ballester)

About the Authors

Introduction

TOWARDS ADAPTIVE CALL: Natural Language Processing for Diagnostic Language Assessment

Carol A. Chapelle, Yoo-Ree Chung, and Jing Xu
Iowa State University

Many advances in computer-assisted language learning (CALL) require researchers to draw upon technical knowledge about diagnostic assessment, student models, and natural language processing to design adaptive instruction. The Fifth Annual Conference on Technology for Second Language Learning, held at Iowa State University on September 21 and 22, 2007, brought together researchers and graduate students working to address issues in these areas. A day and a half of presentations, many of which are included in this volume, spanned the issues pertaining to the development and evaluation of adaptive systems for second language learning.
The overarching aim of the conference was to better understand the nature of adaptivity and how it can be achieved in real-world applications that help language learners by assessing their language abilities and taking action based on the assessment. The papers in this volume are divided according to the four themes they develop.

The first section includes three papers discussing adaptivity. The first is based on the paper presented by Robert Mislevy, who framed the issue of adaptivity by describing the many ways that adaptivity can be constructed in assessments. Drawing on work on Frames of Discernment (Shafer, 1976) and Evidence-Centered Assessment Design (Mislevy, Steinberg, & Almond, 2003), he proposed a taxonomy that categorizes assessments according to claim status (fixed or adaptive), observation status (fixed or adaptive), and the parties controlling claims and observations (examiner- or examinee-controlled). The paper in this volume illustrates how the combinations of options for adaptivity appear in existing language tests and in hypothetical ones that might be developed in the future. In doing so, it provides the terms and concepts needed to expand professional knowledge about adaptivity in a way that clarifies existing practice and generates new possibilities.

Joan Jamieson, Maja Grgurovic, and Tony Becker illustrate the process of developing and evaluating two diagnostic assessments (a Readiness Check and an Achievement Test) to support adaptivity in NorthStar, a commercial ESL textbook intended to help ESL students prepare for the TOEFL iBT. In this study, they investigate whether the diagnostic tests assist both the teacher and the students in preparing for a unit and in evaluating students' learning achievement at the end of the unit.
The participants' questionnaire responses suggest that pre- and post-unit diagnoses may support adaptive extension of classroom language instruction for individual students. The paper demonstrates some of the challenges of attempting to operationalize diagnosis in commercial materials.

Quan Zhang investigates the potential of applying cognitive response theory in computer-based language assessment. He suggests that computerized cognitive testing (CCT) has several advantages over computer-adaptive testing in terms of the knowledge and skills assessed, the variables taken into consideration, the task format adopted, and the scoring method used. His study of approximately 200 examinees taking a CCT with jumbled-word test items reveals that certain cognitive variables reflected in test-taking behavior, which can be assessed in a CCT but not in traditional computer-adaptive testing, distinguish examinees' levels of language proficiency. In addition, the author uses a latent factor approach to model a CCT examinee's language ability based on data collected cumulatively from a college-level English test.

Central to more sophisticated adaptive systems are student models, which, for language learners, need to model learners' interlanguage or state of language ability. A student model, unlike a single test score, is capable of representing a learner's detailed language knowledge based on evidence provided in complex linguistic performance. However, if a system is to gather data to populate a student model representing language knowledge, it must be able to recognize the relevant aspects of language in an examinee's responses. The second group of papers reports on the use of natural language processing in the evaluation of ESL learners' language performance.
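As a deliberately simplified illustration of how a student model differs from a single test score, the sketch below keeps one running mastery estimate per linguistic feature and updates it from observed evidence. The feature names, the 0.5 uninformed prior, and the exponential-moving-average update rule are illustrative assumptions only, not the design of any system discussed in this volume.

```python
from dataclasses import dataclass, field

@dataclass
class StudentModel:
    """Minimal student model: a mastery estimate in [0, 1] per
    linguistic feature, updated from correct/incorrect evidence."""
    mastery: dict = field(default_factory=dict)  # feature -> estimate
    rate: float = 0.3                            # weight given to new evidence

    def update(self, feature, correct):
        # Exponential moving average toward the latest observation;
        # unseen features start at an uninformed prior of 0.5.
        prior = self.mastery.get(feature, 0.5)
        evidence = 1.0 if correct else 0.0
        self.mastery[feature] = (1 - self.rate) * prior + self.rate * evidence

    def weakest(self):
        # The feature an adaptive system might target next.
        return min(self.mastery, key=self.mastery.get)

model = StudentModel()
model.update("past tense", correct=False)   # 0.7 * 0.5 + 0.3 * 0.0 = 0.35
model.update("article use", correct=True)   # 0.7 * 0.5 + 0.3 * 1.0 = 0.65
print(model.weakest())                      # past tense
```

Unlike a single score, such a structure retains which aspects of language the evidence bears on, which is what makes instruction adaptable at the level of individual features.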
Elena Cotos and Nick Pendar explain that many computer-assisted language tests make inferences about learners' L2 proficiency based on examinees' selected responses. They argue that the use of natural language processing (NLP) in L2 writing assessment would improve the inferences that can be drawn about learners' writing ability. They begin by pointing out the advantages of constructed-response tests, such as automatic evaluation, the provision of meaningful feedback for better learning, and increased practicality and objectivity of assessment, and they describe what these tests look like, discussing their characteristics, construct definitions, and types of test items. Finally, by reviewing several current automated essay scoring (AES) systems and approaches to NLP, they reveal the potential of applying NLP techniques in automated diagnostic writing tests.

Nathan Carr discusses the relationship between decisions about automated scoring criteria and the refinement of constructs in operational tests. Among three general automated scoring approaches, Carr argues for keyword matching for comprehension test items. He illustrates how the implementation of this approach in a web-based test affected decisions about scoring criteria, and how the test constructs in turn had to be altered with regard to seven aspects of scoring criteria developed by Carr, Pan, and Xi (2002). The author's account of his ongoing development of a low-budget keyword-matching program that runs in Microsoft Excel supports the suggestion that purposefully selected automated scoring approaches are applicable to small-scale, low- or mid-stakes diagnostic assessment of language learners' performance on target language skills, and he suggests that this approach is also worth exploring for well-funded, large-scale language tests.
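Carr's Excel-based program is not reproduced here; the sketch below is only a minimal Python illustration of the keyword-matching idea, using a hypothetical scoring key in which each required keyword is represented by a set of acceptable variants (synonyms and alternative forms).

```python
import re

def keyword_score(response, keyword_sets):
    """Score a short constructed response by keyword matching.

    keyword_sets is a list of sets; each set holds the acceptable
    variants of one required keyword. One point is awarded per set
    matched, so the maximum score is len(keyword_sets)."""
    tokens = set(re.findall(r"[a-z']+", response.lower()))
    return sum(1 for variants in keyword_sets if tokens & variants)

# Hypothetical key for a comprehension item whose expected answer
# mentions rising temperature and melting ice.
key = [
    {"temperature", "temperatures", "warming"},
    {"melt", "melts", "melting", "melted"},
]

print(keyword_score("Warming causes the ice to melt.", key))  # 2
print(keyword_score("The ice is shrinking.", key))            # 0
```

A real implementation must also confront the decisions Carr describes: spelling tolerance, word order, negation, and partial credit all force choices about what the item is actually measuring, which is exactly the interaction between scoring criteria and construct definition that his paper examines.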
Xiaoming Xi applies an argument-based approach to the validation of the internet-based TOEFL iBT Speaking Practice test, which uses automated scoring of examinees' responses. Based on Clauser, Kane, and Swanson's (2002) validation framework, she builds