Integration of a Web-Based Rating System with an Oral Proficiency Interview Test: Argument-Based Approach to Validation
Hye Jin Yang
Iowa State University
Recommended Citation: Yang, Hye Jin, "Integration of a web-based rating system with an oral proficiency interview test: argument-based approach to validation" (2016). Graduate Theses and Dissertations. 15189. https://lib.dr.iastate.edu/etd/15189

Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation

by

Hye Jin Yang

A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY

Major: Applied Linguistics and Technology

Program of Study Committee:
Carol A. Chapelle, Co-Major Professor
Elena Cotos, Co-Major Professor
Volker Hegelheimer
Gary Ockey
Frederick O. Lorenz
Jo Mackiewicz

Iowa State University
Ames, Iowa
2016

Copyright © Hye Jin Yang, 2016. All rights reserved.

I dedicate this dissertation to my mom and dad,
Ok Chun Hwang & Seung Woo Yang

TABLE OF CONTENTS

LIST OF FIGURES ---- vi
LIST OF TABLES ---- viii
ACKNOWLEDGMENTS ---- xi
ABSTRACT ---- xiii

CHAPTER 1 INTRODUCTION ---- 1
1.1 Background ---- 1
1.2 Problems in the Context of This Dissertation ---- 6
1.3 Purpose of This Dissertation ---- 7
1.4 Significance of the Study ---- 9
1.5 Dissertation Overview ---- 10

CHAPTER 2 THEORETICAL AND EMPIRICAL FOUNDATION ---- 12
2.1 Issues in Computer-Assisted Language Testing ---- 12
2.1.1 History and key attributes of CALT ---- 12
2.1.2 CALT types ---- 15
2.1.3 Construct validation studies in CALT ---- 20
2.1.3.1 Correlation studies ---- 21
2.1.3.2 Comparison of examinees’ performance between CBT and PBT ---- 22
2.1.3.3 Sources of construct-irrelevant variance relevant to computers ---- 23
2.1.4 Gaps in research ---- 26
2.2 Assessing Speaking Ability in Language Performance Tests ---- 27
2.2.1 Conceptualization of the speaking performance test process ---- 28
2.2.2 Rater variability ---- 31
2.2.2.1 Rater severity ---- 32
2.2.2.2 Rating scale use ---- 35
2.2.3 Task variability ---- 36
2.2.3.1 Task difficulty factors ---- 37
2.2.3.2 Empirical methods of estimating task difficulty ---- 39
2.2.4 Gaps in research ---- 41

CHAPTER 3 CONTEXT AND ARGUMENT-BASED APPROACH TO VALIDATION ---- 42
3.1 Context ---- 42
3.1.1 Components of the OECT ---- 43
3.1.2 Rating procedure ---- 44
3.2 Rater-Platform (R-Plat) ---- 48
3.3 Interpretive Argument for the OPI Scores ---- 57
3.4 Research Questions ---- 66

CHAPTER 4 METHODOLOGY ---- 68
4.1 Research Design ---- 68
4.2 Participants ---- 69
4.3 Materials ---- 71
4.3.1 Questionnaire ---- 71
4.3.2 Focus group and individual interviews ---- 72
4.3.3 OPI prompts ---- 72
4.3.4 Scoring rubric ---- 74
4.3.5 Diagnostic descriptors ---- 75
4.4 Procedure ---- 76
4.4.1 Questionnaire ---- 77
4.4.2 Focus group and individual interviews ---- 78
4.4.3 OPI prompt rotation ---- 78
4.4.4 Rating results ---- 81
4.5 Data Analysis ---- 84
4.5.1 Raters’ perceptions towards R-Plat (RQ1) ---- 85
4.5.2 Comparisons of diagnostic descriptor markings (RQ2) ---- 87
4.5.3 Raters’ comments as indicators of speaking ability levels (RQ3) ---- 88
4.5.4 Descriptive statistics (RQ4) ---- 93
4.5.5 OPI ratings as reliable indicators of different speaking ability levels (RQ5) ---- 94
4.5.6 Consistency of OPI prompts at different difficulty levels (RQ6) ---- 97
4.5.7 Consistency of raters within each administration (RQ7) ---- 99

CHAPTER 5 RESULTS ---- 101
5.1 Raters’ Perceptions towards R-Plat ---- 101
5.1.1 Clarity of R-Plat ---- 102
5.1.2 Level of raters’ comfort with R-Plat ---- 107
5.1.3 Effectiveness of R-Plat and raters’ satisfaction ---- 112
5.2 Use of Diagnostic Descriptors to Support Proficiency Level Ratings ---- 116
5.2.1 Proficiency level comparisons of diagnostic descriptor markings ---- 116
5.2.2 Seven categories of diagnostic descriptors ---- 119
5.2.3 Raters’ reasons for selecting diagnostic descriptors ---- 135
5.3 Use of Raters’ Comments to Support Proficiency Level Ratings ---- 137
5.3.1 Inter-coder reliability ---- 138
5.3.2 Comparison of positive and negative comments across proficiency levels ---- 138
5.3.3 Comparison of positive and negative evaluative units grouped by the OPI scoring criteria across proficiency levels ---- 141
5.4 Descriptive Statistics ---- 162
5.5 Dependability of OPI Ratings ---- 165
5.5.1 Descriptive statistics for OPI ratings ---- 165
5.5.2 Unidimensionality assumption check ---- 168
5.5.3 Dependability of the OPI ratings ---- 172
5.6 Comparison of Intended Prompt Level and Observed Difficulty ---- 172
5.6.1 Prompt difficulty at the advanced level ---- 173
5.6.2 Prompt difficulty at the intermediate-high level ---- 176
5.6.3 Prompt difficulty at the intermediate-mid level