An Item Response Curves Analysis of the Force Concept Inventory
Am. J. Phys. 80, 825-831 (2012); doi: 10.1119/1.4731618

Gary A. Morris
Department of Physics and Astronomy, Valparaiso University, Valparaiso, Indiana 46383

Nathan Harshman
Department of Physics, American University, Washington, DC 20016

Lee Branum-Martin
Texas Institute for Measurement, Evaluation, and Statistics, University of Houston, Houston, Texas 77204

Eric Mazur
Department of Physics, Harvard University, Cambridge, Massachusetts 02138

Taha Mzoughi
Department of Biology and Physics, Kennesaw State University, Kennesaw, Georgia 30144

Stephen D. Baker
Department of Physics and Astronomy, Rice University, Houston, Texas 77005

(Received 30 November 2011; accepted 13 June 2012)

Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields conclusions about the performance of items on the FCI similar to those of the more sophisticated and complex IRT analyses, but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers in improving their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels. © 2012 American Association of Physics Teachers. [http://dx.doi.org/10.1119/1.4731618]

I. INTRODUCTION

One of the joys of physics is that, unlike many other fields of inquiry, right and wrong answers are often unambiguous. Because of this apparent objectivity, multiple-choice tests remain an essential tool for the assessment of physics teaching and learning. In particular, the Force Concept Inventory (FCI) (Ref. 1) is widely used in physics education research (PER).

Tests are designed with specific models in mind. The FCI was designed so that the raw score measures (in some sense) the ability of "Newtonian thinking." This article compares two methods of test analysis based on different models of the functionality of the FCI. These methods have the unfortunately similar names of item response curves (IRC) (Ref. 2) and item response theory (IRT) (e.g., Ref. 3). By comparing these methods, we will show that the model behind the IRC analysis is more consistent with the one envisioned by the FCI designers. Additionally, it is easier to use and its results are easier to interpret in a meaningful way.

Any method of test analysis requires the construction of a model to quantify the properties of the complex, interacting "population + test" system. For example, in order to detect correlations in the data, models often assume that each test-taker has a true but unknown "ability." In assessing students, educators expect ability to be strongly correlated with raw total score on the test. Weighting each question equally in determining ability is a common model, but other methods, such as factor analysis or IRT, use correlations in the test response data to weight the questions differently. Many standardized tests subtract a fraction of a point from the raw score to "penalize" guessing. This model assumes that the probability of guessing the right answer to a question does not depend on the specific question, that guessing any particular answer choice on a particular question is equally likely, and that guessing works the same way at all ability levels. Other, more sophisticated approaches, such as those in factor analysis or item response theory (Ref. 3), statistically define the probability of response in order to estimate potentially different weights for items. These approaches can be viewed as drawing information from the correlations among items to estimate appropriate weights.
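To make the guessing penalty concrete, the standard formula-scoring rule consistent with the assumptions just listed is the following; the explicit equation is our addition and does not appear in the paper. For items with k answer choices, a raw count of R right and W wrong answers is corrected to

    S = R - \frac{W}{k - 1}

A student who guesses blindly on k items expects one right and k - 1 wrong answers, so the expected net change in S is 1 - (k - 1)/(k - 1) = 0; the penalty exactly offsets the expected gain from guessing, but only if guessing is uniform across choices, questions, and ability levels, as stated above.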
By the very nature of a multiple-choice test, all models presume that there exists an unambiguously correct answer choice for every item. But a key difference among models is how they handle the distractors.

IRC analysis uses total score as a proxy for ability level, which implies that all items are equally weighted. In order to implement IRC analysis, item-level data are necessary. IRC analysis describes each item with trace lines for the proportion of examinees who select each answer choice, including the distractors, at each ability level (see Fig. 1). To make an IRC graph for a single item, we must separate the respondents by ability level and then determine the fraction at each ability level selecting each answer choice. For example, in our data set, there were 116 students of ability level 20 who answered Question 8. Of those, 81 selected Answer Choice 2, the correct answer. Thus, the trace line for the IRC of the correct answer (asterisks in Fig. 1) passes through the point (20, 69%). When repeated for all ability levels, all questions, and all answer choices, we arrive at the final product of IRC analysis: a set of traces for each item, such as those for the FCI appearing in Fig. 1.

Fig. 1. The IRCs for all 30 questions on the FCI using the approach of Morris et al. (Ref. 2). This figure is analogous to Fig. 4 in Wang and Bao (Ref. 4). Each graph shows the percentage of respondents (vertical axis) at each ability level (horizontal axis) on a given item selecting a given answer choice: + = 1, * = 2, ◇ = 3, △ = 4, □ = 5.
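The tabulation described above is straightforward to implement. The following Python sketch is our illustration, not the authors' code; the data layout (a students-by-items matrix of answer choices coded 1 through 5) and all names are assumptions.

    import numpy as np

    def irc_traces(responses, answer_key, n_choices=5):
        """Compute IRC trace lines from a (students x items) array of
        answer choices coded 1..n_choices."""
        responses = np.asarray(responses)
        # Ability proxy: raw total score, i.e., every item weighted equally.
        ability = (responses == np.asarray(answer_key)).sum(axis=1)
        traces = {}
        for level in np.unique(ability):
            group = responses[ability == level]  # all students at this ability level
            # Fraction of the group selecting each choice on each item;
            # the result for each level has shape (n_items, n_choices).
            traces[level] = np.stack(
                [(group == c).mean(axis=0) for c in range(1, n_choices + 1)],
                axis=1)
        return traces

For the example in the text, traces[20][7, 1] (zero-based indices for Question 8, Answer Choice 2) would return 81/116 on the authors' data set, the point plotted on the correct-answer trace line of Question 8 in Fig. 1.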
IRT analysis derives student ability by a statistical analysis of item responses and does not necessarily weight all questions equally. In fact, IRT uses a probabilistic model that is a convolution of examinee response patterns with a specific model of item functionality. The same functional form is assumed for every item, even when the items were designed to function differently. The three-parameter logistic (3PL) IRT analysis assumes that all wrong answers for a given item are equivalent. We will provide IRC analysis examples that show, in an easy-to-interpret graphical form, how different distractors appeal to students with varying abilities. If distractors are not all equally "wrong," we could estimate ability in a more sensitive manner, effectively giving partial credit for some wrong answers (those that suggest a deeper understanding) relative to other wrong answers (cf. Bock, Ref. 7). Furthermore, we may assess the effectiveness of items and alternative answer choices as to how well they function in measuring student understanding and misconceptions (Ref. 2).

The IRC analysis is particularly meaningful in the case of the FCI because IRT analyses demonstrate the strong correlation between ability and total score for this test (Ref. 3).
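For reference, the 3PL model mentioned above has a standard form in the IRT literature (e.g., Ref. 3); the equations below are our addition, in our notation, and are not reproduced from the paper. For item i with discrimination a_i, difficulty b_i, and guessing parameter c_i, the probability that an examinee of ability \theta answers correctly is

    P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}

Because this model specifies only the probability of a correct response, every incorrect choice is treated identically. Bock's nominal response model (cf. Ref. 7) is one standard way to model each answer choice separately, assigning choice k a probability P_k(\theta) \propto e^{a_k \theta + c_k}, which is the kind of per-distractor information the IRC traces display graphically.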