A Memory-Augmented Neural Model for Automated Grading

A Memory-Augmented Neural Model for Automated Grading Leave Authors Anonymous Leave Authors Anonymous Leave Authors Anonymous for Submission for Submission for Submission City, Country City, Country City, Country e-mail address e-mail address e-mail address ABSTRACT scoring (AES) systems has been used in these tests to reduce The need for automated grading tools for essay writing and the time and cost of grading essays. Moreover, as massive open-ended assignments has received increasing attention open online courses (MOOCs) become widespread and the due to the unprecedented scale of Massive Online Courses number of students enrolled in one course increases, the need (MOOCs) and the fact that more and more students are relying for grading and providing feedback on written assignments on computers to complete and submit their school work. In are ever critical. this paper, we propose an efficient memory networks-powered As part of the automated grading system, AES has employed automated grading model . The idea of our model stems from numerous efforts to improving its performance. AES uses the philosophy that with enough graded samples for each score statistical and Natural Language Processing (NLP) techniques in the rubric, such samples can be used to grade future work to automatically predict a score for an essay based on the essay that is found to be similar. For each possible score in the rubric, prompt and rubric. Essay writing is usually a common student a student response graded with the same score is collected. assessment process in schools and universities. In this task, These selected responses represent the grading criteria spec- students are required to write essays of various length, given a ified in the rubric and are stored in the memory component. prompt or essay topic. Our model learns to predict a score for an ungraded response by computing the relevance between the ungraded response Most existing AES systems are built on the basis of predefined and each selected response in memory. The evaluation was features, e.g. number of words, average word length, and conducted on the Kaggle Automated Student Assessment Prize number of spelling errors, and a machine learning algorithm (ASAP) dataset. The results show that our model achieves [4]. It is normally a heavy burden to find out effective features state-of-the-art performance in 7 out of 8 essay sets and can for AES. Moreover, the performance of the AES systems is be trained efficiently due to the simplicity of model structure. constrained by the effectiveness of the predefined features. Recently another kind of approach has emerged, employing ACM Classification Keywords neural network models to learn the features automatically in I.2.7. ARTIFICIAL INTELLIGENCE: Natural Language Pro- an end-to-end manner [29]. By this means, a direct predic- cessing tion of essay scores can be achieved without performing any feature extraction. The model based on long short-term mem- Author Keywords ory (LSTM) networks in [29] has demonstrated promise in Automated grading; neural networks; memory networks; accomplishing multiple types of automated grading tasks. word embeddings; natural language processing Neural Networks have achieved promising results on various NLP tasks, including machine translation [1,5], sentiment INTRODUCTION analysis [6], and question answering [13, 31, 18, 28]. Neural Automated grading is a critical part of Massive Open Online Network models, in terms of NLP tasks, use word vectors to Courses (MOOCs) system and any intelligent tutoring systems learn distributed representations from text. The advantages are (ITS) at scale. Many studies have been conducted to improve that these models do not require hand-engineered features and automated grading for assignments with simple fixed-form an- can be trained to solve tasks in an end-to-end fashion. swers, short-answers [3, 15, 19, 26, 21], or long-form answers [26,2,7, 14]. Some standard tests, such as Test of English as Recent work [29] has exploited several Recurrent Neural Net- a Foreign Language (TOEFL) and Graduate Record Examina- work (RNN) models to solve AES tasks. The results show that tion (GRE), assess student writing skills. Manually grading neural-based models outperform even strong baselines. Mem- these essay will be time-consuming. Thus automated essay ory Networks (MN) [31, 18, 28] have been recently introduced to deal with complex reasoning and inferencing NLP tasks Permission to make digital or hard copies of all or part of this work for personal or and have been shown to outperform RNNs on some complex classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation reasoning tasks [28]. MN is a class of models which contains on the first page. Copyrights for components of this work owned by others than the an external scalable memory and a controller to read from author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and write to that memory. The notion of neural networks with and/or a fee. Request permissions from [email protected]. memory was introduced to solve complex reasoning and in- CHI’16, May 07–12, 2016, San Jose, CA, USA ferring AI-tasks which require remembering external contexts. © 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. Some work [18, 28] has shown the success of MN on different ISBN 123-4567-24-567/08/06. $15.00 DOI: http://dx.doi.org/10.475/123_4 1 kinds of tasks, e.g. bAbI tasks [30], MovieQA, and WikiQA prediction directly. One research direction of this approach [18]. is to apply information extraction techniques to constructing specific answer patterns manually or to training from large To our knowledge, no study has been conducted to investigate training dataset with strong supervision support [2,3,24]. An- the feasibility and effectiveness of MN applied in automated other direction is to compare the students’ answers with a es- grading tasks. In this study, we develop a generic model for tablished standard answer with an unsupervised text-similarity such tasks using Memory Networks inspired by their capabil- approach [21]. ity to store rich representations of data and reason over that data in memory. For each essay score, we select one essay Most studies mentioned above are dealing with simple fixed- exhibiting the same score from student responses as a sample form answers or short-answers assignments. Some complex for that grade. All collected sample responses are loaded into assignments have long form answer instead of short, simple the memory of the model. The model is trained with the rest one. Essay writing with a given topic is a typical assignment of student responses in a supervised learning manner on these with long form answers and AES has become one important data to compute the relevance between the representation of research branch of automated grading system. an ungraded response and that of each sample. The intuition is that as a part of a scoring rubric, a number of sample re- AES is generally treated as a machine learning problem. We sponses of variable quality are usually provided to students can group the existing AES solutions from different points of view. Most developed AES system is based on a number and graders to help them better understand the rubric. These of predefined features. These features include essay length, collected responses are characterized with expectations of number of words, lexicon and grammar, syntactic features, quality described in the rubric. The model is expected to learn readability, text coherence, essay organization, and so on [4]. the grading criteria from these responses. We evaluate our model on a publicly available essay grading data set from the Recently, there emerges another trial to treat the whole essays Kaggle Automated Student Assessment Prize (ASAP) compe- as inputs and learn the features automatically in an end-to-end manner [29]. Without pres-working on features extraction, tition (https://www.kaggle.com/c/asap-aes). Our experiments show that our model achieves state-of-the-art results on this work burden was lightened. Moreover, the predicting accurate dataset and training of the model is found to be efficient and is improved by removing the dependency of effectiveness of cost-effective. predefined features. The rest of the paper is organized as follows. Section 2 gives Based on learning techniques utilized in existing solutions, we an overview of related work in this research area. Section divide them into three categories: regression based approach, 3 provides detailed information of our model. Section 4 de- classification based approach and preference ranking based ap- scribes the ASAP dataset and evaluation metrics used to test proach. PEG-system and E-rater are two examples that belong our framework. Furthermore, it contains the details of our im- to regression based approach. Specifically, when the scores plementation and experimental setup to help other researchers range of the essays is wide, the regression based approach is replicate our work. In section 5, we present the results of our normally adopted since it treats the essay score as a continuous value. framework and compare them with other models. Finally, we discuss the results and conclude the paper. Besides essay writing, some complex assignments such as medicinal assignments utilized regression model as well [9]. RELATED WORK Some work such as [27, 14] treated the AES task as a classification problem. Each possible score is converted into a class Automated Grading label. By using classic classification algorithms, AES system MOOCs were introduced in 2008 and become more popular predicts which class an essay should belongs to.

Load more