Augmenting Textual Qualitative Features in Deep Convolution Recurrent Neural Network for Automatic Essay Scoring

Tirthankar Dasgupta, Abir Naskar, Rupsa Saha and Lipika Dey
TCS Innovation Lab, India
(dasgupta.tirthankar, abir.naskar, rupsa.s, lipika.dey)@tcs.com

Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications, pages 93–102, Melbourne, Australia, July 19, 2018. © 2018 Association for Computational Linguistics

Abstract

In this paper we present a qualitatively enhanced deep convolution recurrent neural network for computing the quality of a text in an automatic essay scoring task. The novelty of the work lies in the fact that, instead of considering only the word and sentence representation of a text, we augment a hierarchical convolution recurrent neural network framework with the complex linguistic, cognitive and psychological features associated with a text document. Our preliminary investigation shows that incorporating such qualitative feature vectors along with standard word/sentence embeddings can give us a better understanding of how to improve the overall evaluation of the input essays.

1 Introduction

The quality of text depends upon a number of linguistic factors, corresponding to different textual properties, such as grammar, vocabulary, style, topic relevance, clarity, comprehensibility, informativeness, lexical diversity, discourse coherence, and cohesion (Crossley et al., 2008; McNamara et al., 2002). In addition, there are deep cognitive and psychological features, such as types of syntactic constructions, grammatical relations and measures of sentence complexity, that make automatic analysis of text quality a non-trivial task.

Developing tools for automatic text quality analysis has become extremely important to organizations that need to assess writing skills among adults and students on a regular basis. Because of the high participation in such assessments, the amount of time and effort required to grade the large volume of textual data generated is too high to be feasible for a human evaluator. Manual evaluation processes involving multiple evaluators may also be prone to erroneous judgments due to mutual disagreements between the evaluators. Therefore, developing a means through which such essays can be automatically scored, with minimum human interference, seems to be the best way forward to meet the growing demands of the education world, while keeping inter-evaluator disagreements to a minimum. Automatic Essay Scoring (AES) systems have thus been in the research focus of multiple organizations to counter the above issues (Landauer, 2003).

A typical AES system takes as input an essay written on a specific topic. The system then assigns a numeric score to the essay reflecting its quality, based on its content, grammar, organization and other factors discussed above.

A plethora of research has been done to develop AES systems for various languages (Taghipour and Ng, 2016; Dong et al., 2017; Alikaniotis et al., 2016; Attali and Burstein, 2004; Chen and He, 2013; Chen et al., 2010; Cummins et al., 2016). Most of these tools are based on regression methods applied to a set of carefully designed complex linguistic and cognitive features. Knowledge of such complex features has been shown to achieve performance that is indistinguishable from that of human examiners. However, since it is difficult to exhaustively enumerate all the multiple factors that influence the quality of texts, the challenge of automatically assigning a satisfactory score to an essay still remains.

Recent advances in deep learning techniques have influenced researchers to apply them to AES tasks. Deep multi-layer neural networks can automatically learn useful features from data, with lower layers learning basic feature detectors and upper layers learning more high-level abstract features. Deep neural network models, however, do not allow us to identify and extract those properties of text that the network identifies as discriminative (Alikaniotis et al., 2016). In particular, deep network models fail to take into account integral linguistic and cognitive factors present in text, which play an important role in an essay score assigned by experts. Such models emphasize a simple uniform paradigm for NLP: "language is just sequences of words". While this approach has rapidly found enormous popularity and success, its limitations are now becoming more apparent. Researchers are gradually stressing the importance of linguistic structure and the fact that it reduces the search space of possible outputs, making it easier to generate well-formed output (Lapata, 2017). Dyer (Dyer, 2017) also argued for the importance of incorporating linguistic structure into deep learning. He drew attention to the inductive biases inherent in the sequential approach, arguing that RNNs have an inductive bias towards sequential recency, while syntax-guided hierarchical architectures have an inductive bias towards syntactic recency. Several papers noted the apparent inability of RNNs to capture long-range dependencies, and obtained improvements using recursive models instead (Chen et al., 2017).

In order to overcome the aforementioned issues, in this paper we propose a qualitatively enhanced deep convolution recurrent neural network architecture for automatic scoring of essays. Our model takes into account both the word-level and sentence-level representations, as well as linguistic and psychological feature embeddings. To the best of our knowledge, no prior work in this field has investigated the effectiveness of combining word and sentence embeddings with linguistic features for AES tasks. Our preliminary investigation shows that incorporating linguistic feature vectors along with standard word/sentence embeddings does improve the overall scoring of the input essays.

The rest of the paper is organized as follows: Section 2 describes the recent state of the art in AES systems. Our proposed Linguistically informed Convolution LSTM model architecture is discussed in Section 3, while Section 4 has further details on the generation of linguistic feature vectors. In Section 5, we cover the experimentation and evaluation technique, reporting the obtained results in Section 6, and finally concluding the paper in Section 7.

2 Related Works

A plethora of attempts have been made to develop AES systems over the years. A detailed overview of the early works on AES is reported in (Valenti et al., 2003). An Intelligent Essay Assessor (Foltz et al., 1999) was proposed more recently that uses Latent Semantic Analysis to compute the semantic similarity between texts. Lonsdale and Strong-Krause (Lonsdale and Strong-Krause, 2003) used the Link Grammar parser (Sleator and Temperley, 1995) to score texts based on average sentence-level scores calculated from the parser's cost vector. In Rudner and Liang's Bayesian Essay Test Scoring System (Rudner and Liang, 2002), stylistic features in a text are classified using a Naive Bayes classifier. Attali and Burstein's e-Rater (Attali and Burstein, 2004) includes aspects of grammar, vocabulary and style among other linguistic features, whose weights are fitted by regression. A weakly supervised bag-of-words approach was proposed by Chen et al. (Chen et al., 2010). A discriminative learning based approach was proposed by Yannakoudakis et al. (Yannakoudakis and Cummins, 2015) that extracts deep linguistic features and employs a discriminative learning-to-rank model that outperforms regression. Recently, Farra et al. (Farra et al., 2015) utilized variants of logistic and linear regression to develop scoring models. McNamara et al.'s hierarchical classification approach (McNamara et al., 2015) uses linguistic, semantic and rhetorical features.

Despite the existing body of work, attempts to incorporate more diverse features into text scoring models are ongoing. Klebanov and Flor (Klebanov and Flor, 2013) demonstrated improved performance by adding information about levels of association among word pairs in a given text. Somasundaran et al. (Somasundaran et al., 2014) used the interaction of lexical chains with discourse elements for evaluating the quality of essays. Crossley et al. (Crossley et al., 2015) identified student attributes, such as standardized test scores, and used them in conjunction with textual features to develop essay scoring models. Readability features (Zesch et al., 2015) and text coherence have also been proposed as a source of information to assess the flow of information and argumentation of an essay (Chen and He, 2013). A detailed overview of the features used in AES systems can be found in (Zesch et al., 2015). Some attempts have been made to address different aspects of essay writing, like argument strength and organization, independently, through designing task-specific features for each aspect (Persing et al., 2010; Persing and Ng, 2015).

There has been a lot of recent work in deep neural network models based on continuous-space representation of the input and non-linear functions. Recently, deep learning techniques have been ap-

chical convolution network connected with a bidirectional long short-term memory (LSTM) model (Hochreiter and Schmidhuber, 1997; Schmidhuber et al., 2006). We will begin describing the model architecture by first explaining how to generate the linguistic and psychological feature embeddings that will in turn be used by the neural network archi-
