Text Simplification by Tagging (TST) Task of Text Simpliﬁcation As a Tagging Problem

Text Simplification by Tagging Kostiantyn Omelianchuk∗ Vipul Raheja∗ Oleksandr Skurzhanskyi∗ Grammarly [email protected] Abstract or reading ages (Max, 2006; Alu´ısio et al., 2008; Gasperin et al., 2009; Watanabe et al., 2009). More- Edit-based approaches have recently shown over, it has also been successfully leveraged as a promising results on multiple monolingual sequence transduction tasks. In contrast to pre-processing step to improve the performance of conventional sequence-to-sequence (Seq2Seq) various NLP tasks such as parsing (Chandrasekar models, which learn to generate text from et al., 1996), summarization (Beigman Klebanov scratch as they are trained on parallel corpora, et al., 2004; Silveira and Branco, 2012), semantic these methods have proven to be much more role labeling (Vickrey and Koller, 2008; Woodsend effective since they are able to learn to make and Lapata, 2017) and machine translation (Gerber fast and accurate transformations while lever- and Hovy, 1998; Stajnerˇ and Popovic, 2016; Hasler aging powerful pre-trained language models. et al., 2017). Inspired by these ideas, we present TST, a simple and efficient Text Simplification sys- Evolving from the approaches ranging from tem based on sequence Tagging, leveraging building hand-crafted rules (Chandrasekar et al., pre-trained Transformer-based encoders. Our 1996; Siddharthan, 2006) to syntactic and lexical system makes simplistic data augmentations simplification via synonyms and paraphrases (Sid- and tweaks in training and inference on a pre- dharthan, 2014; Kaji et al., 2002; Horn et al., 2014; existing system, which makes it less reliant on Glavasˇ and Stajnerˇ , 2015), the task has gained large amounts of parallel training data, pro- popularity as a monolingual Machine Translation vides more control over the outputs and en- ables faster inference speeds. Our best model (MT) problem, where the system learns to “trans- achieves near state-of-the-art performance on late” a given complex sentence to its simplified benchmark test datasets for the task. Since it form. Initially, Statistical phrase-based (SMT) is fully non-autoregressive, it achieves faster and Syntactic-based Machine Translation (SBMT) inference speeds by over 11 times than the cur- techniques (Zhu et al., 2010; Specia, 2010; Coster rent state-of-the-art text simplification system. and Kauchak, 2011; Wubben et al., 2012; Narayan and Gardent, 2014; Stajnerˇ et al., 2015; Xu et al., 1 Introduction 2016a) were successfully applied as a way to learn Text Simplification is the task of rewriting text into simplification rewrites implicitly from complex- a form that is easier to read and understand while simple sentence pairs, often in combination with preserving its underlying meaning and information. hand-crafted rules or features. More recently, sev- It has been shown to be valuable in providing assis- eral Neural Machine Translation-based (NMT) sys- tance in terms of readability and understandability tems have been developed with promising results to children (Belder and Moens, 2010; Kajiwara (Sutskever et al., 2014; Cho et al., 2014; Bahdanau et al., 2013), people with language disabilities like et al., 2015), and their successful application to text aphasia (Carroll et al., 1998, 1999; Devlin and Un- simplification, either in combination with SMT or thank, 2006), dyslexia (Rello et al., 2013a,b), or other data-driven approaches (Zhang et al., 2017; autism (Evans et al., 2014); non-native English Zhao et al., 2018b); or strictly as neural models speakers (Petersen and Ostendorf, 2007; Paetzold, (Wang et al., 2016; Nisioi et al., 2017; Zhang and 2015; Paetzold and Specia, 2016a,b; Pellow and Lapata, 2017; Stajnerˇ and Nisioi, 2018; Guo et al., Eskenazi, 2014), and people with low literacy skills 2018; Vu et al., 2018; Li et al., 2018; Kriz et al., 2019; Surya et al., 2019; Zhao et al., 2020a), has ∗ Authors contributed equally to this work; names are emerged as the state-of-the-art. given in alphabetical order. 11 Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 11–25 April 20, 2021 ©2021 Association for Computational Linguistics Human editors perform several rewriting trans- addition to being sample efficient, thanks to the formations in order to simplify a sentence, such as separation of various edit operations in the form lexical paraphrasing, changing the syntactic struc- of tags, the system has better interpretability and ture, or removing superfluous information from the explainability. Finally, since for sequence tagging sentence (Petersen and Ostendorf, 2007; Alu´ısio we don’t need to predict tokens one-by-one as in et al., 2008; Mallinson et al., 2020). Therefore, autoregressive decoders, the inference is naturally even though NMT-based sequence-to-sequence parallelizable and therefore runs many times faster. (Seq2Seq) approaches offer a generic framework Following from the success of the aforemen- for modeling almost any kind of sequence transduc- tioned monolingual edit-tag based systems, we pro- tion, target texts in these approaches are typically pose to leverage the current state-of-the-art model generated from scratch - a process which can be for Grammatical Error Correction by Omelianchuk unnecessary for monolingual editing tasks such as et al.(2020) (GECToR) and adapt it to the task text simplification, owing to these aforementioned of Text Simplification. In summary, we make the transformations. Moreover, these approaches have following contributions: a few shortcomings that make them inconvenient for real-world deployment. First, they give lim- • We develop a Text Simplification system by ited insight into the simplification operations and adapting the GECToR model to Text Sim- provide little control or adaptability to different as- plification, leveraging Transformer-based en- pects of simplification (e.g., lexical vs. syntactical coders trained on large amounts of human- 1 simplification). This inhibits interpretability and ex- annotated and synthetic data. Empirical re- plainability, which is crucial for real-world settings. sults demonstrate that our system achieves Second, they are not sample-efficient and require a near state-of-the-art performance on bench- large number of complex-simple aligned sentence mark test datasets in terms of readability and pairs for training, which requires considerable hu- simplification metrics. man effort to obtain. Third, these models typically • We propose crucial data augmentations and employ an autoregressive decoder, i.e., output texts tweaks in training and inference and show are generated in a sequential, non-parallel fashion, their significant impact on the task: enabling and hence, are generally characterized by slow in- the model to learn to edit the sentences more ference speeds. effectively, rather than relying heavily on copying the source sentences, leading to a Based on the aforementioned observations and higher quality of simplifications. issues, text-editing approaches have recently re- • Since our model is a non-autoregressive se- gained significant interest (Gu et al., 2019; Dong quence tagging model, it achieves over 11 et al., 2019; Awasthi et al., 2019; Malmi et al., times speedup in inference time, compared 2019; Omelianchuk et al., 2020; Mallinson et al., to the state-of-the-art for Text Simplification. 2020). Typically, the set of edit operations in such tasks is fixed and predefined ahead of time, which 2 Related Work on one hand limits the flexibility of the model to reconstruct arbitrary output texts from their inputs, Recent text editing works have shown promising but on the other, leads to higher sample-efficiency results of reformulating multiple monolingual se- as the limited set of allowed operations significantly quence transduction tasks into sequence tagging reduces the search space (Mallinson et al., 2020). tasks compared to the conventional Seq2Seq se- This pattern is especially true for monolingual set- quence generation formulation. This observation tings where input and output texts have relatively is especially true for tasks where input and out- high degrees of overlap. In such cases, a natural put sequences have a large overlap. Generally, approach is to cast the task of conditional text gen- these works try to simplify monolingual sequence eration into a text-editing task, where the model transduction by explicitly modeling edit operations learns to reconstruct target texts by applying a set such as KEEP, ADD/INSERT and DELETE. Alva- of edit operations to the inputs. We leverage this Manchego et al.(2017) proposed the first such insight in our work, and simplify the task from formulation, employing a BiLSTM to predict edit sequence generation or editing, going a step fur- 1Available at https://github.com/grammarly/ ther, to formulate it as a sequence tagging task. In gector#text-simplification 12 Text Simplification by Tagging (TST) task of Text Simplification as a tagging problem. Edit-Detection Softmax Feed-Forward Layer Specifically, our system is based on GECToR Input Encoder Edit Sentence RoBERTa-BASE Tags (Omelianchuk et al., 2020), an iterative sequence- Edit-Classification Softmax Feed-Forward Layer tagging system that works by predicting token-level edit operations, originally developed for Grammati- Post-Process cal Error Correction (GEC). We adapt the GECToR framework for the task of Text Simplification, with Repeat t times Output Sentence minimal modifications to the original architecture. Our system consists of three main parts:

Load more