Iterative Edit-Based Unsupervised Sentence Simplification

Dhruv Kumar,¹ Lili Mou,² Lukasz Golab,¹ Olga Vechtomova¹
¹University of Waterloo
²Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii)
{d35kumar, lgolab, ovechtom}@uwaterloo.ca, [email protected]

Abstract

We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word- and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretable. Experiments on the Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.¹

1 Introduction

Sentence simplification is the task of rewriting text to make it easier to read, while preserving its main meaning and important information. Sentence simplification is relevant in various real-world and downstream applications. For instance, it can benefit people with autism (Evans et al., 2014), dyslexia (Rello et al., 2013), and low-literacy skills (Watanabe et al., 2009). It can also serve as a preprocessing step to improve parsers (Chandrasekar et al., 1996) and summarization systems (Klebanov et al., 2004).

Recent efforts in sentence simplification have been influenced by the success of machine translation. In fact, the simplification task is often treated as monolingual translation, where a complex sentence is translated to a simple one. Such simplification systems are typically trained in a supervised way by either phrase-based machine translation (PBMT; Wubben et al., 2012; Narayan and Gardent, 2014; Xu et al., 2016) or neural machine translation (NMT; Zhang and Lapata, 2017; Guo et al., 2018; Kriz et al., 2019). Recently, sequence-to-sequence (Seq2Seq)-based NMT systems have been shown to be more successful and serve as the state of the art.

However, supervised Seq2Seq models have two shortcomings. First, they give little insight into the simplification operations, and provide little control or adaptability to different aspects of simplification (e.g., lexical vs. syntactical simplification). Second, they require a large number of complex-simple aligned sentence pairs, which in turn require considerable human effort to obtain.

In previous work, researchers have addressed some of the above issues. For example, Alva-Manchego et al. (2017) and Dong et al. (2019) explicitly model simplification operators such as word insertion and deletion. Although these approaches are more controllable and interpretable than standard Seq2Seq models, they still require large volumes of aligned data to learn these operations. To deal with the second issue, Surya et al. (2019) recently proposed an unsupervised neural text simplification approach based on the paradigm of style transfer. However, their model is hard to interpret and control, like other neural network-based models. Narayan and Gardent (2016) attempted to address both issues using a pipeline of lexical substitution, sentence splitting, and word/phrase deletion. However, these operations can only be executed in a fixed order.

In this paper, we propose an iterative, edit-based unsupervised sentence simplification approach, motivated by the shortcomings of existing work. We first design a scoring function that measures the quality of a candidate sentence based on the key characteristics of the simplification task, namely, fluency, simplicity, and meaning preservation. Then, we generate simplified candidate sentences by iteratively editing the given complex sentence using three simplification operations (lexical simplification, phrase extraction, deletion, and reordering). Our model seeks the best simplified candidate sentence according to the scoring function. Compared with Narayan and Gardent (2016), the order of our simplification operations is not fixed and is decided by the model.

Figure 1 illustrates an example in which our model first chooses to delete a sentence fragment, followed by reordering the remaining fragments and replacing a word with a simpler synonym.
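To make the search loop concrete, here is a minimal Python sketch of the procedure just described. Everything in it is an assumption for illustration — the toy synonym lexicon, span-based deletion (the actual model edits constituent phrases obtained from a parse tree, and also supports reordering and phrase extraction), and the stopping rule — so read it as the shape of the algorithm, not as the authors' implementation (footnote 1 links the released code).

```python
# Hedged sketch of iterative edit-based simplification.
# All helper names and edit generators are illustrative assumptions.

SIMPLER_SYNONYM = {"purchase": "buy", "utilize": "use", "reside": "live"}

def lexical_candidates(words):
    """Replace one word at a time with a simpler synonym (toy lexicon)."""
    for i, w in enumerate(words):
        if w in SIMPLER_SYNONYM:
            yield words[:i] + [SIMPLER_SYNONYM[w]] + words[i + 1:]

def deletion_candidates(words):
    """Drop one contiguous span (the real model deletes parser-identified
    constituent phrases, not arbitrary spans)."""
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            if j - i < len(words):  # never delete the whole sentence
                yield words[:i] + words[j:]

def simplify(words, score, max_iters=10):
    """Greedy hill-climbing: take the best-scoring candidate edit,
    and stop once no edit improves on the current sentence."""
    current = list(words)
    for _ in range(max_iters):
        candidates = list(lexical_candidates(current))
        candidates.extend(deletion_candidates(current))
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= score(current):  # stopping criterion
            break
        current = best
    return current
```

The interesting design choice is that the scoring function, rather than a fixed pipeline, decides which operation fires at each step. A degenerate objective such as `score=lambda ws: -len(ws)` would collapse the loop into pure deletion, which is exactly why the paper's objective also rewards fluency and meaning preservation.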
We evaluate our approach on the Newsela (Xu et al., 2015) and WikiLarge (Zhang and Lapata, 2017) corpora. Experiments show that our approach outperforms previous unsupervised methods and even performs competitively with state-of-the-art supervised ones, in both automatic metrics and human evaluations. We also demonstrate the interpretability and controllability of our approach, even without parallel training data.

¹ Code is released at https://github.com/ddhruvkr/Edit-Unsup-TS

Figure 1: An example of three edit operations on a given sentence. Note that dropping clauses or phrases is common in text simplification datasets.

2 Related Work

Early work used handcrafted rules for text simplification, at both the syntactic level (Siddharthan, 2002) and the lexicon level (Carroll et al., 1999). Later, researchers adopted machine learning methods for text simplification, modeling it as monolingual phrase-based machine translation (Wubben et al., 2012; Xu et al., 2016). Further, syntactic information was also considered in the PBMT framework, for example, constituency trees (Zhu et al., 2010) and dependency trees (Bingel and Søgaard, 2016). Narayan and Gardent (2014) performed probabilistic sentence splitting and deletion, followed by MT-based paraphrasing.

Nisioi et al. (2017) employed neural machine translation (NMT) for text simplification, using a sequence-to-sequence (Seq2Seq) model (Sutskever et al., 2014). Zhang and Lapata (2017) used reinforcement learning to optimize a reward based on simplicity, fluency, and relevance. Zhao et al. (2018a) integrated the transformer architecture and paraphrasing rules to guide simplification learning. Kriz et al. (2019) produced diverse simplifications by generating and re-ranking candidates by fluency, adequacy, and simplicity. Guo et al. (2018) showed that simplification benefits from multi-task learning with paraphrase and entailment generation. Martin et al. (2019) enhanced the transformer architecture with conditioning parameters such as length, lexical and syntactic complexity.

Recently, edit-based techniques have been developed for text simplification. Alva-Manchego et al. (2017) trained a model to predict three simplification operators (keep, replace, and delete) from aligned pairs. Dong et al. (2019) employed a similar approach but in an end-to-end trainable manner with neural networks. However, these approaches are supervised and require large volumes of parallel training data; also, their edits are only at the word level. By contrast, our method works at both the word and phrase levels in an unsupervised manner.

For unsupervised sentence simplification, Surya et al. (2019) adopted style-transfer techniques, using adversarial and denoising auxiliary losses for content reduction and lexical simplification. However, their model is based on a Seq2Seq network, which is less interpretable and controllable. They cannot perform syntactic simplification, since syntax typically does not change in style-transfer tasks.
Narayan and Gardent (2016) built a pipeline-based unsupervised framework with lexical simplification, sentence splitting, and phrase deletion. However, these operations are separate components in the pipeline, and can only be executed in a fixed order.

Unsupervised edit-based approaches have recently been explored for natural language generation tasks, such as style transfer, paraphrasing, and sentence error correction. Li et al. (2018) proposed edit-based style transfer without parallel supervision. They replaced style-specific phrases with those in the target style, which are retrieved from the training corpus. Miao et al. (2019) used Metropolis–Hastings sampling for constrained sentence generation. In this paper, we model text generation as a search algorithm, and design the search objective and search actions specifically for text simplification. Concurrent work further shows the success of search-based unsupervised text generation for paraphrasing (Liu et al., 2020) and summarization (Schumann et al., 2020).

3 Model

In this section, we first provide an overview of our approach, followed by a detailed description of each component, namely, the scoring function, the edit operations, and the stopping criteria.

3.1 Overview

We first define a scoring function as our search objective. It allows us to impose both hard and soft constraints, balancing the fluency, simplicity, and adequacy of candidate simplified sentences (Section 3.2).

3.2 Scoring Function

To measure fluency, we compute the syntactic log-odds ratio (SLOR) of a sentence:

$$\mathrm{SLOR}(s) = \frac{1}{|s|}\big(\ln P_{\mathrm{LM}}(s) - \ln P_{\mathrm{U}}(s)\big)$$

where $P_{\mathrm{LM}}$ is the sentence probability given by the language model, $P_{\mathrm{U}}(s) = \prod_{w \in s} P(w)$ is the product of the unigram probabilities of the words $w$ in the sentence, and $|s|$ is the sentence length.

SLOR essentially penalizes a plain LM's probability by unigram likelihood and the length. It ensures that the fluency score of a sentence is not penalized by the presence of rare words. Consider two sentences, "I went to England for vacation" and "I went to Senegal for vacation." Even though both sentences are equally fluent, a standard LM will give a higher score to the former, since the word "England" is more likely to occur than "Senegal." In simplification, SLOR is preferred for preserving such rare but meaning-carrying words (e.g., named entities).
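A minimal numeric sketch of SLOR follows, assuming toy log-probabilities in place of a trained language model and unigram statistics; the values are invented purely to illustrate the England/Senegal example above.

```python
def slor(logprob_lm, unigram_logprobs):
    """SLOR(s) = (ln P_LM(s) - ln P_U(s)) / |s|.

    logprob_lm       -- ln P_LM(s), the sentence log-probability under the LM
    unigram_logprobs -- ln P(w) for each word w of s; their sum is ln P_U(s)
    """
    return (logprob_lm - sum(unigram_logprobs)) / len(unigram_logprobs)

# Toy numbers (invented for illustration): the LM prefers the "England"
# sentence by 4 nats, but "England" also beats "Senegal" by 4 nats of
# unigram log-probability, so the two effects cancel in SLOR.
england = slor(-12.0, [-2.0, -3.0, -1.0, -3.0, -2.0, -4.0])  # ln P("England") = -3
senegal = slor(-16.0, [-2.0, -3.0, -1.0, -7.0, -2.0, -4.0])  # ln P("Senegal") = -7
print(england, senegal)  # 0.5 0.5 -- equal fluency, as desired
```

A plain LM log-probability would have ranked the first sentence higher; dividing out unigram likelihood and length is what lets the fluency score treat rare named entities fairly during simplification.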
