Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

Text Simplification Using Neural Machine Translation

Tong Wang and Ping Chen and John Rochford and Jipeng Qiang
Department of Computer Science, University of Massachusetts Boston, [email protected]
Department of Computer Engineering, University of Massachusetts Boston
Eunice Kennedy Shriver Center, University of Massachusetts Medical School
Department of Computer Science, Hefei University of Technology

Abstract

Text simplification (TS) is the technique of reducing the lexical and syntactic complexity of text. Existing automatic TS systems can simplify text only by lexical simplification or by manually defined rules. Neural Machine Translation (NMT) is a recently proposed approach to Machine Translation (MT) that is receiving a lot of research interest. In this paper, we regard original English and simplified English as two languages and apply an NMT model, the Recurrent Neural Network (RNN) encoder-decoder, to TS so that the neural network learns text simplification rules by itself. We then discuss challenges and strategies in applying an NMT model to the task of text simplification.

Introduction

Text simplification (TS) aims to simplify the lexical, grammatical, or structural complexity of text while retaining its semantic meaning. It can help various groups of people, including children, non-native speakers, the functionally illiterate, and people with cognitive disabilities, to understand text better. Moreover, TS relates to many Natural Language Processing tasks, such as machine translation and summarization (Chandrasekar, Doran, and Srinivas 1996).

Many simplification approaches are used to create simplified text, including substituting, adding, or removing words, and shortening, splitting, dropping, or merging sentences. Different simplification approaches are applied based upon the context, length, and syntactic structure of the source words and sentences (Petersen and Ostendorf 2007). Usually, multiple simplification approaches work together to simplify text. Research into automatic text simplification has been ongoing for decades. It is generally divided into three kinds of systems: lexical simplification, rule-based, and machine translation.

A lexical simplification (LS) system simplifies text mainly by substituting infrequently used, difficult words with frequently used, easier words. The general process for lexical simplification includes: identifying difficult words; finding synonyms or similar words with various similarity measures (Glavaš and Štajner 2015); ranking and selecting the best candidate word based on criteria such as a language model; and keeping the grammar and syntax of the sentence correct. However, an LS system is not able to simplify a complex syntactic structure.
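As a minimal sketch of this pipeline, the following Python snippet ranks candidate substitutions by corpus frequency; the toy frequency table, the get_synonyms helper, and the frequency threshold are hypothetical stand-ins for the similarity measures and language models a real LS system would use.

# Minimal lexical-simplification sketch: substitute rare words with
# frequent synonyms. word_freq and get_synonyms are assumptions standing
# in for a real frequency list and similarity measure.
word_freq = {"use": 9_000_000, "utilize": 40_000,
             "help": 7_000_000, "facilitate": 30_000}   # toy frequency table

def get_synonyms(word):
    # Hypothetical synonym lookup; a real system would use WordNet,
    # embeddings, or a paraphrase database.
    return {"utilize": ["use", "apply"],
            "facilitate": ["help", "ease"]}.get(word, [])

def simplify_word(word, freq_threshold=100_000):
    # Step 1: identify difficult (infrequent) words.
    if word_freq.get(word, 0) >= freq_threshold:
        return word
    # Step 2: collect candidate substitutions.
    candidates = get_synonyms(word)
    if not candidates:
        return word
    # Step 3: rank candidates; here simply by corpus frequency
    # (a full system would also score them with a language model).
    return max(candidates, key=lambda w: word_freq.get(w, 0))

print(simplify_word("utilize"))   # -> "use"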
Rule-based systems use handcrafted rules for syntactic simplification and substitute difficult words using a predefined vocabulary (Siddharthan and Mandya 2014). By analyzing syntactic structure, a sentence with a particular structure can be transformed into a simpler structure. For example, if a long sentence contains "not only" and "but also", it can be split into two sentences. The disadvantages are that this kind of simplification system needs significant human involvement to manually define the rules, and that it is hopeless to write an exhaustive set of rules.

Machine Translation (MT) approaches to TS are showing very good performance. Original English and simplified English can be thought of as two different languages, and TS is then the process of translating English into simplified English (some call it monolingual machine translation) (Zhu, Bernhard, and Gurevych 2010). The English Wikipedia and the Simple English Wikipedia can be used to create a parallel corpus of aligned sentences to train the system.

Neural Machine Translation (NMT) is a newly proposed MT approach. It achieves very impressive results on MT tasks (Cho et al. 2014; Sutskever, Vinyals, and Le 2014). Instead of operating on small components separately, as a traditional MT system does, an NMT system attempts to build a single large neural network in which every component is tuned based upon the training sentence pairs. In this paper, we propose to apply the NMT model to the TS task. To the best of our knowledge, this is the first work to apply Neural Machine Translation to text simplification. Contrary to previous approaches, NMT systems do not rely upon similarity measures or heavily handcrafted rules; the deep neural network can learn all the simplification rules by itself.

NMT Model on Text Simplification

This section introduces the RNN encoder-decoder NMT model, explains the differences between the TS and MT tasks, and discusses challenges and strategies for applying the RNN encoder-decoder to text simplification.

RNN Encoder-Decoder Model

We define Vs and Vt as the vocabularies of the source language and the target language, and Vs* and Vt* as all sequences (sentences) over Vs and Vt, respectively. Given a source sentence X = (x1, x2, ..., xl), where X ∈ Vs*, xi ∈ Vs, and xi is the i-th word in X, the target (simplified) sentence is Y = (y1, y2, ..., yl'). Here l and l' are the lengths of the sentences, and the lengths are not fixed. Our goal is to build a neural network to model the conditional probability p(Y|X) and then train the model to maximize this probability.

An RNN is a class of neural network in which the connections between internal units may form a directed cycle, allowing it to retain the entire history of previous inputs. This structure makes the RNN an ideal model for processing sequence inputs such as sentences of arbitrary length. The RNN encoder-decoder (Cho et al. 2014; Sutskever, Vinyals, and Le 2014) is an NMT model that can be jointly trained to maximize the conditional probability p(Y|X) of a target sequence given a source sequence. It consists of two RNNs: one encodes the source sentence into a fixed-length vector, and the other decodes that vector into a target sentence. We intend to apply the RNN encoder-decoder model to the TS task.

To make the RNN capture both short-term and long-term dependencies in a sentence, the type of hidden unit must be carefully selected. The hidden units can be Long Short-Term Memory (LSTM) units (Hochreiter and Schmidhuber 1997; Sutskever, Vinyals, and Le 2014) or other gating units (Cho et al. 2014). The gating hidden units are crucial for the RNN to "understand" a sentence, since they are able to carry information over from the previous hidden state and drop irrelevant information.
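A minimal PyTorch-style sketch of such an encoder-decoder with LSTM hidden units is shown below; the vocabulary sizes, dimensions, and toy batch are illustrative assumptions rather than the configuration of the cited systems.

import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    # Minimal RNN encoder-decoder: the encoder compresses the source
    # sentence into a fixed-length state; the decoder predicts the target
    # (simplified) sentence conditioned on that state.
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode X into the final hidden/cell state (the fixed-length vector).
        _, state = self.encoder(self.src_emb(src))
        # Decode Y conditioned on that state (teacher forcing during training).
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)          # unnormalized p(y_i | y_<i, X)

# Toy usage: a batch of 2 sentences already mapped to word indices.
model = EncoderDecoder(src_vocab=10000, tgt_vocab=10000)
src = torch.randint(0, 10000, (2, 7))      # source word ids
tgt_in = torch.randint(0, 10000, (2, 5))   # decoder input (BOS + y_1..y_{n-1})
tgt_out = torch.randint(0, 10000, (2, 5))  # expected output (y_1..y_n)
logits = model(src, tgt_in)                # shape: (2, 5, 10000)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), tgt_out.reshape(-1))
loss.backward()                            # maximizing p(Y|X) = minimizing NLL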

Challenges and Strategies for Text Simplification Using the RNN Encoder-Decoder

We may apply the RNN encoder-decoder model in two ways for text simplification. First, we will use the model as part of an LS system, as an additional feature for scoring the candidate words of a source word. An LS system usually uses an n-gram language model and information content to score candidate words. Theoretically, the NMT model can strengthen the LS system by capturing the linguistic regularities of whole sentences. Once the training of the RNN encoder-decoder is complete, we can produce the simplified sentence by Ŷ = argmax_Y p(Y|X), and the candidate words can be selected accordingly to maximize the conditional probability.
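A sketch of this scoring strategy, assuming a trained model like the one above and a hypothetical encode helper that maps a word list to an index tensor: each candidate substitution is scored by the log-probability the decoder assigns to the resulting sentence, and the highest-scoring candidate is kept.

import torch
import torch.nn.functional as F

BOS = 1  # hypothetical beginning-of-sentence index

def sentence_log_prob(model, src_ids, tgt_ids):
    # log p(Y|X): feed the shifted target to the decoder and sum the
    # log-probabilities the model assigns to the actual target words.
    dec_in = torch.cat([torch.tensor([BOS]), tgt_ids[:-1]])
    with torch.no_grad():
        logits = model(src_ids.unsqueeze(0), dec_in.unsqueeze(0))
        log_probs = F.log_softmax(logits, dim=-1)
        token_lp = log_probs[0].gather(1, tgt_ids.unsqueeze(1)).squeeze(1)
        return token_lp.sum().item()

def rank_candidates(model, encode, source_sent, position, candidates):
    # Score each candidate substitution at `position` and return the best.
    # `encode` is a hypothetical helper mapping a word list to index tensors.
    src_ids = encode(source_sent)
    scored = []
    for cand in candidates:
        simplified = source_sent[:position] + [cand] + source_sent[position + 1:]
        scored.append((sentence_log_prob(model, src_ids, encode(simplified)), cand))
    return max(scored)[1]     # candidate with the highest log p(Y|X)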
Second, we can use the model to simplify English sentences directly. However, there are differences between the TS task and the MT task, which we list in Table 1. Therefore, a single NMT model is not able to handle the different text simplification operations. One option is to design a particular neural network, or another type of classifier, to detect which operation (e.g., splitting, merging) should be applied first. We could then design a different RNN encoder-decoder model for each specific simplification operation. For example, the update gate and the forget gate of the hidden units should memorize the long-range dependencies needed for the merging operation on a long input sentence, whereas the hidden units should focus on short-range dependencies for the splitting operation.

Table 1: Differences between the MT and TS tasks

Vocabulary. MT: the most frequent words (e.g., the top 15,000) are used for the source and target languages, and every out-of-vocabulary word is replaced with the UNK token. TS: infrequent words cannot simply be ignored; it is important to simplify them.

Vocabulary overlap. MT: |Vs| ≈ |Vt| and Vs ∩ Vt = ∅; there are nearly no shared words in the source sentence X and the target sentence Y. TS: |Vt| < |Vs| and Vt ⊂ Vs; some or even all words in Y may remain the same after simplifying X.

Sentence alignment. MT: the relation between a source sentence and a target sentence is usually one-to-one. TS: the relation can be one-to-many (splitting) or many-to-one (merging).

Future Work

This work is at an early stage. Due to the lack of available aligned sentence pairs, the first step is to collect training data. We plan to do so in two ways: crowdsourcing, and automatic discovery of aligned sentences from Simple English Wikipedia and English Wikipedia. We will then train the RNN encoder-decoder to score candidate words in a lexical simplification system. Our final goal is to build an RNN encoder-decoder model that can simplify any English sentence.
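A rough sketch of the automatic-discovery option, assuming the two article versions are already split into sentence lists: each Simple English sentence is paired with its most similar English sentence by TF-IDF cosine similarity, where the vectorizer and threshold are illustrative choices rather than a settled method.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def align_sentences(english_sents, simple_sents, threshold=0.5):
    # Greedy alignment: for every Simple English sentence, keep the most
    # similar English sentence if the similarity clears the threshold.
    vectorizer = TfidfVectorizer().fit(english_sents + simple_sents)
    en_vecs = vectorizer.transform(english_sents)
    si_vecs = vectorizer.transform(simple_sents)
    sims = cosine_similarity(si_vecs, en_vecs)     # rows: simple, cols: english
    pairs = []
    for i, row in enumerate(sims):
        j = row.argmax()
        if row[j] >= threshold:
            pairs.append((english_sents[j], simple_sents[i]))
    return pairs

# Toy example of the expected output format: (complex, simplified) pairs.
print(align_sentences(
    ["The committee commenced deliberations on the proposal."],
    ["The committee started talking about the plan."],
    threshold=0.1))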

References

Chandrasekar, R.; Doran, C.; and Srinivas, B. 1996. Motivations and methods for text simplification. In Proceedings of the 16th Conference on Computational Linguistics, Volume 2, 1041–1044. Association for Computational Linguistics.

Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Glavaš, G., and Štajner, S. 2015. Simplifying lexical simplification: Do we need simplified corpora? In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015, Short Papers), 63.

Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Petersen, S. E., and Ostendorf, M. 2007. Text simplification for language learners: A corpus analysis. In SLaTE, 69–72. Citeseer.

Siddharthan, A., and Mandya, A. 2014. Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 722–731.

Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 3104–3112.

Zhu, Z.; Bernhard, D.; and Gurevych, I. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics, 1353–1361. Association for Computational Linguistics.