
Optimizing Statistical Machine Translation for Text Simplification

Wei Xu1, Courtney Napoles2, Ellie Pavlick1, Quanze Chen1 and Chris Callison-Burch1
1 Computer and Information Science Department, University of Pennsylvania
{xwe, epavlick, cquanze, ccb}@seas.upenn.edu
2 Department of Computer Science, Johns Hopkins University
[email protected]

Abstract

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

1 Introduction

The goal of text simplification is to rewrite an input text so that the output is more readable. Text simplification has applications for reducing input complexity for natural language processing (Siddharthan et al., 2004; Miwa et al., 2010; Chen et al., 2012b) and providing reading aids for people with limited language skills (Petersen and Ostendorf, 2007; Watanabe et al., 2009; Allen, 2009; De Belder and Moens, 2010; Siddharthan and Katsos, 2010) or language impairments such as dyslexia (Rello et al., 2013), autism (Evans et al., 2014), and aphasia (Carroll et al., 1999).

It is widely accepted that sentence simplification can be implemented by three major types of operations: splitting, deletion and paraphrasing (Feng, 2008). The splitting operation decomposes a long sentence into a sequence of shorter sentences. Deletion removes less important parts of a sentence. The paraphrasing operation includes reordering, lexical substitutions and syntactic transformations. While sentence splitting (Siddharthan, 2006; Petersen and Ostendorf, 2007; Narayan and Gardent, 2014; Angrosh et al., 2014) and deletion (Knight and Marcu, 2002; Clarke and Lapata, 2006; Filippova and Strube, 2008; Filippova et al., 2015; Rush et al., 2015; and others) have been intensively studied, there has been considerably less research on developing new paraphrasing models for text simplification — most previous work has used off-the-shelf statistical machine translation (SMT) technology and achieved reasonable results (Coster and Kauchak, 2011a,b; Wubben et al., 2012; Štajner et al., 2015). However, they have either treated the underlying technology as a black box (Coster and Kauchak, 2011a,b; Narayan and Gardent, 2014; Angrosh et al., 2014; Štajner et al., 2015) or they have been limited to modifying only one aspect of it, such as the translation model (Zhu et al., 2010; Woodsend and Lapata, 2011) or the reranking component (Wubben et al., 2012).

In this paper, we present a complete adaptation of a syntax-based machine translation framework to perform simplification. Our methodology poses text simplification as a paraphrasing problem: given an input text, rewrite it subject to the constraints that the output should be simpler than the input, while preserving as much meaning of the input as possible, and maintaining the well-formedness of the text.
Going beyond previous work, we make direct modifications to four key components in the SMT pipeline:1 1) two novel simplification-specific tunable metrics; 2) large-scale paraphrase rules automatically derived from bilingual parallel corpora, which are more naturally and abundantly available than manually simplified texts; 3) rich rule-level simplification features; and 4) multiple reference simplifications collected via crowdsourcing for tuning and evaluation. In particular, we report the first study that shows promising correlations of automatic metrics with human evaluation. Our work answers the call made in a recent TACL paper (Xu et al., 2015) to address problems in current simplification research — we amend human evaluation criteria, develop automatic metrics, and generate an improved multiple reference dataset.

Our work is primarily focused on lexical simplification (rewriting words or phrases with simpler versions), and to a lesser extent on syntactic rewrite rules that simplify the input. It largely ignores the important subtasks of sentence splitting and deletion. Our focus on lexical simplification does not affect the generality of the presented work, since deletion or sentence splitting could be applied as pre- or post-processing steps.

1 Our code and data are made available at: https://github.com/cocoxu/simplification/

2 Background

Xu et al. (2015) laid out a series of problems that are present in current text simplification research, and argued that we should deviate from the previous state-of-the-art benchmarking setup.

First, the Simple English Wikipedia data has dominated simplification research since 2010 (Zhu et al., 2010; Siddharthan, 2014), and is used together with Standard English Wikipedia to create parallel text to train MT-based simplification systems. However, recent studies (Xu et al., 2015; Amancio and Specia, 2014; Hwang et al., 2015; Štajner et al., 2015) showed that the parallel Wikipedia simplification corpus contains a large proportion of inadequate (not much simpler) or inaccurate (not aligned or only partially aligned) simplifications. It is one of the leading reasons that existing simplification systems struggle to generate simplifying paraphrases and leave the input sentences unchanged (Wubben et al., 2012). Previously researchers attempted some quick fixes by adding phrasal deletion rules (Coster and Kauchak, 2011a) or reranking n-best outputs based on their dissimilarity to the input (Wubben et al., 2012). In contrast, we exploit data with improved quality and enlarged quantity, namely, large-scale paraphrase rules automatically derived from bilingual corpora and a small amount of manual simplification data with multiple references for tuning parameters. We then systematically design new tuning metrics and rich simplification-specific features into a syntactic machine translation model to enforce optimization towards simplicity. This approach achieves better simplification performance without relying on a manually simplified corpus to learn paraphrase rules, which is important given the fact that Simple Wikipedia and the newly released Newsela simplification corpus (Xu et al., 2015) are only available for English.

Second, previous evaluation used in the simplification literature is uninformative and not comparable across models due to the complications between the three different operations of paraphrasing, deletion, and splitting. This, combined with the unreliable quality of Simple Wikipedia as a gold reference for evaluation, has been the bottleneck for developing automatic metrics. There exist only a few studies (Wubben et al., 2012; Štajner et al., 2014) on automatic simplification evaluation using existing MT metrics, which show limited correlation with human assessments. In this paper, we restrict ourselves to lexical simplification, where we believe MT-derived evaluation metrics can best be deployed. Our newly proposed metric is the first automatic metric that shows reasonable correlation with human evaluation on the text simplification task. We also introduce multiple references to make automatic evaluation feasible.

The most related work to ours is that of Ganitkevitch et al. (2013) on sentence compression, in which compression of word and sentence lengths can be more straightforwardly implemented in features and the objective function in the SMT framework. We want to stress that sentence simplification is not a simple extension of sentence compression, but is a much more complicated task, primarily because high-quality data is much harder to obtain and the solution space is more constrained by word choice and grammar. Our work is also related to other tunable metrics designed to be very simple and light-weight to ensure fast repeated computation for tuning bilingual translation models (Liu et al., 2010; Chen et al., 2012a). To the best of our knowledge, no tunable metric has been attempted for simplification, except for BLEU. Nor do any evaluation metrics exist for simplification, although there are several designed for other text-to-text generation tasks: grammatical error correction (Napoles et al., 2015; Felice and Briscoe, 2015; Dahlmeier and Ng, 2012), paraphrase generation (Chen and Dolan, 2011; Xu et al., 2012; Sun and Zhou, 2012), and conversation generation (Galley et al., 2015). Another line of related work is lexical simplification, which focuses on finding simpler synonyms of a given complex word.

FKBLEU

Our first metric combines a previously proposed metric for paraphrase generation, iBLEU (Sun and Zhou, 2012), and the widely used readability metric, Flesch-Kincaid Index (Kincaid et al., 1975). iBLEU is an extension of the BLEU metric that measures diversity as well as adequacy of the generated paraphrase output. Given a candidate sentence O, human references R and input text I, iBLEU is defined as:

    iBLEU = α × BLEU(O, R) − (1 − α) × BLEU(O, I)    (1)

where α is a parameter that balances adequacy against dissimilarity, and is set to 0.9 empirically as suggested by Sun and Zhou (2012). Since the text simplification
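As a rough illustration of Equation (1), the sketch below computes sentence-level iBLEU in Python. It is a minimal example under stated assumptions rather than the authors' implementation: NLTK's smoothed sentence-level BLEU stands in for whatever BLEU variant is actually used for tuning, inputs are assumed to be pre-tokenized, and the function names and example sentences are invented for the illustration.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # Smoothing keeps short sentences from scoring zero when higher-order
    # n-grams have no matches; method1 is an arbitrary choice for this sketch.
    _SMOOTH = SmoothingFunction().method1

    def _bleu(candidate, references):
        # Sentence-level BLEU of a tokenized candidate against a list of
        # tokenized references.
        return sentence_bleu(references, candidate, smoothing_function=_SMOOTH)

    def ibleu(output, references, source, alpha=0.9):
        # Equation (1): alpha * BLEU(O, R) - (1 - alpha) * BLEU(O, I),
        # with alpha = 0.9 as suggested by Sun and Zhou (2012).
        return alpha * _bleu(output, references) - (1 - alpha) * _bleu(output, [source])

    # Hypothetical example: score a candidate simplification O against one
    # reference R and the input I.
    source = "the incident was reported to the relevant authorities".split()
    references = ["the incident was told to the police".split()]
    output = "the incident was reported to the police".split()
    print(round(ibleu(output, references, source), 3))

Because α is close to 1, the second term acts only as a mild penalty: the score is driven mainly by overlap with the references, while an output that simply copies the input loses a small amount for its similarity to I.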