Unsupervised Neural Text Simplification

Sai Surya†  Abhijit Mishra‡  Anirban Laha‡  Parag Jain‡  Karthik Sankaranarayanan‡
†IIT Kharagpur, India   ‡IBM Research
[email protected]   {abhijimi,anirlaha,pajain34,[email protected]

Abstract

The paper presents a first attempt towards unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework is composed of a shared encoder and a pair of attentional decoders, crucially assisted by discrimination-based losses and denoising. The framework is trained using unlabeled text collected from an en-Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows that the proposed model can perform text simplification at both the lexical and syntactic levels, competitive with existing supervised methods. It also outperforms viable unsupervised baselines. Adding a few labeled pairs helps improve the performance further.

1 Introduction

Text Simplification (TS) deals with transforming original text into simplified variants to increase its readability and understandability. TS is an important task in computational linguistics and has numerous use-cases in education technology, targeted content creation, and language learning, where producing variants of a text with varying degrees of simplicity is desired. TS systems are typically designed to simplify from two different linguistic aspects: (a) the lexical aspect, by replacing complex words in the input with simpler synonyms (Devlin, 1998; Candido Jr et al., 2009; Yatskar et al., 2010; Biran et al., 2011; Glavaš and Štajner, 2015), and (b) the syntactic aspect, by altering the inherent hierarchical structure of the sentences (Chandrasekar and Srinivas, 1997; Canning and Tait, 1999; Siddharthan, 2006; Filippova and Strube, 2008; Brouwers et al., 2014). From the perspective of sentence construction, sentence simplification can be thought of as a form of text transformation that involves three major types of operations: (a) splitting (Siddharthan, 2006; Petersen and Ostendorf, 2007; Narayan and Gardent, 2014), (b) deletion/compression (Knight and Marcu, 2002; Clarke and Lapata, 2006; Filippova and Strube, 2008; Rush et al., 2015; Filippova et al., 2015), and (c) paraphrasing (Specia, 2010; Coster and Kauchak, 2011; Wubben et al., 2012; Wang et al., 2016; Nisioi et al., 2017).

Most current TS systems require large-scale parallel corpora for training (except for systems like Glavaš and Štajner (2015), which performs only lexical simplification), which is a major impediment to scaling to newer languages, use-cases, domains, and output styles for which such large-scale parallel data do not exist. In fact, one of the popular corpora for TS in English, the Wikipedia-SimpleWikipedia aligned dataset, has been prone to noise (misaligned instances) and inadequacy (i.e., instances having non-simplified targets) (Xu et al., 2015; Štajner et al., 2015), leading to noisy supervised models (Wubben et al., 2012). While the creation of better datasets (such as Newsela by Xu et al. (2015)) can always help, we explore the unsupervised learning paradigm, which can potentially work with unlabeled datasets that are cheaper and easier to obtain.

At the heart of the TS problem is the need for preservation of language semantics with the goal of improving readability. From a neural-learning perspective, this entails a specially designed auto-encoder, which is not only capable of reconstructing the original input but can additionally introduce variations so that the auto-encoded output is a simplified version of the input.
Intuitively, both of these can be learned by looking at the structure and language patterns of a large amount of non-aligned complex and simple sentences (which are much cheaper to obtain than aligned parallel data). These motivations form the basis of our work.

Our approach relies only on two unlabeled text corpora - one representing relatively simpler sentences than the other (which we call complex). The crux of the (unsupervised) auto-encoding framework is a shared encoder and a pair of attention-based decoders (one for each type of corpus). The encoder attempts to produce semantics-preserving representations, which can be acted upon by the respective decoders (simple or complex) to generate the appropriate text output they are designed for. The framework is crucially supported by two kinds of losses: (1) an adversarial loss, to distinguish between real and fake attention context vectors for the simple decoder, and (2) a diversification loss, to distinguish between the attention context vectors of the simple decoder and the complex decoder. The first loss ensures that only the aspects of semantics that are necessary for simplification are passed to the simple decoder in the form of the attention context vectors. The second loss, on the other hand, facilitates passing different semantic aspects to the different decoders through their respective context vectors. We also employ denoising in the auto-encoding setup to enable syntactic transformations.

The framework is trained using unlabeled text collected from Wikipedia (complex) and Simple Wikipedia (simple). It attempts to perform simplification both lexically and syntactically, unlike prevalent systems, which mostly target them separately. We demonstrate the competitiveness of our unsupervised framework alongside supervised skylines through both automatic evaluation metrics and human evaluation studies. We also outperform another unsupervised baseline (Artetxe et al., 2018b), first proposed for neural machine translation. Further, we demonstrate that by leveraging a small amount of labeled parallel data, performance can be improved further. Our code and a new dataset containing partitioned unlabeled sets of simple and complex sentences are publicly available¹.

¹ https://github.com/subramanyamdvss/UnsupNTS
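
To make these training signals concrete, the following is a minimal PyTorch sketch of one plausible instantiation of such a framework: a shared encoder, two attentional decoders, a discriminator D for the adversarial loss, a classifier C for the diversification loss, and word-dropout noise for denoising. All module names, hyper-parameters, the simplified parallel attention, the mean-pooling of context vectors, and the choice of "fake" inputs for D are our own illustrative assumptions, not the authors' implementation; the alternating generator/discriminator updates of adversarial training are also omitted for brevity.

```python
# Minimal sketch (assumptions, not the paper's code): shared encoder E,
# attentional decoders G_s (simple) and G_d (complex), discriminator D,
# classifier C, and denoising via word dropout.
import torch
import torch.nn as nn

VOCAB, EMB_DIM, HID_DIM, PAD = 30000, 300, 512, 0  # assumed sizes; 0 = padding id

class Encoder(nn.Module):
    """Shared encoder E: embedding lookup followed by a 2-layer GRU."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB_DIM, padding_idx=PAD)
        self.gru = nn.GRU(EMB_DIM, HID_DIM, num_layers=2, batch_first=True)

    def forward(self, x):                      # x: (B, T) token ids
        out, _ = self.gru(self.embed(x))       # (B, T, HID_DIM)
        return out

class AttnDecoder(nn.Module):
    """One attentional decoder (G_s or G_d). For brevity, attention queries come
    from the embedded decoder inputs (teacher forcing), so all context vectors
    can be computed in parallel rather than step by step."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB_DIM, padding_idx=PAD)
        self.q_proj = nn.Linear(EMB_DIM, HID_DIM, bias=False)
        self.gru = nn.GRU(EMB_DIM + HID_DIM, HID_DIM, num_layers=2, batch_first=True)
        self.out = nn.Linear(HID_DIM, VOCAB)

    def forward(self, y, enc):                 # y: (B, T_dec), enc: (B, T_enc, H)
        q = self.embed(y)
        scores = torch.bmm(self.q_proj(q), enc.transpose(1, 2))  # (B, T_dec, T_enc)
        ctx = torch.bmm(torch.softmax(scores, dim=-1), enc)      # attention contexts
        h, _ = self.gru(torch.cat([q, ctx], dim=-1))
        return self.out(h), ctx                # vocabulary logits, context vectors

def mlp():                                     # small scorer, used for both D and C
    return nn.Sequential(nn.Linear(HID_DIM, HID_DIM), nn.ReLU(),
                         nn.Linear(HID_DIM, 1))

def noisy(x, p=0.1):
    """Word-dropout corruption for the denoising objective (the paper's exact
    noise model may differ)."""
    drop = (torch.rand(x.shape) < p) & (x != PAD)
    return x.masked_fill(drop, PAD)

E, G_s, G_d, D, C = Encoder(), AttnDecoder(), AttnDecoder(), mlp(), mlp()
bce = nn.BCEWithLogitsLoss()
xent = nn.CrossEntropyLoss(ignore_index=PAD)

def losses(simple, complex_):                  # each: (B, T) ids, BOS-prefixed
    # Denoising auto-encoding: reconstruct each sentence from its noisy version.
    log_s, ctx_s = G_s(simple[:, :-1], E(noisy(simple)))
    log_d, ctx_d = G_d(complex_[:, :-1], E(noisy(complex_)))
    l_rec = xent(log_s.flatten(0, 1), simple[:, 1:].flatten()) \
          + xent(log_d.flatten(0, 1), complex_[:, 1:].flatten())

    # Adversarial loss: D labels contexts reaching G_s as "real" when the input
    # was simple and "fake" when it was complex; E and G_s are trained to fool D
    # (the sign-flipped alternating update is omitted here).
    _, ctx_fake = G_s(complex_[:, :-1], E(complex_))
    real, fake = D(ctx_s.mean(1)), D(ctx_fake.mean(1))
    l_adv = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))

    # Diversification loss: C separates G_s's contexts from G_d's, pushing
    # different semantic aspects toward the two decoders.
    p_s, p_d = C(ctx_s.mean(1)), C(ctx_d.mean(1))
    l_div = bce(p_s, torch.ones_like(p_s)) + bce(p_d, torch.zeros_like(p_d))
    return l_rec, l_adv, l_div
```

The discrimination losses here operate on the attention context vectors, exactly as motivated above; mean-pooling the contexts before scoring them is simply one compact way to feed a sequence of vectors to a fixed-size critic.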

2 Related Work

Text Simplification has often been discussed from psychological and linguistic standpoints (L'Allier, 1980; McNamara et al., 1996; Linderholm et al., 2000). A heuristic-based system was first introduced by Chandrasekar and Srinivas (1997), which induces simplification rules automatically extracted from annotated corpora. Canning and Tait (1999) proposed a modular system that uses NLP tools such as a morphological analyzer and a POS tagger, plus heuristics, to simplify text both lexically and syntactically. Most of these systems (Siddharthan, 2014) are separately targeted towards lexical and syntactic simplification and are limited to splitting and/or truncating sentences. For paraphrasing-based simplification, data-driven approaches were proposed, such as phrase-based SMT (Specia, 2010; Štajner et al., 2015) or its variants (Coster and Kauchak, 2011; Xu et al., 2016), which combine heuristic and optimization strategies for better TS. Recently proposed TS systems are based on the neural seq2seq architecture (Bahdanau et al., 2014), modified for TS-specific operations (Wang et al., 2016; Nisioi et al., 2017). While these systems produce state-of-the-art results on the popular Wikipedia dataset (Coster and Kauchak, 2011), they may not be generalizable because of the noise and bias in the dataset (Xu et al., 2015) and overfitting. Towards this, Štajner and Nisioi (2018) showed that improved datasets and minor model changes (such as using a reduced vocabulary and enabling a copy mechanism) help obtain reasonable performance for both in-domain and cross-domain TS.

In the unsupervised paradigm, Paetzold and Specia (2016) proposed an unsupervised lexical simplification technique that replaces complex words in the input with simpler synonyms, which are extracted and disambiguated using word embeddings. However, this work, unlike ours, only addresses lexical simplification and cannot be trivially extended to other forms of simplification such as splitting and rephrasing. Other works related to style transfer (Zhang et al., 2018; Shen et al., 2017; Xu et al., 2018) typically look into the problem of sentiment transformation and are not motivated by the linguistic aspects of TS, and hence are not comparable to our work. As far as we know, ours is a first-of-its-kind end-to-end solution for unsupervised TS. At this point, though supervised solutions perform better than unsupervised ones, we believe unsupervised techniques should be further explored, since they hold greater potential with regard to scalability to various tasks.

3 Model Description

Our system is built on the encode-attend-decode style architecture (Bahdanau et al., 2014), with both algorithmic and architectural changes applied to the standard model. An input sequence of word embeddings $X = \{x_1, x_2, \ldots, x_n\}$ (obtained after a standard look-up operation on the embedding matrix) is passed […] $A_{st}(X)$ and $A_{dt}(X)$ denote the context vectors […]

[Figure 1: System Architecture. Input sentences of any domain are encoded by E and decoded by Gs and Gd. Discriminator D and classifier C tune the attention vectors for simplification. L represents loss functions. The figure shows only one layer in E, Gs, and Gd for simplicity; the model uses two layers of GRUs (Section 3).]
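
Continuing the earlier sketch (with the same assumed module names), this notation maps onto it as follows: encoding X and decoding with G_s or G_d yields the per-step attention context vectors that the paper denotes $A_{st}(X)$ and $A_{dt}(X)$.

```python
# Toy usage, reusing the assumed modules from the earlier sketch.
x = torch.randint(1, VOCAB, (2, 12))   # X: a batch of 2 sequences of token ids
enc = E(x)                              # two-layer GRU encoder states for X
_, A_st = G_s(x[:, :-1], enc)           # context vectors for the simple decoder
_, A_dt = G_d(x[:, :-1], enc)           # context vectors for the complex decoder
print(A_st.shape, A_dt.shape)           # torch.Size([2, 11, 512]) each
```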
