Discourse Level Factors for Sentence Deletion in Text Simplification
Yang Zhong,*1 Chao Jiang,1 Wei Xu,1 Junyi Jessy Li2
1 Department of Computer Science and Engineering, The Ohio State University
2 Department of Linguistics, The University of Texas at Austin
{zhong.536, jiang.1530, [email protected] [email protected]

* Work started as an undergraduate student at UT Austin.
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

This paper presents a data-driven study focusing on analyzing and predicting sentence deletion — a prevalent but understudied phenomenon in document simplification — on a large English text simplification corpus. We inspect various document and discourse factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. We reveal that professional editors utilize different strategies to meet readability standards of elementary and middle schools. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. Evaluated on our manually annotated data, our best models reached F1 scores of 65.2 and 59.7 for this task at the levels of elementary and middle school, respectively. We find that discourse-level factors contribute to the challenging task of predicting sentence deletion for simplification.

1 Introduction

Text simplification aims to rewrite an existing document to be accessible to a broader audience (e.g., non-native speakers, children, and individuals with language impairments) while remaining truthful in content. The simplification process involves a variety of operations, including lexical and syntactic transformations, summarization, removal of difficult content, and explicification (Siddharthan 2014).

While recent years saw a bloom in text simplification research (Xu et al. 2016; Narayan and Gardent 2016; Nisioi et al. 2017; Zhang and Lapata 2017; Vu et al. 2018; Sulem, Abend, and Rappoport 2018; Maddela and Xu 2018; Kriz et al. 2019), thanks to the development of large parallel corpora of original-to-simplified sentence pairs (Zhu, Bernhard, and Gurevych 2010; Xu, Callison-Burch, and Napoles 2015), most of the recent work is conducted at the sentence level, i.e., transducing each complex sentence to its simplified version.

As a result, this line of work does not capture document-level phenomena, among which sentence deletion is the most prevalent, as simplified texts tend to be shorter (Petersen and Ostendorf 2007; Drndarevic and Saggion 2012; Woodsend and Lapata 2011).

This work aims to facilitate better understanding of sentence deletion in document-level text simplification. While prior work analyzed sentence position and content in a corpus study (Petersen and Ostendorf 2007), we hypothesize and show that sentence deletion is driven partially by contextual, discourse-level information, in addition to the content within a sentence.

We utilize the Newsela2 corpus (Xu, Callison-Burch, and Napoles 2015), which contains multiple versions of a document rewritten by professional editors. This dataset allows us to compare sentence deletion strategies to meet different readability requirements. Unlike prior work that often uses automatically aligned data for analysis (cf. Section 5), we manually aligned 50 articles of more than 5,000 sentences across three reading levels to provide a reliable ground truth for the analysis and model evaluation in this paper.3 We find that sentence deletion happens very often, at rates of 17.2%–44.8% across reading levels, indicating that sentence deletion prediction is an important task in text simplification. Several characteristics of the original document, including its length and topic, significantly influence sentence deletion. By analyzing the rhetorical structure (Mann and Thompson 1988) of the original articles, we show that the sentence deletion process is also informed by how a sentence is situated in terms of its connections to neighboring sentences, and its discourse salience within a document. In addition, we reveal that the use of discourse connectives within a sentence also influences whether it will be deleted.

2 Newsela is an educational tech company that provides reading material for children in school curricula.
3 To request our data, please first obtain access to the Newsela corpus at: https://newsela.com/data/, then contact the authors.

To predict whether a sentence in the original article will be deleted during simplification, we utilize noisy supervision obtained from 886 automatically aligned articles with a total of 42,264 sentences. Our neural network model learns from both the content of the sentence itself and the discourse-level factors we analyzed. Evaluated on our manually annotated data, our best model, which utilizes Gaussian-based feature vectorization, achieved F1 scores of 65.2 and 59.7 for this task across two different reading levels (elementary and middle school). We show that several of the factors, especially document characteristics, complement sentence content in this challenging task. On the other hand, while our analysis of rhetorical structure revealed interesting insights, encoding them as features does not further improve the model. To the best of our knowledge, this is the first data-driven study that focuses on analyzing discourse-level factors and predicting sentence deletion on a large English text simplification corpus.

Figure 1: An example paragraph of the original news article (left) and its simplified version by professional editors for elementary school students (right). The arrows signify the sentence alignment between the original and simplified documents. The second sentence in the original paragraph is aligned to three sentences after simplification.

2 Data and Setup

We use the Newsela text simplification corpus (Xu, Callison-Burch, and Napoles 2015) of 936 news articles. Each article set consists of 4 or 5 simplified versions of the original article, ranging from grades 3-12 (corresponding to ages 8-18). We group articles into three reading levels: original (grade 12), middle school (grades 6-8), and elementary school (grades 3-5). We use one version of the article from each reading level, and study two document-level transformations: original → middle and original → elementary.

We conduct analysis and learn to predict if a sentence would be dropped by professional editors when simplifying text to the desired reading levels. To obtain labeled data for analysis and evaluation, we manually align sentences of 50 article sets. The resulting dataset is one of the largest manually annotated datasets for sentence alignment in simplification. Figure 1 shows a 3-sentence paragraph in the original article, aligned to the elementary school version. Sentences in the original article that cannot be mapped to any sentence in a lower reading level are considered deleted. To train models for sentence deletion prediction, we rely on noisy supervision from automatically aligned sentences from the rest of the corpus.

Manual alignment. Manual sentence alignment is conducted on 50 sets of the articles (2,281 sentences in the original version), using a combination of crowdsourcing and in-house analysis. The annotation process is designed to be efficient, with rigorous quality control, and includes the following steps:

(1) Align paragraphs between articles by asking in-house annotators to manually verify and correct the automatic alignments generated by the CATS toolkit (Štajner et al. 2018). Automatic methods are much more reliable for aligning paragraphs than sentences, given the longer contexts. We use this step to reduce the number of sentence pairs that need to be annotated.

(2) Collect human annotations for sentence alignment using Figure Eight,4 a crowdsourcing platform. For every possible pair of sentences within the aligned paragraphs, we ask 5 workers to classify it into three categories: meaning equivalent, partly overlapped, or mismatched.5 To ensure quality, we embedded a hidden test question in every five questions we asked, and removed workers whose accuracy dropped below 80% on the test questions. The inter-annotator agreement is 0.807 by Cohen's kappa (Artstein and Poesio 2008).

4 https://figure-eight.com
5 We provided the following guideline: (i) two sentences are equivalent if they convey the same meaning, though one sentence can be much shorter or simpler than the other sentence; (ii) two sentences partially overlap if they share information in common but some important information differs/is missing; (iii) two sentences are mismatched, otherwise.

For this study, we consider a sentence in the original article deleted by the editor during simplification if there is no corresponding sentence labeled as meaning equivalent or partly overlapped in the elementary or middle school levels. For sentences that are shortened or split, we consider them as being kept. The final annotations for sentence alignment are aggregated by majority vote, then verified by the in-house annotators (not the authors).

Automatic alignment. We align sentences between pairs of articles based on the cosine similarity of their vector representations. We use 700-dimensional sentence embeddings pretrained on 16GB of English Wikipedia by Sent2Vec (Pagliardini, Gupta, and Jaggi 2018), an unsupervised method that learns to compose sentence embeddings from word vectors along with bigram character vectors. The automatically aligned data includes a total of 42,264 sentences from the original articles.

                      Middle          Elementary
                      Avg. (std)      Avg. (std)
automatic alignment   0.146 (±0.158)  0.272 (±0.172)
manual alignment      0.172 (±0.154)  0.448 (±0.161)

Table 1: Fraction of sentences deleted from the original article to meet middle and elementary school reading standards.

          # of articles   Middle      Elementary
Science   282             −0.0380*    −0.0722*
Health    92              −0.0253     −0.0033
Arts      79              −0.0200     +0.0014
War       170             −0.0192     −0.0140
Kids      179             +0.0029     +0.0147
Money     160             +0.0230*    +0.0169
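The deletion fractions reported in Table 1 follow directly from the deletion criterion described in this section: a sentence counts as kept if at least one simplified sentence is aligned to it as meaning equivalent or partly overlapped, and deleted otherwise. A minimal sketch of that bookkeeping (the label names and the toy article below are invented for illustration):

```python
def deletion_labels(alignment_labels):
    """alignment_labels: for each original sentence, the set of labels
    ('equivalent', 'partial', 'mismatched') it received against sentences
    in the simplified article. A sentence is deleted iff it has no
    'equivalent' or 'partial' alignment; shortened or split sentences
    count as kept because they retain at least a 'partial' alignment."""
    return [not (labels & {"equivalent", "partial"}) for labels in alignment_labels]

def deletion_rate(alignment_labels):
    # Fraction of original sentences deleted, as averaged in Table 1.
    labels = deletion_labels(alignment_labels)
    return sum(labels) / len(labels)

# Toy article: 4 original sentences, one of which is fully dropped.
article = [{"equivalent"}, {"partial", "mismatched"}, {"mismatched"}, {"equivalent"}]
print(deletion_rate(article))  # → 0.25
```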
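The automatic alignment can be sketched as a nearest-neighbor search under cosine similarity. The sketch below uses toy 3-dimensional vectors in place of the 700-dimensional Sent2Vec embeddings, and a similarity threshold of 0.5, which is an assumed value for illustration, not the paper's actual cutoff:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two vectors; 0 if either is a zero vector.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def align_sentences(orig_vecs, simp_vecs, threshold=0.5):
    """For each original-sentence vector, return the index of the most
    similar simplified-sentence vector, or None if no similarity reaches
    the (assumed) threshold, i.e., the sentence is treated as deleted."""
    alignments = []
    for ov in orig_vecs:
        sims = [cosine_similarity(ov, sv) for sv in simp_vecs]
        best = int(np.argmax(sims))
        alignments.append(best if sims[best] >= threshold else None)
    return alignments

# Toy 3-dimensional "embeddings" (Sent2Vec would give 700-dimensional ones).
orig = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
simp = [np.array([0.9, 0.1, 0.0])]
print(align_sentences(orig, simp))  # → [0, None]
```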
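The crowdsourcing quality control in step (2), majority vote over 5 workers with workers filtered by accuracy on hidden test questions, might be implemented along these lines (the worker names, questions, and answers are invented):

```python
from collections import Counter

def filter_workers(test_answers, gold, min_accuracy=0.8):
    """Keep workers whose accuracy on hidden test questions is at least
    min_accuracy. test_answers: {worker: {question_id: answer}}."""
    kept = set()
    for worker, answers in test_answers.items():
        correct = sum(answers[q] == gold[q] for q in answers)
        if correct / len(answers) >= min_accuracy:
            kept.add(worker)
    return kept

def majority_vote(labels):
    # Aggregate one sentence pair's labels (e.g., from 5 workers).
    return Counter(labels).most_common(1)[0][0]

gold = {"t1": "equivalent", "t2": "mismatched"}
test = {"w1": {"t1": "equivalent", "t2": "mismatched"},
        "w2": {"t1": "partial", "t2": "partial"}}
print(filter_workers(test, gold))  # → {'w1'}
print(majority_vote(["partial", "partial", "equivalent", "mismatched", "partial"]))  # → partial
```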
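Cohen's kappa, used in this section to report inter-annotator agreement, is observed agreement corrected for chance agreement. A simple unweighted two-annotator implementation (the toy labels are invented; the paper's 0.807 was computed on the real annotations):

```python
def cohens_kappa(a, b):
    """Unweighted Cohen's kappa between two annotators' label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items the two annotators label identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Toy example with two labels and one disagreement.
r1 = ["eq", "eq", "mis", "mis"]
r2 = ["eq", "mis", "mis", "mis"]
print(round(cohens_kappa(r1, r2), 3))  # → 0.5
```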