Learning Sentence Embeddings for Coherence Modelling and Beyond

Tanner Bohn, Yining Hu, Jinhang Zhang, Charles X. Ling
Department of Computer Science, Western University, London, ON, Canada
{tbohn, yhu534, jzha337, [email protected]

Proceedings of Recent Advances in Natural Language Processing, pages 151–160, Varna, Bulgaria, Sep 2–4, 2019. https://doi.org/10.26615/978-954-452-056-4_018

Abstract

We present a novel and effective technique for performing text coherence tasks while facilitating deeper insights into the data. Despite obtaining ever-increasing task performance, modern deep-learning approaches to NLP tasks often only provide users with the final network decision and no additional understanding of the data. In this work, we show that a new type of sentence embedding learned through self-supervision can be applied effectively to text coherence tasks while serving as a window through which deeper understanding of the data can be obtained. To produce these sentence embeddings, we train a recurrent neural network to take individual sentences and predict their location in a document in the form of a distribution over locations. We demonstrate that these embeddings, combined with simple visual heuristics, can be used to achieve performance competitive with the state of the art on multiple text coherence tasks, outperforming more complex and specialized approaches. Additionally, we demonstrate that these embeddings can provide insights useful to writers for improving writing quality and informing document structuring, and can assist readers in summarizing and locating information.

Figure 1: This paper's abstract, analyzed by our sentence position model trained on academic abstracts. The sentence encodings (predicted position distributions) are shown below each sentence, where white is low probability and red is high. Position quantiles are ordered from left to right. The first sentence, for example, is typical of the first sentence of abstracts, as reflected in the high first-quantile value. For two text coherence tasks, we show how the sentence encodings can easily be used to solve them; the black dots indicate the weighted average predicted position for each sentence. Left panel, "How coherent is it?" Algorithm: if the dashed line is close to the diagonal, coherence is high; if far, low. Result: sentences 1 and 2 may be out of order; otherwise the order is quite close, with a coherence of 0.73. Right panel, "Suggest a coherent sentence order". Algorithm: take sentences in the order that the black dots appear along the x-axis. Result: suggested order: 2, 1, 4, 3, 5, 6.

1 Introduction

A goal of much of NLP research is to create tools that not only assist in completing tasks, but also help gain insights into the text being analyzed. This is especially true of text coherence tasks, as users are likely to wonder where efforts should be focused to improve writing, or how text should be reorganized for improved coherence. By improving coherence, a text becomes easier to read and understand (Lapata and Barzilay, 2005), and in this work we particularly focus on measuring coherence in terms of sentence ordering.

Many recent approaches to NLP tasks use end-to-end neural methods which exhibit ever-increasing performance, but provide little value to end-users beyond a classification or regression value (Gong et al., 2016; Logeswaran et al., 2018; Cui et al., 2018). This leaves open the question of whether we can achieve good performance on NLP tasks while simultaneously providing users with easily obtainable insights into the data.
This is precisely what the work in this paper aims to do in the context of coherence analysis, by providing a tool with which users can quickly and visually gain insight into structural information about a text. To accomplish this, we rely on the surprising importance of sentence location in many areas of natural language processing. If a sentence does not appear to belong where it is located, it decreases the coherence and readability of the text (Lapata and Barzilay, 2005). If a sentence is located at the beginning of a document or news article, it is very likely to be part of a high-quality extractive summary (See et al., 2017). The location of a sentence in a scientific abstract is also an informative indicator of its rhetorical purpose (Teufel et al., 1999). It thus follows that knowledge of where a sentence should be located in a text is valuable.

Tasks requiring knowledge of sentence position – both relative to neighboring sentences and globally – appear in text coherence modelling, two important tasks being order discrimination (is a sequence of sentences in the correct order?) and sentence ordering (re-order a set of unordered sentences). Traditional methods in this area make use of manual feature engineering and established theory behind coherence (Lapata and Barzilay, 2005; Barzilay and Lapata, 2008; Grosz et al., 1995). Modern deep-learning approaches to these tasks tend to revolve around taking raw words and directly predicting local (Li and Hovy, 2014; Chen et al., 2016) or global (Cui et al., 2017; Li and Jurafsky, 2017) coherence scores, or directly outputting a coherent sentence ordering (Gong et al., 2016; Logeswaran et al., 2018; Cui et al., 2018). While new deep-learning approaches to text coherence continue to achieve ever-increasing performance, their value in real-world applications is undermined by the lack of actionable insights made available to users.

In this paper, we introduce a self-supervised approach for learning sentence embeddings which can be used effectively for text coherence tasks (Section 3) while also facilitating deeper understanding of the data (Section 4). Figure 1 provides a taste of this, displaying the sentence embeddings for the abstract of this paper. The self-supervision task we employ is predicting the location of a sentence in a document given only the raw text. By training a neural network on this task, it is forced to learn how the location of a sentence in a structured text is related to its syntax and semantics. As our neural model, we use a bidirectional recurrent neural network, trained to take sentences and predict a discrete distribution over possible locations in the source text. We demonstrate the effectiveness of predicted position distributions as an accurate way to assess document coherence by performing order discrimination and sentence reordering of scientific abstracts. We also demonstrate a few types of insights that these embeddings make available to users, and show that the predicted location of a sentence in a news article can be used to formulate an effective heuristic for extractive document summarization, outperforming existing heuristic methods.

The primary contributions of this work are thus:

1. We propose a novel self-supervised approach to learning sentence embeddings which works by mapping sentences to a distribution over positions in a document (Section 2.2).

2. We describe how these sentence embeddings can be applied to established coherence tasks using simple algorithms amenable to visual approximation (Section 2.3).

3. We demonstrate that these embeddings are competitive at solving text coherence tasks (Section 3) while quickly providing access to further insights into texts (Section 4).

2 Predicted Position Distributions

2.1 Overview

By training a machine learning model to predict the location of a sentence in a body of text (conditioned upon features not trivially indicative of position), we obtain a sentence position model such that sentences predicted to be at a particular location possess properties typical of sentences found at that position. For example, if a sentence is predicted to be at the beginning of a news article, it should resemble an introductory sentence.

In the remainder of this section we describe our neural sentence position model and then discuss how it can be applied to text coherence tasks.
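Both heuristics from Figure 1 can be stated in a few lines of code. The sketch below is a minimal illustration, not the authors' implementation: it assumes PPDs arrive as a NumPy array of shape (n_sentences, Q), measures how close the black dots lie to the diagonal with Kendall's tau between reading order and expected predicted position (one plausible instantiation of the figure's dashed-line rule), and suggests an order by sorting on those expected positions.

```python
import numpy as np
from scipy.stats import kendalltau


def expected_positions(ppds: np.ndarray) -> np.ndarray:
    """Weighted average predicted position per sentence (the black dots).

    ppds has shape (n_sentences, Q); each row is a distribution over
    Q document quantiles. Returns one expected position in [0, 1).
    """
    n_quantiles = ppds.shape[1]
    quantile_centers = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return ppds @ quantile_centers


def coherence_score(ppds: np.ndarray) -> float:
    """Agreement between reading order and predicted positions.

    Kendall's tau is one concrete choice: +1 when the dots ascend
    monotonically (coherent), lower when sentences look misplaced.
    """
    dots = expected_positions(ppds)
    tau, _ = kendalltau(np.arange(len(dots)), dots)
    return tau


def suggest_order(ppds: np.ndarray) -> list:
    """Take sentences in the order their dots appear along the x-axis."""
    return list(np.argsort(expected_positions(ppds), kind="stable"))


# Toy PPDs for three sentences (Q = 4); sentences 0 and 1 look swapped.
ppds = np.array([[0.1, 0.6, 0.2, 0.1],
                 [0.7, 0.2, 0.1, 0.0],
                 [0.0, 0.1, 0.2, 0.7]])
print(coherence_score(ppds))  # 0.33..., well below 1.0
print(suggest_order(ppds))    # [1, 0, 2]
```

Note that sorting by a distribution's mean discards its shape; using the mode, or a correlation measure other than Kendall's tau, would be an equally defensible reading of the figure.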
2.2 Neural Position Model

The purpose of the position model is to produce sentence embeddings by predicting the position in a document of a given sentence. By treating this as a classification problem, we aim to determine what quantile of the document a sentence resides in. Notationally, we will refer to the number of quantiles as Q. We can interpret the class probabilities behind a prediction as a discrete distribution over positions for a sentence, providing us with a predicted position distribution (PPD). When Q = 2, for example, we are predicting whether a sentence is in the first or second half of a document. When Q = 4, we are predicting which quarter of the document it is in. Figure 2 visualizes the neural architecture, which produces PPDs with Q = 10.

Figure 2: Illustration of the sentence position model, consisting of stacked BiLSTMs. Sentences from a text are individually fed into the model to produce a PPD sequence. (The schematic shows, for each word w_i, the encoding fText(w_i), the document-average encoding fText(article), and their difference fText(w_i) - fText(article), fed through stacked BiLSTM layers and a softmax that outputs the sentence's PPD.)

2.2.2 Features Used

The sentence position model receives an input sentence as a sequence of word encodings and outputs a single vector of dimension Q. Sentences are fed into the BiLSTM one at a time as a sequence of word encodings, where the encoding for each word consists of the concatenation of: (1) a pretrained word embedding, (2) the average of the pretrained word embeddings for the entire document (which is constant for all words in a document), and (3) the difference of the first two components.
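To make this architecture concrete, here is a minimal sketch of a position model in PyTorch. It is written under stated assumptions rather than taken from the paper: the embedding dimension, hidden size, layer count, and the pooling of final hidden states are guesses (the figure's fText encoder suggests fastText embeddings, but any pretrained embedding fits the sketch); only the three-part word encoding, the stacked BiLSTM, and the softmax over Q quantiles come from the text.

```python
import torch
import torch.nn as nn


class SentencePositionModel(nn.Module):
    """Stacked BiLSTM mapping one sentence to a PPD over Q quantiles."""

    def __init__(self, embed_dim: int = 300, hidden_size: int = 128,
                 num_quantiles: int = 10):
        super().__init__()
        # Each word encoding concatenates (1) its pretrained embedding,
        # (2) the document-average embedding, and (3) their difference,
        # hence 3 * embed_dim inputs per word.
        self.rnn = nn.LSTM(input_size=3 * embed_dim,
                           hidden_size=hidden_size,
                           num_layers=2,  # "stacked" BiLSTMs
                           bidirectional=True,
                           batch_first=True)
        self.out = nn.Linear(2 * hidden_size, num_quantiles)

    def forward(self, word_vecs: torch.Tensor,
                doc_vec: torch.Tensor) -> torch.Tensor:
        # word_vecs: (batch, seq_len, embed_dim) pretrained word vectors
        # doc_vec:   (batch, embed_dim) document-average vector
        doc = doc_vec.unsqueeze(1).expand_as(word_vecs)
        x = torch.cat([word_vecs, doc, word_vecs - doc], dim=-1)
        _, (h_n, _) = self.rnn(x)
        # Concatenate the top layer's final forward and backward states
        # and map them to Q position-quantile probabilities (the PPD).
        sentence = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return torch.softmax(self.out(sentence), dim=-1)
```

Training labels come directly from the self-supervision task: sentence i of an n-sentence document would receive quantile label ⌊Q·i/n⌋ (our reading of "what quantile of the document a sentence resides in"), with a cross-entropy objective over real, correctly ordered documents.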
