Generating Summaries with Topic Templates and Structured Convolutional Decoders
Laura Perez-Beltrachini  Yang Liu  Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
{lperez,[email protected]  [email protected]

arXiv:1906.04687v1 [cs.CL] 11 Jun 2019

Abstract

Existing neural generation approaches create multi-sentence text as a single sequence. In this paper we propose a structured convolutional decoder that is guided by the content structure of target summaries. We compare our model with existing sequential decoders on three data sets representing different domains. Automatic and human evaluation demonstrate that our summaries have better content coverage.

1 Introduction

Abstractive multi-document summarization aims at generating a coherent summary from a cluster of thematically related documents. Recently, Liu et al. (2018) proposed generating the lead section of a Wikipedia article as a variant of multi-document summarization and released WikiSum, a large-scale summarization dataset which enables the training of neural models.

Like most previous work on neural text generation (Gardent et al., 2017; See et al., 2017; Wiseman et al., 2017; Puduppully et al., 2019; Celikyilmaz et al., 2018; Liu et al., 2018; Perez-Beltrachini and Lapata, 2018; Marcheggiani and Perez-Beltrachini, 2018), Liu et al. (2018) represent the target summaries as a single long sequence, despite the fact that documents are organized into topically coherent text segments, exhibiting a specific structure in terms of the content they discuss (Barzilay and Lee, 2004). This is especially the case when generating text within a specific domain where certain topics might be discussed in a specific order (Wray, 2002).
For instance, the summary in Table 1 is about a species of damselfly; the second sentence describes the region where the species is found and the fourth the type of habitat the species lives in. We would expect other Animal Wikipedia summaries to exhibit similar content organization.

In this work we propose a neural model which is guided by the topic structure of target summaries, i.e., the way content is organized into sentences and the type of content these sentences discuss. Our model consists of a structured decoder which is trained to predict a sequence of sentence topics that should be discussed in the summary and to generate sentences based on these. We extend the convolutional decoder of Gehring et al. (2017) so as to be aware of which topics to mention in each sentence as well as their position in the target summary. We argue that a decoder which explicitly takes content structure into account could lead to better summaries and alleviate well-known issues with neural generation models being too general, too brief, or simply incorrect.

Although content structure has been largely unexplored within neural text generation, it has been recognized as useful for summarization. Barzilay and Lee (2004) build a model of the content structure of source documents and target summaries and use it to extract salient facts from the source. Sauper and Barzilay (2009) cluster texts by target topic and use a global optimisation algorithm to select the best combination of facts from each cluster. Although these models have shown good results in terms of content selection, they cannot generate target summaries. Our model is also related to the hierarchical decoding approaches of Li et al. (2015) and Tan et al. (2017). However, the former auto-encodes the same inputs (our model carries out content selection for the summarization task), while the latter generates independent sentences. They also both rely on recurrent neural models, while we use convolutional neural networks. To our knowledge this is the first hierarchical decoder proposed for a non-recurrent architecture.

To evaluate our model, we introduce WIKICATSUM,¹ a dataset derived from Liu et al. (2018) which consists of Wikipedia abstracts and source documents and is representative of three domains, namely Companies, Films, and Animals. In addition to differences in vocabulary and range of topics, these domains differ in terms of the linguistic characteristics of the target summaries. We compare single sequence decoders and structured decoders using ROUGE and a suite of new metrics we propose in order to quantify the content adequacy of the generated summaries. We also show that structured decoding improves content coverage based on human judgments.

¹ Our dataset and code are available at https://github.com/lauhaide/WikiCatSum.

Summary:
agriocnemis zerafica is a species of damselfly in the family coenagrionidae. it is native to africa, where it is widespread across the central and western nations of the continent. it is known by the common name sahel wisp. this species occurs in swamps and pools in dry regions. there are no major threats but it may be affected by pollution and habitat loss to agriculture and development.

Input paragraphs:
agriocnemis zerafica EOT global distribution: the species is known from north-west uganda and sudan, through niger to mauritania and liberia: a larger sahelian range, i.e., in more arid zone than other african agriocnemis. record from angola unlikely. northeastern africa distribution: the species was listed by tsuda for sudan. [···]. EOP very small, about 20mm. orange tail. advised agriocnemis sp. id by kd dijkstra: [···] EOP same creature as previously posted as unknown, very small, about 20mm, over water, top view. advised probably agriocnemis, "whisp" damselfly. EOP [···] EOP justification: this is a widespread species with no known major widespread threats that is unlikely to be declining fast enough to qualify for listing in a threatened category. it is therefore assessed as least concern. EOP the species has been recorded from northwest uganda and sudan, through niger to mauritania and [···] EOP the main threats to the species are habitat loss due to agriculture, urban development and drainage, as well as water pollution.

Table 1: Summary (top) and input paragraphs (bottom) from the Animal development dataset (EOP/T is a special token indicating the end of paragraph/title).
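To make the input format concrete, the sketch below shows how a title and its paragraph cluster could be flattened into a single token sequence using the EOT and EOP markers of Table 1. This is our own minimal illustration, not the released preprocessing code: the function name flatten_cluster and the whitespace tokenization are assumptions.

```python
def flatten_cluster(title, paragraphs):
    """Flatten a title and its ranked paragraphs into one token sequence,
    delimiting the title with EOT and each paragraph with EOP (cf. Table 1)."""
    tokens = title.lower().split() + ["EOT"]
    for paragraph in paragraphs:
        tokens += paragraph.lower().split() + ["EOP"]
    return tokens

# Toy usage with an abridged version of the Table 1 instance.
tokens = flatten_cluster(
    "agriocnemis zerafica",
    ["global distribution: the species is known from north-west uganda and sudan.",
     "very small, about 20mm. orange tail."],
)
print(tokens[:5])  # ['agriocnemis', 'zerafica', 'EOT', 'global', 'distribution:']
```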
2 The Summarization Task

The Wikipedia lead section introduces the entity (e.g., Country or Brazil) the article is about, highlighting important facts associated with it. Liu et al. (2018) further assume that this lead section is a summary of multiple documents related to the entity. Based on this premise, they propose the multi-document summarization task of generating the lead section from the set of documents cited in Wikipedia articles or returned by Google (using article titles as queries), and create WikiSum, a large-scale multi-document summarization dataset with hundreds of thousands of instances.

Liu et al. (2018) focus on summarization from very long sequences. Their model first selects a subset of salient passages by ranking all paragraphs from the set of input documents based on their TF-IDF similarity with the title of the article (an illustrative ranking sketch is given at the end of this section). The L best ranked paragraphs (up to 7.5k tokens) are concatenated into a flat sequence and a decoder-only architecture (Vaswani et al., 2017) is used to generate the summary.

We explicitly model the topic structure of summaries, under the assumption that documents cover different topics about a given entity, while the summary covers the most salient ones and organizes them into a coherent multi-sentence text. We further assume that different lead summaries are appropriate for different entities (e.g., Animals vs. Films) and thus concentrate on specific domains. We associate Wikipedia articles with "domains" by querying the DBPedia knowledge base. A training instance in our setting is a (domain-specific) paragraph cluster (multi-document input) and the Wikipedia lead section (target summary). We derive sentence topic templates from summaries for Animals, Films, and Companies and exploit these to guide the summariser. However, there is nothing inherent in our model that restricts its application to different domains.

3 Generation with Content Guidance

Our model takes as input a set of ranked paragraphs $P = \{p_1 \cdots p_{|P|}\}$ which we concatenate to form a flat input sequence $X = (x_1 \cdots x_{|X|})$ where $x_i$ is the $i$-th token. The output of the model is a multi-sentence summary $S = (s_1, \cdots, s_{|S|})$ where $s_t$ denotes the $t$-th sentence.

We adopt an encoder-decoder architecture which makes use of convolutional neural networks (CNNs; Gehring et al. 2017). CNNs permit parallel training (Gehring et al., 2017) and have shown good performance in abstractive summarization tasks (e.g., Narayan et al. 2018). Figure 1 illustrates the architecture of our model. We use the convolutional encoder of Gehring et al. (2017) to obtain a sequence of states $(z_1, \cdots, z_{|X|})$ given an input sequence of tokens $(x_1, \cdots, x_{|X|})$. A hierarchical convolutional decoder generates the target sentences (based on the encoder outputs). Specifically, a document-level decoder first generates sentence vectors (LSTM Document Decoder in Figure 1), representing the content specification for each sentence that the model plans to decode. A sentence-level decoder (CNN Sentence Decoder in Figure 1) is then applied to generate an actual sentence token-by-token. In the following we describe the two decoders in more detail and how they are combined to generate summaries (a sketch of the two decoders is given at the end of this section).

[Figure 1: Architecture of our model. Only fragments of the figure survive in the source: decoder input tokens (<pad>, <s>, Aero, is, a) paired with (sentence, word) position indices (1,1), (1,2), (1,3), (1,4), fed to the CNN Sentence Decoder.]

3.1 Document-level Decoder

The document-level decoder builds a sequence of sentence representations $(s_1, \cdots, s_{|S|})$. For example, [···] representation by adding a sentence positional embedding. For each $s_t$ the decoder incorporates the representation of its position $t$. This explicitly informs the decoder which sentence in the target document to decode for.
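The paragraph ranking step of Section 2 can be pictured in a few lines of scikit-learn. The sketch below is our own approximation of the ranking described by Liu et al. (2018), not their pipeline: the function name, the cosine scoring, and the way the 7.5k-token budget is applied are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs(title, paragraphs, max_tokens=7500):
    """Rank paragraphs by TF-IDF cosine similarity with the article title
    and keep the best ones up to a total token budget (cf. Liu et al. 2018)."""
    vectorizer = TfidfVectorizer()
    # Fit on the paragraphs plus the title so both share one vocabulary.
    matrix = vectorizer.fit_transform(paragraphs + [title])
    para_vecs, title_vec = matrix[:-1], matrix[-1]
    scores = cosine_similarity(para_vecs, title_vec).ravel()
    ranked = sorted(zip(scores, paragraphs), key=lambda x: -x[0])
    selected, budget = [], max_tokens
    for _, paragraph in ranked:
        n = len(paragraph.split())
        if n > budget:
            break  # stop once the token budget is exhausted
        selected.append(paragraph)
        budget -= n
    return selected
```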
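To illustrate the hierarchical decoding scheme of Section 3, the following PyTorch sketch pairs an LSTM document-level decoder, which emits one state per target sentence, with a simplified causal-convolution sentence decoder that conditions each token on that state plus word and sentence position embeddings. It is a minimal sketch under our own assumptions: it omits the attention and gating of the full convolutional decoder of Gehring et al. (2017), and all module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """Document-level LSTM over sentence states plus a CNN sentence-level
    decoder; a simplified sketch of Figure 1, not the released model."""

    def __init__(self, vocab_size, emb_dim=256, hid_dim=256,
                 max_sents=16, max_words=64, kernel=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.word_pos = nn.Embedding(max_words, emb_dim)  # word position in sentence
        self.sent_pos = nn.Embedding(max_sents, emb_dim)  # sentence position t
        # Document-level decoder: one LSTM step per target sentence.
        self.doc_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Sentence-level decoder: causal 1D convolution over the sentence prefix.
        self.conv = nn.Conv1d(emb_dim, hid_dim, kernel, padding=kernel - 1)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_sent_reprs, tgt_tokens):
        """prev_sent_reprs: (batch, n_sents, emb_dim), embeddings of the
        previously generated sentences (gold ones under teacher forcing).
        tgt_tokens: (batch, n_sents, n_words), shifted target tokens per
        sentence; n_sents <= max_sents and n_words <= max_words."""
        batch, n_sents, n_words = tgt_tokens.shape
        # Document-level states s_1..s_|S|, one content plan per sentence.
        sent_states, _ = self.doc_lstm(prev_sent_reprs)      # (batch, n_sents, hid)
        logits = []
        for t in range(n_sents):
            x = self.tok_emb(tgt_tokens[:, t])               # (batch, n_words, emb)
            x = x + self.word_pos(torch.arange(n_words))     # word positions
            x = x + self.sent_pos(torch.tensor(t))           # sentence position t
            h = self.conv(x.transpose(1, 2))[:, :, :n_words] # crop to keep causality
            h = torch.relu(h).transpose(1, 2)                # (batch, n_words, hid)
            h = h + sent_states[:, t:t + 1]                  # condition on s_t
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, n_sents, n_words, vocab)
```

Conditioning each sentence on its position t is what would let such a decoder specialise by slot, e.g., producing region-of-occurrence content in the second sentence of an Animal summary, as in the Table 1 example.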