arXiv:2010.06792v2 [cs.CL] 18 Oct 2020
Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach

Bowen Tan¹, Lianhui Qin², Eric P. Xing¹,³, Zhiting Hu¹,⁴
¹Carnegie Mellon University, ²University of Washington, ³Petuum Inc., ⁴UC San Diego
{btan2,epxing}@andrew.cmu.edu

Abstract

Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect. Previous studies usually assume a small pre-defined set of aspects and fall short of summarizing on other diverse topics. In this work, we study summarizing on arbitrary aspects relevant to the document, which significantly expands the application of the task in practice. Due to the lack of supervision data, we develop a new weak supervision construction method and an aspect modeling scheme, both of which integrate rich external knowledge sources such as ConceptNet and Wikipedia. Experiments show that our approach achieves performance boosts on summarizing both real and synthetic documents given pre-defined or arbitrary aspects.¹

¹ Code and data available at https://github.com/tanyuqian/aspect-based-summarization

1 Introduction

Remarkable progress has been made in generating generic summaries of documents (Nallapati et al., 2016; See et al., 2017; Narayan et al., 2018), partially due to the large amount of supervision data available. In practice, a document, such as a news article or a medical report, can span multiple topics or aspects. To meet more specific information needs in applications such as personalized intelligent assistants, it is often useful to summarize a document with regard to a given aspect, i.e., aspect-based summarization.

Recent research has explored the problem of aspect-based abstractive summarization (Krishna and Srinivasan, 2018; Frermann and Klementiev, 2019). A key challenge of the task is the lack of direct supervision data containing documents paired with multiple aspect-based summaries. Previous studies have created synthetic data from generic news summarization corpora, which have a small set of aspects (e.g., "sports", "health", and four other aspects in Frermann and Klementiev (2019)). As a result, models trained on these data tend to be restricted to the pre-defined set and fall short of summarizing on other diverse aspects.

This paper aims to go beyond pre-defined aspects and enable summarization on arbitrary aspects relevant to the document. An arbitrary aspect may not be explicitly mentioned but only implicitly related to portions of the document, and it can be a new aspect not seen during training. To this end, we develop a new approach that integrates rich external knowledge in both aspect modeling and weak supervision construction. Specifically, we derive weak supervisions from a generic summarization corpus, where the ConceptNet knowledge graph (Speer et al., 2017) is used to substantially expand the aspect scope and enrich the supervisions. To help the summarization model better understand an aspect, especially a previously unseen one, we augment the model inputs with rich aspect-related information extracted from Wikipedia.

Our approach is compatible with any neural encoder-decoder architecture. In this work, we use the large pre-trained BART model (Lewis et al., 2019) and fine-tune it with the proposed method. Experiments on real news articles show that our approach achieves performance boosts over existing methods. When adapting to the previous synthetic domain, the BART model fine-tuned with our weak supervisions becomes substantially more data efficient, and greatly outperforms previous best-performing systems using only 0.4% of the training examples.

2 Related Work

Aspect-based summarization, as an instance of controllable text generation (Hu et al., 2017; Ficler and Goldberg, 2017), offers extra controllability compared to generic summarization to ensure concise summaries of interest. Early work studied topic-aware summarization in the multi-document setting, with (typically small) datasets containing multiple documents tagged with a relevant topic (Dang, 2005; Conroy et al., 2006). For single-document aspect-based summarization, extractive methods were used to extract related key sentences/words from the document (Lin and Hovy, 2000). Our work studies abstractive aspect-based summarization, which generates rather than extracts summaries. Deutsch and Roth (2019) studied a sub-task of learning to select information in documents that should be included in the summary. Recent work on the problem (Frermann and Klementiev, 2019; Krishna and Srinivasan, 2018) synthesized training data that use news categories as the aspects, and thus has only a small pre-defined set of aspects available. We aim to enable summarization on any aspect, and develop new weak supervisions by integrating rich external knowledge.

Aspect-based summarization has also been explored in the customer reviews domain (Hu and Liu, 2004), where product aspects, customer sentiment, and sometimes textual summaries are extracted (Popescu and Etzioni, 2007; Wang and Ling, 2016; Angelidis and Lapata, 2018). Query-based summarization produces a summary in response to a natural language query or question (Daumé III and Marcu, 2006; Liu et al., 2012; Xie et al., 2020), which differs from the abstract aspects studied here.

Incorporating knowledge through weak supervision has primarily been studied in classification or extraction problems (Hu et al., 2016; Peng et al., 2016; Ratner et al., 2017). For example, Hu et al. (2016) create soft labels from a logical-rule-enhanced teacher model to train neural classifiers. This work explores weak supervisions in the generation setting. Automatic creation of data supervisions also links our work to text data augmentation, in either a heuristic-based (Wei and Zou, 2019) or an automated manner (Sennrich et al., 2016; Hu et al., 2019b). This work embeds rich structured knowledge in the data synthesis process.

3 Approach

Given a document and an aspect, which can be a word or a phrase, the task aims to generate a summary that concisely describes the information in the document that is relevant to the aspect. We present an approach that enables a neural summarization model to summarize on any aspect. The aspect can be any words relevant to (but not necessarily occurring in) the document. Our approach incorporates rich external knowledge sources, including ConceptNet for enriching weak supervisions in training (Section 3.1) and Wikipedia for advising the document-aspect relation to improve comprehension (Section 3.2). Figure 1 shows an overview of our approach.

[Figure 1: Illustration of our approach. Left: Constructing weak supervisions using ConceptNet, including (1) extracting aspects and (2) synthesizing aspect-based summaries. Right: Augmenting aspect information, including (3) identifying aspect-related words in the document using Wikipedia and (4) feeding both the aspect and the related words into the summarization model.]

An advantage of our approach is that it is compatible with any neural summarization architecture, such as the popular encoder-decoders. This enables us to make use of the large pre-trained network BART (Lewis et al., 2019), on which we apply our approach for fine-tuning and improved inference.

3.1 Knowledge-enriched Weak Supervisions

Usually no direct supervision data is available, so we start with a generic summarization corpus. Specifically, in this work we use the CNN/DailyMail dataset (Hermann et al., 2015), which consists of a set of (document, summary) pairs. Our approach constructs weakly supervised examples by automatically extracting potential aspects and synthesizing aspect-based summaries from the generic summary. Each resulting aspect and its aspect-based summary are then paired with the document for training.

Extracting Aspects. Given a generic summary, we want to extract as many aspects as possible so that the summarization model sees sufficient examples during training. On the other hand, the aspects must be relevant to the generic summary to facilitate synthesizing an appropriate summary in the next step. To this end, we first apply a named entity recognition (NER) model² to extract the set of entities mentioned in the generic summary. These entities serve as a seed set of aspects. We then augment the seed set by collecting each entity's neighbor concepts on the ConceptNet knowledge graph, as these concepts are semantically closely related to the entity (and thus to the generic summary).

…to select only salient words in the document for a concise summary. Thus, we first rank all words in the document by TF-IDF score, and select the top words that occur in the aspect's Wikipedia page³.

4 Experiments

Setup. We construct weak supervisions from 100K of the 280K (document, summary) pairs in the training set of the CNN/DailyMail dataset (Hermann et al., 2015). We use the CNN/DailyMail-pretrained BART (Lewis et al., 2019) provided by Fairseq (Ott et al., 2019) as our base summarization model, and fine-tune it with our approach, implemented using Texar (Hu et al., 2019a). We use the Adam optimizer with an initial learning rate of 3e-5, and beam search decoding with a beam width of 4.