Topic Augmented Generator for Abstractive Summarization

Melissa Ailem1, Bowen Zhang1 and Fei Sha1,2
1 University of Southern California, Los Angeles, CA
1 {ailem, zhan734, feisha}@usc.edu
2 [email protected]

Abstract

Steady progress has been made in abstractive summarization with attention-based sequence-to-sequence learning models. In this paper, we propose a new decoder where the output summary is generated by conditioning on both the input text and the latent topics of the document. The latent topics, identified by a topic model such as LDA, reveal more global semantic information that can be used to bias the decoder to generate words. In particular, they enable the decoder to have access to additional word co-occurrence statistics captured at the document corpus level. We empirically validate the advantage of the proposed approach on both the CNN/Daily Mail and the WikiHow datasets. Concretely, we attain strongly improved ROUGE scores when compared to state-of-the-art models.

…summary by conditioning on the input text (and its representation through the encoder).

What kind of information can we introduce so that richer texts can appear in the summaries? In this paper, we describe how to combine topic modeling with models for abstractive summarization. Topics identified from topic modeling, such as Latent Dirichlet Allocation (LDA), capture corpus-level patterns of word co-occurrence and describe documents as mixtures of semantically coherent conceptual groups. Such usage of words and concepts provides a valuable inductive bias for supervised models for language generation.

We propose the Topic Augmented Generator (TAG) for abstractive summarization, where the popular pointer-generator based decoder is supplied with the latent topics of the input document (See et al., 2017).
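To make the topic-modeling step concrete, the following is a minimal sketch of how a document's latent topic mixture can be obtained with LDA. The toy corpus, vectorizer settings, and number of topics are illustrative assumptions for this sketch, not values taken from the paper.

```python
# Minimal sketch: inferring per-document topic mixtures with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; in the paper's setting this would be the summarization
# training documents (e.g., CNN/Daily Mail or WikiHow articles).
corpus = [
    "the central bank raised interest rates amid inflation concerns",
    "the team won the championship after a dramatic overtime goal",
    "researchers trained a neural network on a large text corpus",
]

# Bag-of-words counts, the standard input representation for LDA.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(corpus)

# Fit LDA; n_components is the number of latent topics K (illustrative).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Each document is described as a mixture over the K topics; these
# corpus-level co-occurrence patterns are what the decoder can exploit.
doc_topic_mixture = lda.transform(counts)  # shape: (num_docs, K)
print(doc_topic_mixture)
```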
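As one plausible way to realize the conditioning described above, the sketch below augments a decoder's vocabulary projection with the document's topic mixture. This is a hypothetical instantiation under assumed dimensions; the paper's actual TAG formulation may combine the topic signal differently.

```python
# Hypothetical sketch: biasing the decoder's word distribution with a
# global topic vector, in addition to the local decoder state and
# attention context. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class TopicConditionedProjection(nn.Module):
    def __init__(self, hidden_dim, context_dim, num_topics, vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_dim + context_dim + num_topics, vocab_size)

    def forward(self, decoder_state, attn_context, topic_mixture):
        # Concatenate local signals (state, context) with the global
        # topic mixture, then project to a distribution over the vocabulary.
        features = torch.cat([decoder_state, attn_context, topic_mixture], dim=-1)
        return torch.softmax(self.proj(features), dim=-1)  # p(word | ...)

# Usage with illustrative dimensions (batch of one decoding step).
layer = TopicConditionedProjection(hidden_dim=256, context_dim=256,
                                   num_topics=50, vocab_size=50000)
p_vocab = layer(torch.randn(1, 256), torch.randn(1, 256), torch.rand(1, 50))
```

In a pointer-generator decoder, this distribution would further be mixed with a copy distribution over source tokens; only the topic-conditioned generation component is sketched here.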