Neural Keyphrase Generation with Layer-Wise Coverage Attention

Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention Wasi Uddin Ahmady∗, Xiao Baiz, Soomin Leez, Kai-Wei Changy yUniversity of California, Los Angeles, zYahoo Research yfwasiahmad,[email protected] zfxbai,[email protected] Abstract Title: [1] natural language processing technologies Natural language processing techniques have for developing a language learning environment . demonstrated promising results in keyphrase Abstract: [1] so far , computer assisted language generation. However, one of the major chal- learning ( call ) comes in many different flavors lenges in neural keyphrase generation is pro- . [1] our research work focuses on developing an cessing long documents using deep neural net- integrated e learning environment that allows im- works. Generally, documents are truncated be- proving language skills in specific contexts . [1] fore given as inputs to neural networks. Conse- integrated e learning environment means that it quently, the models may miss essential points is a web based solution . , for instance , web conveyed in the target document. To overcome browsers or email clients . [0] it should be accessi- this limitation, we propose SEG-Net, a neural ble . [1] natural language processing ( nlp ) forms keyphrase generation model that is composed the technological basis for developing such a learn- of two major components, (1) a selector that ing framework . [0] the paper gives an overview selects the salient sentences in a document and . [0] therefore , on the one hand , it explains cre- (2) an extractor-generator that jointly extracts ation . [0] on the other hand , it describes existing and generates keyphrases from the selected nlp standards . [0] based on our requirements , the sentences. SEG-Net uses Transformer, a self- attentive architecture, as the basic building paper gives . [1] . necessary developments in e block with a novel layer-wise coverage atten- learning to keep in mind . tion to summarize most of the points discussed Present: natural language processing; computer in the document. The experimental results on assisted language learning; integrated e learning seven keyphrase generation benchmarks from Absent: semantic web technologies; learning of scientific and web documents demonstrate that foreign languages SEG-Net outperforms the state-of-the-art neu- Figure 1: Example of a document with present and ab- ral generative methods by a large margin. sent keyphrases. The value (0/1) in brackets ([]) repre- sent sentence salience label. 1 Introduction Keyphrases are short pieces of text that summa- ument, while absent keyphrases are only semanti- rize the key points discussed in a document. They cally related and have partial or no overlap to the are useful for many natural language processing target document. We provide an example of a target and information retrieval tasks (Wilson et al., 2005; document and its keyphrases in Figure1. Berend, 2011; Tang et al., 2017; Subramanian et al., In recent years, the neural sequence-to-sequence 2018; Zhang et al., 2017b; Wan and Xiao, 2008; (Seq2Seq) framework (Sutskever et al., 2014) Jones and Staveley, 1999; Kim et al., 2013; Hulth has become the fundamental building block in and Megyesi, 2006; Hammouda et al., 2005; Wu keyphrase generation models. Most of the existing and Bolivar, 2008; Dave and Varma, 2010). In the approaches (Meng et al., 2017; Chen et al., 2018; automatic keyphrase generation task, the input is a Yuan et al., 2020; Chen et al., 2019b) adopt the document, and the output is a set of keyphrases that Seq2Seq framework with attention (Luong et al., can be categorized as present or absent keyphrases. 2015; Bahdanau et al., 2014) and copy mechanism Present keyphrases appear exactly in the target doc- (See et al., 2017; Gu et al., 2016). However, present ∗Work done during internship at Yahoo Research. phrases indicate the indispensable segments of a 1389 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 1389–1404 August 1–6, 2021. ©2021 Association for Computational Linguistics target document. Emphasizing on those segments to show that selecting salient sentences improve improves document understanding that can lead a present keyphrase extraction and the layer-wise model to coherent absent phrase generation. This coverage attention and facilitates absent keyphrase motivates to jointly model keyphrase extraction and generation. Our novel contributions are as follows. generation (Chen et al., 2019a). 1. SEG-Net that identifies the salient sentences To generate a comprehensive set of keyphrases, in the target document first and then use them reading the complete target document is necessary. to generate a set of keyphrases. However, to the best of our knowledge, none of the 2. A layer-wise coverage attention. previous neural methods read the full content of 2 Problem Definition a document as it can be thousands of words long. Existing models truncate the target document; take Keyphrase generation task is defined as given a the first few hundred words as input and ignore text document x, generate a set of keyphrases the rest of the document that may contain salient K = fk1; k2; : : : ; kjKjg where the document i information. On the contrary, a significant frac- x = [x1; : : : ; xjxj] and each keyphrase k = i i tion of a long document may not associate with the [k1; : : : ; kjkij] is a sequence of words. A text keyphrases. Presumably, selecting the salient seg- document can be split into a list of sentences, 1 2 jSj i ments from the target document and then predicting Sx = [sx; sx; : : : ; sx ] where each sentence sx = the keyphrases from them would be effective. [xj; : : : ; xj+jsi|−1] is a consecutive subsequence of To address the aforementioned challenges, in the document x with begin index j ≤ jxj and end i this paper, we propose SEG-Net (stands for Select, index (j + js j) < jxj. In literature, keyphrases Extract, and Generate) that has two major compo- are categorized into two types, present and ab- nents, (1) a sentence-selector that selects the salient sent. A present keyphrase is a consecutive subse- sentences in a document, and (2) an extractor- quence of the document, while an absent keyphrase generator that predicts the present keyphrases and is not. However, an absent keyphrase may have generates the absent keyphrases jointly. The moti- a partial overlapping with the document’s word sequence. We denote the sets of present and ab- vation to design the sentence-selector is to decom- p 1 2 jK j pose a long target document into a list of sentences, sent keyphrases as Kp = fkp; kp; : : : ; kp g and a and identify the salient ones for keyphrase gener- 1 2 jK j Ka = fka; ka; : : : ; ka g, respectively. Hence, we ation. We consider a sentence as salient if it con- can express a set of keyphrases as K = Kp [Ka. tains present keyphrases or overlaps with absent SEG-Net decomposes the keyphrase generation keyphrases. As shown in Figure1, we split the task into three sub-tasks. We define them below. document into a list of sentences and classify them with salient and non-salient labels. A similar notion Task 1 (Salient Sentence Selection). Given a list of sentences Sx, predict a binary label (0=1) for is adopted in prior works on text summarization i (Chen and Bansal, 2018; Lebanoff et al., 2019) and each sentence sx. The label 1 indicates that the sen- question answering (Min et al., 2018). We employ tence contains a present keyphrase or overlaps with an absent keyphrase. The output of the selector is Transformer (Vaswani et al., 2017) as the backbone sal of the extractor-generator in SEG-Net. a list of salient sentences Sx . We equip the extractor-generator with a novel Task 2 (Present Keyphrase Extraction ). Given sal layer-wise coverage attention such that the gener- Sx as a concatenated sequence of words, predict ated keyphrases summarize the entire target doc- a label (B/I/O) for each word that indicates if it is a ument. The layer-wise coverage attention keeps constituent of a present keyphrase. track of the target document segments that are covered by previously generated phrases to guide Task 3 (Absent Keyphrase Generation). Given sal the self-attention mechanism in Transformer while Sx as a concatenated sequence of words, gen- attending the encoded target document in future erate a concatenated sequence of keyphrases in a generation steps. We evaluate SEG-Net on five sequence-to-sequence fashion. benchmarks from scientific articles and two bench- 3 SEG-Net for Keyphrase Generation marks from web documents to demonstrate its ef- fectiveness over the state-of-the-art neural gener- Our proposed model, SEG-Net jointly learns to ex- ative methods. We perform ablation and analysis tract and generate present and absent keyphrases 1390 from the salient sentences in a target document. The key advantage of SEG-Net is the maximal utilization of the information from the input text in order to generate a set of keyphrases that summarize all the key points in the target document. SEG-Net consists of a sentence-selector and an extractor-generator. The sentence-selector identifies the salient sentences from the target document (Task1) that are fed to the extractor-generator to predict both the present and absent keyphrases (Task2,3). We detail them in this section. 3.1 Embedding Layer The embedding layer maps each word in an input sequence to a low-dimensional vector space. We train three embedding matrices, We;Wpos; and Wseg that convert a word, its absolute position, and segment index into vector representations of size dmodel. The segment index of a word indicates the index of the sentence that it belongs to. In addition, we obtain a character-level embedding for each word using Convolutional Neural Networks (CNN) (Kim, 2014a).

Neural Keyphrase Generation with Layer-Wise Coverage Attention

SCHEDULE + AGENDA *Agenda Is Not Final

Mathnews 142-4

LA IDENTIDAD Π. IDENTIDADES EN EL LABORATORIO Lluís Sallés Diego

798 000 Véhicules Flashés En Trois

Arxiv:2008.01739V2 [Cs.CL] 4 Jun 2021 Are Useful for Many Natural Language Processing Target Document