Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs
Angela Fan (FAIR / LORIA), Claire Gardent (CNRS / LORIA), Chloé Braud (CNRS / LORIA), Antoine Bordes (FAIR)

Abstract

Query-based open-domain NLP tasks require information synthesis from long and diverse web results. Current approaches extractively select portions of web text as input to Sequence-to-Sequence models using methods such as TF-IDF ranking. We propose constructing a local graph structured knowledge base for each query, which compresses the web search information and reduces redundancy. We show that by linearizing the graph into a structured input sequence, models can encode the graph representations within a standard Sequence-to-Sequence setting. For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved text portions.

1 Introduction

Effective information synthesis is at the core of many Natural Language Processing applications, such as open-domain question answering and multi-document summarization. In such tasks, a fundamental challenge is the ability to distill relevant knowledge from hundreds of thousands of tokens of noisy and redundant input such as webpages. Current approaches predominantly conduct TF-IDF-based information extraction to identify key portions of the information, and then provide this as sequential input to a Sequence-to-Sequence (Seq2Seq) model. The sub-selected portions are limited to a few thousand words, as models often struggle to encode much longer sequences.

Figure 1: Multi-Document Input to Linearized Graph. Multi-document input resulting from web search queries is converted to a graph structured knowledge base using coreference resolution and information extraction, then linearized into a sequence for Seq2Seq models. Color indicates coreference resolution. Node weight is indicated by circle radius and edge weight by line thickness.
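As a concrete point of reference for this extractive baseline, the sketch below ranks candidate web passages by TF-IDF similarity to the query. It is a minimal illustration using scikit-learn; the passages, query, and budget are placeholders, not the paper's retrieval setup.

```python
# Minimal sketch of TF-IDF-based passage sub-selection, the standard
# extractive baseline described above (not the paper's exact pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_passages(query: str, passages: list[str], budget: int = 3) -> list[str]:
    """Rank web passages by TF-IDF similarity to the query and keep the top few."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on passages and query together so they share one vocabulary.
    matrix = vectorizer.fit_transform(passages + [query])
    scores = cosine_similarity(matrix[:-1], matrix[-1]).ravel()
    ranked = sorted(zip(scores, passages), reverse=True)
    return [passage for _, passage in ranked[:budget]]

passages = [
    "Einstein received the 1921 Nobel Prize in Physics.",
    "This web page uses cookies to improve your experience.",
    "The prize recognized his discovery of the photoelectric effect.",
]
print(select_passages("Why did Einstein win the Nobel Prize?", passages, budget=2))
```

The top-ranked passages would then be concatenated, up to a few thousand words, as the sequential input to the Seq2Seq model.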
In this work, we propose a mechanism to re-structure free text into local knowledge graphs that are then linearized into sequences, creating a canonical form in which information is presented to models. By constructing a graph intermediary, redundant information can be merged or discarded, producing substantially compressed input that is small enough to be fully encoded by Seq2Seq models. Such a method can be seen as merging previous work on symbolic knowledge bases for information extraction with newer approaches using deep neural networks to encode knowledge.

Our approach, shown in Figure 1, takes a query and its corresponding multi-document web search results and builds for each query a specific local knowledge graph. We present several modeling contributions to effectively encode the entire graph as a sequence and attend to the most relevant portions within this linearization. We demonstrate the effectiveness of this approach on two large-scale generative tasks with both long and noisy multi-document web input and paragraph-length output: long-form question answering on the ELI5 dataset (Fan et al., 2019) and Wikipedia lead paragraph generation as a multi-document summarization problem (Liu et al., 2018b).

2 Related Work

Interest in generative sequence modeling has intensified due to recent improvements (Peters et al., 2018; Devlin et al., 2018; Radford et al., 2019), making the challenge of information synthesis more relevant. In contrast to extractive tasks, which only require models to identify spans and can do so effectively on long documents by looking at the paragraphs independently, generative sequence models must combine multiple pieces of evidence from long and noisy multi-document input to generate correct and convincing responses.

2.1 Multi-Document Input

Previous work in multi-document summarization (Barzilay et al., 1999) applies various techniques to handle long input, including clustering to find similar information (Honarpisheh et al., 2008), extractive methods to select relevant sentences (Daumé III and Marcu, 2002; Gillick and Favre, 2009; Berg-Kirkpatrick et al., 2011; Di Fabbrizio et al., 2014; Bing et al., 2015; Cao et al., 2017) including maximal marginal relevance (Fabbri et al., 2019), and incorporating queries (Baumel et al., 2018) and graphs (Ganesan et al., 2010; Yasunaga et al., 2017). However, there are few large-scale multi-document summarization datasets, and many approaches have focused on extractive selection or hybrid extractive-abstractive models. In this work, we use graph construction to re-structure multi-document input for abstractive generation.

Advancements in question answering have examined performance on datasets with multi-document input, such as TriviaQA (Joshi et al., 2017). Various approaches have been proposed, including leveraging TF-IDF and bigram hashing with an RNN to find relevant information (Chen et al., 2017), models that score individual paragraphs for sub-selection (Clark and Gardner, 2017), and nearest neighbor search with paragraph re-ranking (Das et al., 2018a). However, these approaches have been applied to extractive question answering tasks that require span identification, rather than abstractive text generation in an information synthesis setting.

2.2 Using Knowledge Bases

Previous work has explored various ways of representing information in knowledge bases (Bordes et al., 2011) and improving these representations (Chen et al., 2013). Knowledge bases have been leveraged to improve performance on various tasks, from coreference resolution (Ng and Cardie, 2002) and question answering (Zheng, 2003; Bao et al., 2014; Cui et al., 2017; Sun et al., 2018) to signal processing (Bückner et al., 2002). Various works convert text into Abstract Meaning Representations (Liu et al., 2018a) for domains such as news (Vossen et al., 2015; Rospocher et al., 2016) and link nodes to large knowledge bases such as DBPedia (Auer et al., 2007). Wities et al. (2017) combine open information extraction with coreference and lexical inference to build knowledge representations. They apply this to tweets and analyze the accuracy on various aspects of graph construction. Das et al. (2018b) construct graphs from procedural text to track entity position to answer when and if entities are created, destroyed, or moved. In contrast, we build graphs from substantially longer multi-document input and use them for multi-sentence text generation.

Recently, many have explored neural architectures that can encode graph structured input (Bruna et al., 2013; Kipf and Welling, 2016; Beck et al., 2018; Zhou et al., 2018; Xu et al., 2018; Lai et al., 2019).
These models often represent graphs as adjacency matrices to generalize architectures such as convolutional networks to graph inputs. Rather than encoding a static knowledge graph or leveraging external knowledge graphs, we build a local graph for each query and model these using standard Seq2Seq models. We leave the incorporation of graph networks for future work.

3 Graph Construction

We describe how symbolic graph representations of knowledge can be constructed from text. Our approach assumes a multi-document input (such as web pages) resulting from the execution of a query. The graph construction process (1) compresses the web search input to a significantly smaller size, allowing models to encode the entirety of the compression, and (2) reduces redundancy through merge operations, allowing relevant information to be more easily identified.

Text to Triples to Graph. Graph construction proceeds in several steps, outlined in Figure 2. We apply Coreference Resolution (Clark and Manning, 2016a,b) and Open Information Extraction (Stanovsky et al., 2018) to convert sentences into Triples of the form (subject, predicate, object).

Nodes with matching strings are merged, and edges are merged similarly with existing edges between the same two nodes. Such merge operations allow strings such as the Nobel Prize and Nobel Prize to be represented as one node rather than separately. Similarly, coreference resolution aids in merging: by identifying that Albert Einstein and He refer to the same entity and thus merging them, the construction of the graph reduces redundancy. The size of the graph can be modified by controlling which triples are added, using TF-IDF overlap (see Figure 2, step 4). TF-IDF overlap of the triple with the question can be used to determine whether the triple contains relevant information. This improves robustness to noisy web search input and helps filter entirely irrelevant portions, such as scraped HTML tags.

Figure 2: Steps of Graph Construction. Color relates the document sentence used to produce the graph output.

4 Modeling Graphs as Sequences

Current models for text generation often use Seq2Seq architectures such as the Transformer (Vaswani et al., 2017). These models are designed to encode sequences rather than graphs. We now describe how to convert a graph into a structured input sequence.
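To make the construction in Section 3 concrete, here is a minimal, dependency-free sketch of the triple-to-graph step. It assumes coreference resolution and OpenIE have already produced the triples; the merge key simply drops articles, and plain word overlap stands in for the TF-IDF overlap score. All names, example triples, and the threshold are illustrative, not the paper's settings.

```python
# Toy sketch of local graph construction from (subject, predicate, object)
# triples: merge duplicate nodes, track weights, and filter irrelevant triples.
from collections import defaultdict

STOPWORDS = {"the", "a", "an"}

def canonical(phrase: str) -> str:
    """Merge key: lowercase and drop articles so 'the Nobel Prize' == 'Nobel Prize'."""
    return " ".join(t for t in phrase.lower().split() if t not in STOPWORDS)

def overlap(query: str, triple: tuple[str, str, str]) -> float:
    # Stand-in relevance score: fraction of query words appearing in the triple.
    # The paper scores triples by TF-IDF overlap with the question; plain word
    # overlap is used here only to keep the sketch dependency-free.
    query_words = set(query.lower().split())
    triple_words = set(" ".join(triple).lower().split())
    return len(query_words & triple_words) / max(len(query_words), 1)

def build_graph(query: str, triples, threshold: float = 0.2):
    node_weight = defaultdict(int)  # shown as circle radius in Figure 1
    edge_weight = defaultdict(int)  # shown as line thickness in Figure 1
    for subj, pred, obj in triples:
        if overlap(query, (subj, pred, obj)) < threshold:
            continue  # drop noise, e.g. triples from scraped HTML tags
        s, o = canonical(subj), canonical(obj)
        node_weight[s] += 1  # merging a duplicate node increments its weight
        node_weight[o] += 1
        edge_weight[(s, canonical(pred), o)] += 1
    return node_weight, edge_weight

triples = [
    ("Albert Einstein", "won", "the Nobel Prize"),
    ("Albert Einstein", "received", "Nobel Prize"),    # merges into same node
    ("Scraped HTML", "contains", "navigation junk"),   # filtered out
]
nodes, edges = build_graph("Why did Einstein win the Nobel Prize?", triples)
print(nodes, edges, sep="\n")
```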
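The excerpt ends before Section 4 specifies the linearization, so the following sketch only illustrates the general idea of flattening a graph into a token sequence. The <sub>/<pred>/<obj> separator tokens and the weight-ordered traversal are assumptions for illustration, not the authors' scheme.

```python
# Illustrative graph-to-sequence flattening: one triple per (subject,
# predicate, object) block, heaviest (most merged, hence best supported)
# edges first so relevant facts sit early in the Seq2Seq input.
def linearize(edge_weight: dict[tuple[str, str, str], int]) -> str:
    parts = []
    for (subj, pred, obj), _ in sorted(edge_weight.items(), key=lambda kv: -kv[1]):
        parts += ["<sub>", subj, "<pred>", pred, "<obj>", obj]
    return " ".join(parts)

edge_weight = {("albert einstein", "won", "nobel prize"): 2,
               ("nobel prize", "awarded in", "1921"): 1}
print(linearize(edge_weight))
# -> <sub> albert einstein <pred> won <obj> nobel prize <sub> nobel prize ...
```

The resulting string can then be fed to a standard Seq2Seq encoder in place of raw retrieved passages.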