Key-Value Memory Networks for Directly Reading Documents
Alexander H. Miller1 Adam Fisch1 Jesse Dodge1,2 Amir-Hossein Karimi1 Antoine Bordes1 Jason Weston1
1Facebook AI Research, 770 Broadway, New York, NY, USA
2Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
{ahm,afisch,jessedodge,ahkarimi,abordes,jase}@fb.com

Abstract

Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective. Unfortunately, KBs often suffer from being too restrictive, as the schema cannot support certain types of answers, and too sparse, e.g. Wikipedia contains much more information than Freebase. In this work we introduce a new method, Key-Value Memory Networks, that makes reading documents more viable by utilizing different encodings in the addressing and output stages of the memory read operation. To compare using KBs, information extraction or Wikipedia documents directly in a single framework, we construct an analysis tool, WIKIMOVIES, a QA dataset that contains raw text alongside a preprocessed KB, in the domain of movies. Our method reduces the gap between all three settings. It also achieves state-of-the-art results on the existing WIKIQA benchmark.

1 Introduction

Question answering (QA) has been a long-standing research problem in natural language processing, with the first systems attempting to answer questions by directly reading documents (Voorhees and Tice, 2000).
The development of large-scale Knowledge Bases (KBs) such as Freebase (Bollacker et al., 2008) helped organize information into structured forms, prompting recent progress to focus on answering questions by converting them into logical forms that can be used to query such databases (Berant et al., 2013; Kwiatkowski et al., 2013; Fader et al., 2014). Unfortunately, KBs have intrinsic limitations such as their inevitable incompleteness and fixed schemas that cannot support all varieties of answers. Since information extraction (IE) (Craven et al., 2000), intended to fill in missing information in KBs, is neither accurate nor reliable enough, collections of raw textual resources and documents such as Wikipedia will always contain more information. As a result, even if KBs can be satisfactory for closed-domain problems, they are unlikely to scale up to answer general questions on any topic. Starting from this observation, in this work we study the problem of answering by directly reading documents.

Retrieving answers directly from text is harder than from KBs because information is far less structured, is indirectly and ambiguously expressed, and is usually scattered across multiple documents. This explains why using a satisfactory KB, typically only available in closed domains, is preferred over raw text. We postulate that before trying to provide answers that are not in KBs, document-based QA systems should first reach KB-based systems' performance in such closed domains, where clear comparison and evaluation is possible. To this end, this paper introduces WIKIMOVIES, a new analysis tool that allows for measuring the performance of QA systems when the knowledge source is switched from a KB to unstructured documents. WIKIMOVIES contains ∼100k questions in the movie domain, and was designed to be answerable by using either a perfect KB (based on OMDb1), Wikipedia pages, or an imperfect KB obtained by running an engineered IE pipeline on those pages.

1 http://www.omdbapi.com

To bridge the gap between using a KB and reading documents directly, we still lack appropriate machine learning algorithms. In this work we propose the Key-Value Memory Network (KV-MemNN), a new neural network architecture that generalizes the original Memory Network (Sukhbaatar et al., 2015) and can work with either knowledge source. The KV-MemNN performs QA by first storing facts in a key-value structured memory and then reasoning on them in order to predict an answer. The memory is designed so that the model learns to use keys to address memories relevant to the question, whose corresponding values are subsequently returned. This structure allows the model to encode prior knowledge for the task at hand and to leverage possibly complex transforms between keys and values, while still being trained using standard backpropagation via stochastic gradient descent.
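As a concrete illustration of what might live in such a memory, the short Python sketch below builds one slot from a KB fact and one from raw text. The specific encodings (subject and relation as the key, object as the value; a text window as the key, an entity in it as the value) are illustrative choices for this example rather than the paper's exact feature maps:

```python
# Illustrative memory slots (the encodings are assumptions for this example,
# not the paper's exact feature maps).

# For a KB fact, the key can hold the subject and relation (question-side
# features) and the value the object (answer-side features).
kb_slot = {
    "key": "Blade Runner directed_by",  # matched against the question
    "value": "Ridley Scott",            # returned as the candidate answer
}

# For raw text, one option is a window of words as the key and an entity
# occurring in that window as the value.
doc_slot = {
    "key": "the 1982 film was directed by Ridley Scott and stars Harrison Ford",
    "value": "Ridley Scott",
}

memory = [kb_slot, doc_slot]
```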
Our experiments on WIKIMOVIES indicate that, thanks to its key-value memory, the KV-MemNN consistently outperforms the original Memory Network, and reduces the gap between answering from a human-annotated KB, from an automatically extracted KB, or from directly reading Wikipedia. We confirm our findings on WIKIQA (Yang et al., 2015), another Wikipedia-based QA benchmark where no KB is available, where we demonstrate that KV-MemNN can reach state-of-the-art results, surpassing the most recent attention-based neural network models.

2 Related Work

Early QA systems were based on information retrieval and were designed to return snippets of text containing an answer (Voorhees and Tice, 2000; Banko et al., 2002), with limitations in terms of question complexity and response coverage. The creation of large-scale KBs (Auer et al., 2007; Bollacker et al., 2008) has led to the development of a new class of QA methods based on semantic parsing (Berant et al., 2013; Kwiatkowski et al., 2013; Fader et al., 2014; Yih et al., 2015) that can return precise answers to complicated compositional questions. Due to the sparsity of KB data, however, the main challenge shifts from finding answers to developing efficient information extraction methods to populate KBs automatically (Craven et al., 2000; Carlson et al., 2010), which is not an easy problem.

For this reason, recent initiatives are returning to the original setting of directly answering from text, using datasets like TRECQA (Wang et al., 2007), which is based on classical TREC resources (Voorhees et al., 1999), and WIKIQA (Yang et al., 2015), which is extracted from Wikipedia. Both benchmarks are organized around the task of answer sentence selection, where a system must identify the sentence containing the correct answer in a collection of documents, but need not return the actual answer as a KB-based system would. Unfortunately, these datasets are very small (hundreds of examples) and, because of their answer selection setting, do not offer the option to directly compare answering from a KB against answering from pure text. Using similar resources to the dialog dataset of Dodge et al. (2016), our new benchmark WIKIMOVIES addresses both deficiencies by providing a substantial corpus of question-answer pairs that can be answered either by using a KB or a corresponding set of documents.

Even though standard pipeline QA systems like AskMSR (Banko et al., 2002) have recently been revisited (Tsai et al., 2015), the best published results on TRECQA and WIKIQA have been obtained by either convolutional neural networks (Santos et al., 2016; Yin and Schütze, 2015; Wang et al., 2016) or recurrent neural networks (Miao et al., 2015), both usually with attention mechanisms inspired by Bahdanau et al. (2015). In this work, we introduce KV-MemNNs, a Memory Network model that operates on a symbolic memory structured as (key, value) pairs. Such structured memory is not employed in any existing attention-based neural network architecture for QA. As we will show, it gives the model greater flexibility for encoding knowledge sources and helps shrink the gap between directly reading documents and answering from a KB.

3 Key-Value Memory Networks

The Key-Value Memory Network model is based on the Memory Network (MemNN) model (Weston et al., 2015; Sukhbaatar et al., 2015), which has proven useful for a variety of document reading and question answering tasks: for reading children's books and answering questions about them (Hill et al., 2016), for complex reasoning over simulated stories (Weston et al., 2016), and for utilizing KBs to answer questions (Bordes et al., 2015).

Key-value paired memories are a generalization of the way context (e.g. knowledge bases or documents to be read) is stored in memory. The lookup (addressing) stage is based on the key memory, while the reading stage (giving the returned result) uses the value memory. This gives both (i) greater flexibility for the practitioner to encode prior knowledge about their task; and (ii) more effective power in the model via nontrivial transforms between key and value. The key should be designed with features to help match it to the question, while the value should be designed with features to help match it to the response (answer). An important property of the model is that the entire model can be trained with key-value transforms while still using standard backpropagation via stochastic gradient descent.
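To make the division of labor between keys and values concrete, here is a minimal numpy sketch of a single addressing-and-reading step, assuming the question and memory slots have already been embedded as vectors (the function and variable names are illustrative, not the paper's notation):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def memory_read(q, K, V):
    """One key-value memory read (illustrative sketch, not the paper's exact equations).

    q: (d,)   embedded question
    K: (N, d) embedded keys   -- used only for addressing
    V: (N, d) embedded values -- used only for the returned result
    """
    p = softmax(K @ q)  # relevance of each memory, computed against the keys
    return p @ V        # weighted sum over the corresponding values

# Toy usage: 5 pre-selected memories with 4-dimensional embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
o = memory_read(q, K, V)  # o is combined with q before the next hop or final prediction
```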
Our model follows the end-to-end Memory Network architecture of Sukhbaatar et al. (2015), in which a query is used to iteratively address and read from the memory over several "hops"; at the last iteration, the retrieved context and the most recent query are combined as features to predict a response from a list of candidates.

[Figure 1: The Key-Value Memory Network model for question answering. See Section 3 for details.]

Figure 1 illustrates the KV-MemNN model architecture.

In KV-MemNNs we define the memory slots as pairs of vectors $(k_1, v_1), \ldots, (k_M, v_M)$ and denote the question $x$. The addressing and reading of the memory involves three steps:

Key Hashing: the question can be used to pre-select a small subset of the possibly large array. This is done using an inverted index that finds a subset $(k_{h_1}, v_{h_1}), \ldots, (k_{h_N}, v_{h_N})$ of memories of size $N$ where the key shares at least one word with the question with frequency $< F = 1000$ (to ignore stop words), following Dodge et al. (2016).
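A minimal sketch of this hashing step is below (Python); tokenizing by whitespace and counting a word's frequency as the number of stored keys containing it are assumptions of this example:

```python
from collections import defaultdict

F = 1000  # words appearing in >= F keys are treated as stop words and ignored

def build_inverted_index(keys):
    """Map each word to the indices of the memory keys that contain it."""
    index = defaultdict(set)
    for i, key in enumerate(keys):
        for word in set(key.lower().split()):
            index[word].add(i)
    # drop overly frequent words so they cannot pre-select memories
    return {w: ids for w, ids in index.items() if len(ids) < F}

def key_hash(question, index):
    """Pre-select memories whose key shares at least one (rare) word with the question."""
    selected = set()
    for word in set(question.lower().split()):
        selected |= index.get(word, set())
    return sorted(selected)

keys = ["Blade Runner directed_by", "Blade Runner starred_actors", "Alien directed_by"]
index = build_inverted_index(keys)
print(key_hash("who directed Blade Runner ?", index))  # -> [0, 1]
```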