QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering Michihiro Yasunaga Hongyu Ren Antoine Bosselut Percy Liang Jure Leskovec Stanford University {myasu,hyren,antoineb,pliang,jure}@cs.stanford.edu If it is not used for hair, a roundAbstract brush is an example of what? If it is not used for hair, a round brush is an example of what? A. hair brush B. bathroom C. art supplies* A. hair brush B. bathroom C. art supplies* D. showerThe problem E. hair salon of answering questions using D. shower E. hair salon QA contextknowledge from pre-trained language models QA context (LMs) and knowledgeQA context graphs (KGs) presents QA context two challenges: givenNode a QA context (question Node Question Choice Question Choice and answerEntity choice), methodsEntity need to (i) Entity Entity identify relevant knowledge from large KGs, and (ii) perform joint reasoning over the QA hair hair hair hair context and KG.AtLocation Here we proposebrush a new AtLocation brush model, QA-GNN, which addresses the above AtLocation Answer AtLocation Answer challenges throughRelatedTo two key innovations: (i) RelatedTo round art round art relevancebrush scoring, where we usesupply LMs to esti- brush supply mate the importance of KG nodes relative to UsedFor UsedFor the given QA context,painting and (ii)UsedFor joint reasoning, painting UsedFor Knowledgewhere graph we connect the QA context and KG Knowledge graph to form a joint graph, and mutually update Figure 1: Given the QA context (question and answer their representations through graph-based message passing. We evaluate QA-GNN on the choice; purple box), we aim to derive the answer by CommonsenseQA and OpenBookQA datasets, performing joint reasoning over the language and the and show its improvement over existing LM Graph Connectionknowledge (§3.1) graph (green box).Dev Acc. Contextualization (§3.2) Dev Acc. and LM+KG models, as well as its capability to No edge between Z and KG nodes 74.81 No contextualization 75.56 perform interpretable and structured reasoning,Connect Z to all KG nodes 76.38 w/ contextual embedding 76.31 Connect Z to QA entity nodes (final system) 76.54 w/ relevance score (final system) 76.54 e.g., correctly handling negation in questions. be noisy (Bordes et al., 2013; Guu etw/ both al., 2015). 76.52 How to reason effectively with both sources of GNN Attention & Message (§3.3) Dev Acc. 1 Introduction knowledge remains an important open problem.GNN Layers (§3.3) Dev Acc. Node type, relation, score-aware (final system) 76.54 L = 3 75.53 - type-aware 75.41 L = 4 76.34 Question answering systems must be able to access Combining LMs and KGs for reasoning (hence-L = 5 (final system) 76.54 - relation-aware 75.61 L = 6 76.21 relevant knowledge and reason over it. Typically,- score-awareforth, LM+KG) presents75.56 two challenges:L = given 7 75.96 knowledge can be implicitly encoded in large a QA context (e.g., question and answer choices; Graph Connectionlanguage (§3.1) models (LMs)Dev pre-trained Acc. Relevance on unstructuredscoring (§3.2) Dev Acc.Figure Graph 1 purple Connection box), (§3.1) methods needDev Acc. to (i)Relevance identify scoring (§3.2) Dev Acc. No edgetext between (Petroni Z and KG et nodes al., 2019;74.11 BosselutNothing et al., 2019), or ex-75.15informativeNo edge between knowledge Z and KG nodes from a large74.81 KGNothing (green 75.56 Connectplicitly Z to all KG represented nodes in structured76.38 w/ knowledgecontextual embedding graphs76.31box);Connect and (ii) Z to captureall KG nodes the nuance of76.38 the QAw/ contextcontextual embedding 76.31 Connect Z to QA entity nodes (final) 76.54 w/ relevance score (final) 76.54 Connect Z to QA entity nodes (final) 76.54 w/ relevance score (final) 76.54 (KGs), such as Freebase (Bollackerw/ both et al., 2008)76.52and the structure of the KGs to performw/ both joint 76.52 and ConceptNet (Speer et al., 2017), where entities reasoning over these two sources of information. GNN Attention & Message (§3.3) Dev Acc. GNN Attention & Message (§3.3) Dev Acc. are represented as nodes and relationsGNN Layers (§3.3) betweenDev Acc. Previous works (Bao et al., 2016; Sun et al.,GNN 2018 Layers; (§3.3) Dev Acc. Node type, relation, score-aware (final) 76.54 L = 3 75.53 Node type, relation, score-aware (final) 76.54 L = 3 75.53 - type-awarethem as edges. Recently,75.11 pre-trainedL = 4 LMs have76.34 Lin et- type-aware al., 2019) retrieve a subgraph75.41 from theL = 4 KG 76.34 - relation-aware 75.23 L = 5 (final) 76.54 - relation-aware 75.61 L = 5 (final) 76.54 demonstrated remarkable success inL = 6 many question76.21 by taking topic entities (KG entities mentionedL = 6 in 76.21 - score-aware 75.15 - score-aware 75.56 answering tasks (Liu et al., 2019L = ;7 Raffel et al.75.96, the given QA context) and their few-hop neighbors.L = 7 75.96 2020). However, while LMs have a broad coverage However, this introduces many entity nodes that of knowledge, they do not empirically perform well are semantically irrelevant to the QA context, on structured reasoning (e.g., handling negation) especially when the number of topic entities or hops (Kassner and Schütze, 2020). On the other hand, increases. Additionally, existing LM+KG methods KGs are more suited for structured reasoning (Ren for reasoning (Lin et al., 2019; Wang et al., 2019a; et al., 2020; Ren and Leskovec, 2020) and enable Feng et al., 2020; Lv et al., 2020) treat the QA explainable predictions e.g., by providing reasoning context and KG as two separate modalities. They paths (Lin et al., 2019), but may lack coverage and individually apply LMs to the QA context and graph If it is not used for hair, a round brush is an example of what? If it is not used for hair, a round brush is an example of what? If it is not used for hair, a round brush A. hair brush B. bathroom C. art supplies* A. hair brush B. bathroom C. art supplies* A. hair brush B. bathroom C. art supplies D. shower E. hair salon D. shower E. hair salon D. shower E. hair salon Language Context Language Context Language Context Context Node Context Context Node Node Question Choice Entity Question Choice Question Entity Entity Entity Entity hair hair hair hair hair AtLocation brush AtLocation brush AtLocation AtLocation AtLocation Answer AtLocation Answer RelatedTo RelatedTo RelatedTo round art round round art brush supply brush brush supply UsedFor UsedFor UsedFor painting UsedFor painting painting UsedFor Knowledge Graph Knowledge Graph Knowledge Graph TextEncoder Language e.g. LM MLP Context [q; a] Plausibility Z Z KG Score Retrieval Contextual GNN (§3) Pooling LM QA context Encoding MLP [q; a] Probability Joint Z Z KG Graph (§3.1) score Retrieval Reasoning (§3.3) Relevance Pooling Scoring (§3.2) QA-GNN Figure 2: Overview of our approach. Given a QA context (z), we connect it with the retrieved KG to form a joint graph (working graph;§3.1), compute the relevance of each KG node conditioned on z (§3.2; node shading indicates the relevance score), and perform reasoning on the working graph (§3.3). neural networks (GNNs) to the KG, and do not dling negation and entity substitution in questions): mutually update or unify their representations. This it achieves 4.6% improvement over fine-tuned LMs separation might limit their capability to perform on questions with negation, while existing LM+KG structured reasoning, e.g., handling negation. models are +0.6% over fine-tuned LMs. We also Here we propose QA-GNN, an end-to-end show that one can extract reasoning processes from LM+KG model for question answering that QA-GNN in the form of general KG subgraphs, not addresses the above two challenges. We first encode just paths (Lin et al., 2019), suggesting a general the QA context using an LM, and retrieve a KG method for explaining model predictions. subgraph following prior works (Feng et al., 2020). Our QA-GNN has two key insights: (i) Relevance 2 Problem Statement scoring: Since the KG subgraph consists of all few-hop neighbors of the topic entities, some entity We aim to answer natural language questions using nodes are more relevant than others with respect to knowledge from a pre-trained LM and a structured the given QA context. We hence propose KG node KG. We use the term language model broadly to be relevance scoring: we score each entity on the KG any composition of two functions, fhead(fenc(x)), subgraph by concatenating the entity with the QA where fenc, the encoder, maps a textual input x to a LM context and calculating the likelihood using a pre- contextualized vector representation h , and fhead trained LM. This presents a general framework to uses this representation to perform a desired task weight information on the KG; (ii) Joint reasoning: (which we discuss in §3.2). In this work, we specifi- We design a joint graph representation of the QA cally use masked language models (e.g., RoBERTa) LM context and KG, where we explicitly view the QA as fenc, and let h denote the output representa- context as an additional node (QA context node) and tion of a [CLS] token that is prepended to the input connect it to the topic entities in the KG subgraph sequence x, unless otherwise noted. We define the as shown in Figure 1. This joint graph, which we knowledge graph as a multi-relational graph G = term the working graph, unifies the two modalities (V;E).

Load more