Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

Peifeng Wang¹,³, Nanyun Peng¹,²,³, Filip Ilievski³, Pedro Szekely¹,³, Xiang Ren¹,³
¹Department of Computer Science, University of Southern California
²Department of Computer Science, University of California, Los Angeles
³Information Sciences Institute, University of Southern California
{peifengw,xiangren}@usc.edu, [email protected], {ilievski,pszekely}@isi.edu

Abstract

Commonsense question answering (QA) requires background knowledge which is not explicitly stated in a given context. Prior works use commonsense knowledge graphs (KGs) to obtain this knowledge for reasoning. However, relying entirely on these KGs may not suffice, considering their limited coverage and the contextual dependence of their knowledge. In this paper, we augment a general commonsense QA framework with a knowledgeable path generator. By extrapolating over existing paths in a KG with a state-of-the-art language model, our generator learns to connect a pair of entities in text with a dynamic, and potentially novel, multi-hop relational path. Such paths can provide structured evidence for solving commonsense questions without fine-tuning the path generator. Experiments on two datasets show the superiority of our method over previous works which fully rely on knowledge from KGs (with up to 6% improvement in accuracy), across various amounts of training data. Further evaluation suggests that the generated paths are typically interpretable, novel, and relevant to the task.¹

¹The code is available at https://github.com/wangpf3/Commonsense-Path-Generator.

[Figure 1: Our path generator learns to connect the question entities (in red) and choice entities (in blue). The dashed arrow indicates a missing link in a static KG, here (cave, IsA, geological_feature). Example question: "In what geological feature will you find fungus growing?" A: shower stall B: toenails C: basement D: forest E: cave.]

1 Introduction

Solving commonsense QA tasks requires filling gaps with external knowledge. For instance, given the multiple-choice question in Figure 1, a system needs to know that fungus grows in moist environments, such as caves, and that a cave is a type of geological feature. Such commonsense knowledge is obvious for humans, but most existing QA systems do not have it or cannot reason with it.

Although recent advances in pre-trained language models (LMs) have resulted in impressive performance on commonsense-related benchmarks (Zellers et al., 2018; Bhagavatula et al., 2019; Huang et al., 2019), it is unclear whether this is due to commonsense reasoning or to capturing spurious correlations in the data (Niven and Kao, 2019). Pre-trained LMs may answer a question correctly for the wrong reasons, making them highly uninterpretable (Mitra et al., 2019).

Alternatively, a set of systems retrieve external knowledge either from large text corpora or from knowledge graphs (KGs). A corpus, however, might not be an ideal source of commonsense knowledge, as such knowledge is seldom stated explicitly in text (Storks et al., 2019). In contrast, commonsense KGs, like ConceptNet (Speer et al., 2017) and ATOMIC (Sap et al., 2019), provide structured evidence about the relevant entities, thus enabling effective reasoning and higher interpretability. Existing systems retrieve knowledge from a KG in the form of triplets (Mihaylov and Frank, 2018), multi-hop paths (Lin et al., 2019; Bauer et al., 2018), or subgraphs (Kapanipathi et al., 2019).

Despite the aforementioned benefits, exploiting these KGs poses the following challenges. Firstly, as KGs are known to suffer from sparsity (Li et al., 2016), they might not contain the knowledge needed to fill the gaps between the question and the answer. For example, a missing link (cave, IsA, geological feature) in Figure 1 might prevent the QA system from choosing the correct answer.

Recent work on commonsense KG completion (Li et al., 2016; Bosselut et al., 2019; Bosselut and Choi, 2019) is limited to predicting the tail of a statement with a known head and relation, or a single-hop relation between entities. Secondly, due to the large size and heterogeneity of modern KGs, contextualization, i.e., identifying a set of KG facts which are relevant or needed to answer a question, is also difficult (Fadnis et al., 2019). Simply retrieving all paths could introduce noisy information and potentially harm reasoning.

To address this gap between LMs and KGs, we propose a knowledgeable path generator (PG) that generalizes over the facts stored in a KG, rather than only retrieving them. We call our method neural KG due to its neural generalization over structured KGs, and, in contrast, we use the term static KG for methods which rely exclusively on existing facts in a KG. Our PG connects a pair of question and answer entities with a (novel) multi-hop path, which may not exist in the KG, allowing missing facts like (cave, IsA, geological feature) in Figure 1 to be considered during inference.

To learn such a generator, we: (1) sample a set of random walk instances from a static commonsense KG, based on rules and constraints for informativeness and relevance (§3.1); (2) fine-tune a pre-trained language model, GPT-2 (Radford et al., 2019), on the sampled paths (§3.2). By doing so, we transfer the rich knowledge encoded in GPT-2 to our PG. This is expected to both enhance the generalization ability of the PG and combat the sparsity of KGs. Also, by generating high-quality missing links between the question and answer entities, we contextualize the task with relevant commonsense knowledge. To understand the impact of our multi-hop PG on downstream commonsense QA tasks, we integrate the PG into an augmented version of a general QA framework (§3.3).

We run experiments on two benchmark datasets, CommonsenseQA (Talmor et al., 2018) and OpenBookQA (Mihaylov et al., 2018). The results show that our method performs better than previous systems augmented with static KGs by up to 6% in accuracy, which also reveals its potential as a plug-in module for various datasets and as a vital complement to existing KG structures. In the low-resource setting, the accuracy gain over the baselines grows as the training data decreases, indicating a larger inductive bias of our generator. We also assess the quality and interpretability of our paths through both automatic and human evaluation.

To summarize, our key contributions are:
1. We propose a method to generate task-relevant knowledge paths that may not exist in the original KG, thus addressing the contextualization and sparsity challenges of KGs.
2. We design and implement a framework with three variants of our PG, to understand the role of local and global graph information.
3. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method compared to previous methods, as well as its robustness to limited training data.

2 Preliminaries

Our multiple-choice commonsense QA setup follows prior work (Talmor et al., 2018; Mihaylov et al., 2018; Bisk et al., 2020): given a question q, a system selects exactly one of the choices a as an answer. To experiment with contextualized background knowledge, we adopt a general framework (Figure 2) consisting of a context module, a knowledge module and a reasoning module. The context module encodes both the question q and a choice a as unstructured evidence, while the knowledge module encodes external facts as structured evidence.

[Figure 2: Our KG-augmented QA framework. The reasoning module leverages both the unstructured context and structured knowledge to answer a question.]
Both the unstructured and the structured evidence are fed to the reasoning module, which produces a score for a question-choice pair. The choice with the highest score is the predicted answer. Next, we introduce each module in detail.

Context Module. We concatenate a question q and one of its choices a with a special token, and feed the sequence into a contextual encoder. This encoder generates an embedding c, which serves as unstructured evidence for our system. As commonly done for textual input, we consider a bidirectional pre-trained language model (Devlin et al., 2018; Liu et al., 2019) as the contextual encoder.
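To make the context module concrete, here is a minimal sketch in Python with the HuggingFace transformers library; the specific model name and the use of the first token's hidden state as c are illustrative assumptions, not the authors' exact implementation:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative bidirectional encoder; the paper also uses BERT-large,
# AristoRoBERTa, and Albert as context modules.
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
encoder = AutoModel.from_pretrained("roberta-large")

def encode_context(question: str, choice: str) -> torch.Tensor:
    """Concatenate a question and one choice with special tokens and
    return the context embedding c."""
    # Encoding a sentence pair inserts the model's separator tokens.
    inputs = tokenizer(question, choice, return_tensors="pt", truncation=True)
    outputs = encoder(**inputs)
    # One common choice for c: the hidden state at the first position.
    return outputs.last_hidden_state[:, 0]  # shape: (1, hidden_size)
```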

Knowledge Module. Given a commonsense KG G = (E, R), where E is the entity set and R is the relation set, we seek a set of relevant knowledge facts for a question-choice pair {q, a}, which would serve as structured evidence to support reasoning. We employ an entity recognition system to extract the relevant entity mentions in the question (denoted by E^q = {e^q}) and in one of the choices (E^a = {e^a}). We connect each pair of question-choice entities with a multi-hop path, which can be done either by retrieving existing paths (as in previous methods) or by generating paths (see §3.3). Formally, a path is p(e^q, e^a) = {e^q, r_0, e_1, r_1, ..., r_{T-1}, e^a}, where T is the number of hops. Note that when T = 1, the path is a single triplet. The set of paths is denoted by P = {p(e^q, e^a) | e^q ∈ E^q, e^a ∈ E^a}.

We employ a Relational Network (RN) (Santoro et al., 2017) to aggregate the retrieved paths into a static knowledge embedding k, which serves as structured evidence. In essence, an RN is a composite function over the set P:

    k = f_φ({g_θ(p) | p ∈ P}),    (1)

where f_φ could be any aggregation function and g_θ could be any neural network which projects a discrete path p into a fixed-size continuous embedding p. We expect that not all paths contribute equally to choosing the right answer. Therefore, we construct the function f_φ as an attention network:

    k = Σ_{p ∈ P} α_p p.    (2)

We compute the attention weight α_p by using the context embedding c as a query:

    α_p = exp(α̂_p) / Σ_{p′} exp(α̂_{p′}),    (3)

where the context embedding c guides (as an attention query) the encoding of the structured evidence:

    α̂_p = c^⊤ tanh(W_att ⋅ p + b_att).    (4)

Here, the attention network is parameterized by (W_att, b_att) and tanh(⋅) is a nonlinear activation function. Regarding the function g_θ, we employ its original formulation:

    g_θ(p) = MLP[e^q; (r_0 ∘ ... ∘ r_{T-1}); e^a],    (5)

where [;] is vector concatenation and ∘ stands for element-wise multiplication. The components (entities and relations) of a path are represented by their feature vectors.
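The following PyTorch sketch instantiates Eqs. 1-5; the class name, dimensions, and batching scheme are our own illustrative assumptions:

```python
import torch
import torch.nn as nn

class PathAttentionRN(nn.Module):
    """Sketch of the RN-style knowledge module (Eqs. 1-5)."""

    def __init__(self, dim: int):
        super().__init__()
        # g_theta (Eq. 5): an MLP over [head; relation product; tail]
        self.mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
        self.att = nn.Linear(dim, dim)  # (W_att, b_att) in Eq. 4

    def forward(self, heads, rels, tails, c):
        # heads, tails: (n_paths, dim); rels: (n_paths, T, dim); c: (dim,)
        rel_prod = rels.prod(dim=1)  # element-wise r_0 * ... * r_{T-1}
        p = self.mlp(torch.cat([heads, rel_prod, tails], dim=-1))  # Eq. 5
        scores = torch.tanh(self.att(p)) @ c   # unnormalized weights, Eq. 4
        alpha = torch.softmax(scores, dim=0)   # Eq. 3
        return (alpha.unsqueeze(-1) * p).sum(dim=0)  # k, Eq. 2
```

The final classification layer described next (Eq. 6) then reduces to a single linear layer over the concatenation [c; k], e.g., nn.Linear(2 * dim, 1), followed by a softmax across the candidate choices.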
Reasoning Module. This module leverages the unstructured evidence (the context embedding c) and the structured evidence (the knowledge embedding k) to compute the plausibility of a question-choice pair. We concatenate c with k and feed them to the final classification layer, a linear transformation that scores a question-choice pair {q, a}:

    f(q, a) = W_cls ⋅ [c; k] + b_cls.    (6)

The linear classification layer is parameterized by (W_cls, b_cls). We get the final probability over all choices by normalizing with a softmax.

3 Knowledgeable Path Generator

Extracting the structured evidence by retrieving paths (or subgraphs) from a static KG, as in prior work (Mihaylov et al., 2018; Lin et al., 2019; Kapanipathi et al., 2019), faces two key challenges: sparsity and contextualization (§1). We thus propose a knowledgeable path generator (PG), which learns to connect a question-choice entity pair (e^q, e^a) with a multi-hop path. The generated paths are used as structured evidence in the knowledge module. Next, we detail the construction of the training data (§3.1), the learning of our path generator over this data (§3.2), and the integration of the generator into the reasoning module (§3.3). Figure 3 presents an overview of our adapted knowledge module.

3.1 Knowledge Path Sampling

We sample paths from a commonsense KG using random walks, in order to provide training data for our PG. Such paths are expected to contain useful knowledge for commonsense QA tasks. Given a KG G = (E, R), each sampled path p = {e_0, r_0, e_1, r_1, ..., r_{T-1}, e_T} is a random walk on the graph, where e_t ∈ E and r_t ∈ R. The number of hops, T, is a hyperparameter of our method. To improve the quality of the paths, we adopt two heuristic strategies. For relevance, we define a subset of relation types that are useful for answering commonsense questions, e.g., AtLocation and IsA, and filter out the remaining ones, e.g., RelatedTo, prior to sampling (see Appendix B for the discarded relations). For informativeness, we require all relation types in a path to be distinct.

[Figure 3: Overview of our adapted knowledge module. (1) Extraction of entities from a question and its answer choices (the context encoder consumes "[CLS] Question [SEP] Choice [SEP]"). (2) Generation of a multi-hop knowledge path with our PG (GPT-2) to connect each pair of question and answer entities; the prefix of the path sequence is given as input during inference. (3) Aggregation of the generated paths, e.g., {organism, IsA, ecosystem, HasContext, resources} and {overpopulation, _Causes, reproducing, HasPrerequisite, resource}, into a knowledge embedding via attention with the context embedding. The example question is "Overpopulation of an organism can?" with answer "strain the resources of an ecosystem".]

We explore two sampling strategies for selecting the starting node of the random walks.

Local Sampling. The random walks start from the entities that appear in the questions and answer choices of the training set of a benchmark. This strategy is expected to favor the generation of paths that are tailored to the task.

Global Sampling. We conduct random walks starting from each entity in E. This may divert our PG away from biasing on the local structure of the KG and enhance its generalizability to unseen data.

To include entities that are connected only by inverse triplets in a path, we add a reverse relation r^{-1} for each relation r. We also sample paths with a mixed number of hops T, so our generator can learn to connect entities using paths of variable length, when needed. The full path sampling procedure is described by Algorithm 1 in the Appendix.
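As a rough Python sketch of this sampling procedure (a runnable analogue of Algorithm 1 in the Appendix); the adjacency representation and helper names are our own assumptions:

```python
import random

def sample_path(graph, start, num_hops):
    """Sample one random walk of num_hops steps with distinct relations.

    graph is assumed to map an entity to a list of (relation, neighbor)
    pairs, with a reverse relation "_r" added for every relation "r"
    and uninformative relations (e.g., RelatedTo) already filtered out.
    """
    path, used_relations, node = [start], set(), start
    for _ in range(num_hops):
        candidates = [(r, v) for r, v in graph.get(node, [])
                      if r not in used_relations]  # informativeness
        if not candidates:
            return None  # dead end; caller resamples
        rel, node = random.choice(candidates)
        used_relations.add(rel)
        path += [rel, node]
    return path

# Local sampling draws `start` from entities in the task's training set;
# global sampling draws `start` uniformly from all entities in E.
```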
3.2 Generating Paths to Connect Entities

We employ GPT-2 (Radford et al., 2019) as the backbone of our path generator. GPT-2 is a pre-trained language model that encodes rich unstructured knowledge from large text corpora. We foresee two benefits of combining a pre-trained model with the sampled paths: enhancing the generalization ability of the generator and combating the sparsity of KGs (cf. §1).

To transform a symbolic path p into a text sequence, each entity e_t and relation r_t is represented by its phrase tokens; the reverse relations are represented by adding a special prefix token "_". The resulting paths mimic natural language sentences to facilitate optimal usage of the knowledge encoded in the pre-trained language model. At inference time, in order to connect the question-choice entities, we also add the last entity phrase tokens x_T, together with a separator token [SEP], at the beginning of each path sequence, which produces the final transformation s^p. This informs the generator about the last entity it should output when generating a path. Table 1 provides an example path transformation.

Table 1: Example transformation of a symbolic path into text.

{predator, DistinctFrom, prey, IsA, animal}
→ {animal, [SEP], predator, distinct, from, prey, is, a, animal}

The PG learns to maximize the probability of the observed paths given the entity pairs. We use the negative conditional log-likelihood as the loss function:

    L = − Σ_{t=1}^{|s^p|} log P(s^p_t | s^p_{<t}).    (7)
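A hedged sketch of this fine-tuning step with HuggingFace's GPT-2, assuming paths have already been verbalized into the format of Table 1; the serialization helper and separator handling are illustrative:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
# "[SEP]" is not a native GPT-2 token; in practice one would add it
# (and the "_" prefix for reverse relations) to the vocabulary.
tokenizer.add_special_tokens({"sep_token": "[SEP]"})
model.resize_token_embeddings(len(tokenizer))

def path_to_text(entities, relations):
    """Verbalize a path, prefixed with its target entity (cf. Table 1)."""
    body = " ".join(tok for ent, rel in zip(entities, relations + [""])
                    for tok in (ent, rel)).strip()
    return f"{entities[-1]} [SEP] {body}"

text = path_to_text(["predator", "prey", "animal"], ["distinct from", "is a"])
# -> "animal [SEP] predator distinct from prey is a animal"
inputs = tokenizer(text, return_tensors="pt")
# Passing labels makes the model return the token-level NLL of Eq. 7.
loss = model(**inputs, labels=inputs["input_ids"]).loss
loss.backward()
```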
To facilitate the integration of the structured evidence from our path generator, instead of a static KG, we adapt the knowledge module from §2 slightly. We construct the path set P by generating a multi-hop path p(e^q, e^a) for each pair of a question entity e^q and a choice entity e^a with our PG and greedy decoding. To represent each path with an embedding, we perform mean pooling of the hidden states from the last layer of GPT-2 (before the softmax layer in Eq. 8) as a new formulation of the function g_θ:

    g_θ(p) = MEAN({h_1, h_2, ..., h_{|s^p|}}).    (9)

Since GPT-2 has been pre-trained on a large corpus, we believe such a representation should be sufficient for preserving the information of the paths. Then, the knowledge embedding obtained with the function f_φ of the RN (Eqs. 2-4) is concatenated with the original static knowledge embedding as our new definition of k.

The whole pipeline is optimized by minimizing its cross-entropy loss. The set of learnable parameters excludes the parameters of our proposed PG, because we observed that fixing their values yields optimal performance. This points to another advantage of our PG: after being fine-tuned on the sampled random walks from a KG, the PG can be integrated within an existing QA system with no further training.
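A sketch of this inference step, combining greedy decoding with mean pooling over the last-layer hidden states (Eq. 9); the prompt format mirrors the training transformation, and the helper itself is our own assumption:

```python
import torch

@torch.no_grad()
def generate_path_embedding(model, tokenizer, question_entity, answer_entity,
                            max_new_tokens=24):
    # Prompt with the target entity first, then the start entity (cf. Table 1).
    prompt = f"{answer_entity} [SEP] {question_entity}"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    path_ids = model.generate(input_ids, do_sample=False,  # greedy decoding
                              max_new_tokens=max_new_tokens)
    # Mean-pool the last layer's hidden states over all path tokens (Eq. 9).
    hidden = model(path_ids, output_hidden_states=True).hidden_states[-1]
    return hidden.mean(dim=1).squeeze(0)  # the path embedding g_theta(p)
```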
4 Experiments

4.1 Datasets

We evaluate our method on two commonsense QA benchmarks: CommonsenseQA (Talmor et al., 2018) and OpenBookQA (Mihaylov et al., 2018). As the test set of CommonsenseQA is not publicly available, predictions for it can only be evaluated once every two weeks via the official leaderboard. Thus, we report our test score on the leaderboard, and perform more extensive comparisons on the data split used in Lin et al. (2019). Besides questions and answers, OpenBookQA provides a collection of background facts in textual form. We use the correspondence between these facts and their questions, prepared by Clark et al. (2019), as an additional input to the context module for all methods, except RoBERTa-large (see §4.5).

4.2 Experimental Setup

Entity Recognition. We employ ConceptNet (Speer et al., 2017), a popular commonsense KG. As stated in §3.1, we disregard triplets that belong to a predefined set of relations (see Appendix B). Similar to previous work (Lin et al., 2019), we use lexical matching to ground the entities mentioned in the question and the answer choices to our KG. One exception is that each answer choice in CommonsenseQA is treated as a single entity, as these tend to correspond directly to concepts in ConceptNet.

Path Sampling. We sample a set of paths with varying lengths, ranging from 1 to 3 hops. Global sampling generates 2,825,692 paths, while local sampling results in 133,612 paths for CommonsenseQA and 105,155 for OpenBookQA. We split them into training/dev/test sets at a 90:5:5 ratio.

4.3 Baselines

As baselines, we consider a fine-tuned LM, static KG-augmented models, and a 1-hop link predictor on the question and answer entities.

Fine-tuned LM. To examine the role of the external knowledge, we compare to a "Fine-tuned LM" ablation of our QA framework without the knowledge module (§2).

Static KG Models. We compare to three static KG variants of our QA framework that model the knowledge module with path/graph encoders: (1) RN, a degenerate version of our system, which computes a knowledge embedding by an attention mechanism over the retrieved paths for each question-choice entity pair; (2) Relational Graph Convolutional Networks (RGCN) (Schlichtkrull et al., 2018), which encode local graphs using graph convolutional networks with relation-specific weight matrices; (3) GconAttn (Wang et al., 2019), which models the alignment between entities via attention and pools over all entity embeddings.

Link Prediction Model. This baseline predicts the relation between question and answer entities instead of creating or finding knowledge paths. Namely, we employ TransE (Bordes et al., 2013) to learn a representation for every entity and relation in ConceptNet, which is then leveraged to predict a 1-hop relation for each pair of question and answer entities. The representations for each resulting triplet are used as 1-hop path embeddings. The rest of this baseline is identical to our QA framework.
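For intuition, TransE embeds a triplet (h, r, t) so that h + r ≈ t; a minimal scoring sketch follows, where the embedding tables are assumed to be pre-trained on ConceptNet and the helper names are ours:

```python
import torch

def transe_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor):
    """Plausibility of (h, r, t) under TransE: higher is more plausible."""
    return -torch.norm(h + r - t, p=2, dim=-1)

def predict_relation(e_q, e_a, relation_embeddings):
    """Pick the 1-hop relation that best connects a question-answer pair."""
    scores = torch.stack([transe_score(e_q, r, e_a)
                          for r in relation_embeddings])
    return scores.argmax().item()
```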

Table 2: Test accuracy with varying proportions of CommonsenseQA (using the data split of Lin et al. (2019)). Results (mean and standard deviation) are computed over 4 experimental runs with different random seeds (top score in boldface, second-best underlined). Some of the baseline results are reported from our other work (Feng et al., 2020).

Methods | BERT-large 20% | BERT-large 60% | BERT-large 100% | RoBERTa-large 20% | RoBERTa-large 60% | RoBERTa-large 100%
Fine-tuned LM (w/o KG) | 46.25 (±0.63) | 52.30 (±0.16) | 55.39 (±0.40) | 55.28 (±0.35) | 65.56 (±0.76) | 68.69 (±0.56)
+ RN | 45.12 (±0.69) | 54.23 (±0.28) | 58.92 (±0.14) | 61.32 (±0.68) | 66.16 (±0.28) | 69.59 (±3.80)
+ RGCN | 48.67 (±0.28) | 54.71 (±0.37) | 57.13 (±0.36) | 58.58 (±0.17) | 68.33 (±0.85) | 68.41 (±0.66)
+ GconAttn | 47.95 (±0.11) | 54.96 (±0.69) | 56.94 (±0.77) | 57.53 (±0.31) | 68.09 (±0.63) | 69.88 (±0.47)
+ Link Prediction | 47.10 (±0.79) | 53.96 (±0.56) | 56.02 (±0.55) | 60.84 (±1.36) | 66.29 (±0.29) | 69.33 (±0.98)
+ PG-Local | 50.20 (±0.31) | 55.68 (±0.07) | 56.81 (±0.73) | 61.56 (±0.72) | 67.77 (±0.83) | 70.43 (±0.65)
+ PG-Global | 49.89 (±1.03) | 55.47 (±0.92) | 57.21 (±0.45) | 62.93 (±0.82) | 68.65 (±0.02) | 71.55 (±0.99)
+ PG-Full | 51.97 (±0.26) | 57.53 (±0.19) | 59.07 (±0.30) | 63.72 (±0.77) | 69.46 (±0.23) | 72.68 (±0.42)

Table 3: Test accuracy on OpenBookQA. Methods with AristoRoBERTa leverage the textual evidence of Clark et al. (2019) as an additional input to the context module.

Methods | RoBERTa-large | AristoRoBERTa
Fine-tuned LMs (w/o KG) | 64.80 (±2.37) | 78.40 (±1.64)
+ RN | 65.20 (±1.18) | 75.35 (±1.39)
+ RGCN | 62.45 (±1.57) | 74.60 (±2.53)
+ GconAttn | 64.75 (±1.48) | 71.80 (±1.21)
+ Link Prediction | 66.30 (±0.48) | 77.25 (±1.11)
+ PG-Local | 70.05 (±1.33) | 79.80 (±1.45)
+ PG-Global | 68.40 (±0.31) | 80.05 (±0.68)
+ PG-Full | 71.20 (±0.96) | 79.15 (±0.78)

Table 4: Test accuracy on the official CommonsenseQA leaderboard. Note that the SOTA system, UnifiedQA (11B parameters), is impractical in an academic setting.

Methods | Single | Ensemble
RoBERTa (Liu et al., 2019) | 72.1 | 72.5
RoBERTa+FreeLB (Zhu et al., 2019) | - | 73.1
RoBERTa+HyKAS (Ma et al., 2019) | 73.2 | -
XLNet+DREAM | 73.3 | -
RoBERTa+KE | - | 73.3
RoBERTa+KEDGN | - | 74.4
XLNet+GraphReason (Lv et al., 2019) | 75.3 | -
Albert (Lan et al., 2019) | - | 76.5
UnifiedQA* (Khashabi et al., 2020) | 79.1 | -
Albert+PG-Full | 75.6 | 78.2

4.4 Model Variations

We experiment with three variants of our method, which differ in terms of the knowledge embedding: (1) PG-Full: the combination of our global PG and a static RN, as detailed in §3.3; (2) PG-Local: a local PG which is trained on both local and global paths; (3) PG-Global: a global, data-independent PG which is trained on global paths only. We note that PG-Local and PG-Global do not include the static knowledge embedding.

4.5 Results

Main Results. For all systems, we experiment with several encoders as the context module: BERT-large (Devlin et al., 2018) and RoBERTa-large (Liu et al., 2019) for CommonsenseQA, and RoBERTa-large and AristoRoBERTa (Clark et al., 2019) for OpenBookQA. Tables 2 and 3 show the results for CommonsenseQA and OpenBookQA, respectively. On both datasets, we observe consistent improvements brought by our method with different context encoders. Our full model, which combines both generated and static knowledge, achieves the best performance overall, suggesting that these two knowledge sources are complementary. Typically, either our local or our global variant yields the second-best results, demonstrating the effectiveness of the generated paths as structured evidence and their superiority over the static KG methods. The comparable performance of Link Prediction to the static KG methods indicates that even predicting 1-hop knowledge paths helps to address the KG sparsity.

Furthermore, we report comparable results to the other systems on the official test sets, accessible via the leaderboards (Tables 4 and 5). Notably, the two best-performing systems, UnifiedQA (Khashabi et al., 2020) and TTTTT (Raffel et al., 2019), are based on the T5 language model (Raffel et al., 2019), which requires excessive computational resources and is impractical in an academic setting. Excluding these, our full method achieves the best performance on both datasets.

Less Labeled Data. To compare the robustness of our model and the baselines to sparsity, we perform experiments with {20%, 40%, 60%, 80%, 100%} of the training data from both datasets. The results, displayed in Table 2 and Figure 4, show that our method (with RoBERTa) performs better than or on par with the baselines for any amount of training data. The performance gain brought by either our Global or our Full model is higher when less data is used, which shows that introducing structured evidence as an inductive bias helps in a low-resource setting.

Table 5: Test accuracy on the OpenBookQA leaderboard. All listed methods leverage the provided science facts as an additional textual input. Note that the top-2 systems, UnifiedQA (11B parameters) and TTTTT (3B parameters), are computationally expensive and impractical in an academic setting.

Methods | Test
Careful Selection (Banerjee et al., 2019) | 72.0
AristoRoBERTa | 77.8
KF + SIR (Banerjee and Baral, 2020) | 80.0
Albert + KB | 81.0
TTTTT* (Raffel et al., 2019) | 83.2
UnifiedQA* (Khashabi et al., 2020) | 87.2
AristoRoBERTa + PG-Full | 80.2
Albert + PG-Full | 81.8

Table 6: Automatic and human evaluation of the generated paths on the task test set. All scores are scaled to be percentage-based.

Metric | CommonsenseQA Global | CommonsenseQA Scratch | OpenBookQA Global | OpenBookQA Scratch
Connection | 97.33 | 91.16 | 96.03 | 96.01
Valid Entity | 98.64 | 97.78 | 99.21 | 97.97
Valid Relation | 100.00 | 100.00 | 100.00 | 100.00
Score | 59.31 | 53.27 | 57.74 | 50.62
Novelty | 75.82 | 58.18 | 78.93 | 53.81
H-Valid | 89.20 | 60.13 | 84.93 | 53.73
H-Relevance | 87.53 | 70.53 | 88.13 | 74.00

[Figure 4: Test accuracy on CommonsenseQA (left) and OpenBookQA (right) with different proportions of training data (20-100%), comparing w/o KG, RN, GconAttn, and PG-Full.]

Ablation Study. We study the contribution of different strategies for learning our generator, based on the performance of our Global and Local variants in Tables 2-3. We also include another variant, obtained by training our path generator from scratch, i.e., training a randomly-initialized model with the same architecture as GPT-2 instead of fine-tuning a pre-trained one. This Scratch variant achieves 68.75 and 65.50 accuracy on the CommonsenseQA and OpenBookQA test sets, respectively, with RoBERTa-large as the text encoder. Its performance thus resembles that of the static KG baselines, while our Full method achieves 72.68 and 71.20. This demonstrates that learning paths from scratch approximates what a static KG already has, whereas the unstructured knowledge stored in a pre-trained GPT-2 helps to complement the missing knowledge in a static KG. When coupled with a more powerful encoder like RoBERTa or Albert, our Global variant achieves comparable or better results than our Local variant, without fitting the paths to the task, and thus holds promise to enhance generalization on a wider range of datasets.

4.6 Study of Path Quality & Interpretability

Automatic Evaluation. We perform automatic evaluation of the validity and novelty of the generated paths from our Global and Scratch PG variants. To automatically measure validity, we analyze (1) the proportion of paths which successfully connect the head and the tail entities (Connection), and (2) the proportion of entities/relations found in ConceptNet (Valid Entity / Valid Relation). We also leverage a commonsense completion model, Bilinear AVG (Li et al., 2016), which produces a score for a given triplet. This model reportedly achieves 92.5% accuracy on commonsense knowledge completion and has been used in previous work (Bosselut et al., 2019). We average the scores of all the triplets in a path which are missing in ConceptNet as its Score. We compute novelty as the proportion of paths which contain at least one triplet missing in ConceptNet (Novelty).

The results are presented in Table 6. Firstly, our two generator variants are able to connect a vast majority of the entity pairs with a valid path (over 90% Connection). For this purpose, our generators only use the relations in the relation set instead of other, out-of-KG phrases (100% Valid Relation). In addition, the novel paths from the Global generator are of higher quality compared with the ones from the Scratch generator, given that any fact with a score over 0.5 is classified as positive by Bilinear AVG; this is later confirmed by our human evaluation as well. The Global generator also has a higher Novelty, indicating the necessity of transferring knowledge from a pre-trained GPT-2 to complement a static KG.
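The path-level statistics above can be computed with a sketch like the following, assuming the KG triplets are available as a set and a Bilinear AVG-style model is wrapped as score_fn; all helper names are ours:

```python
def path_metrics(paths, kg_triples, score_fn):
    """Compute Novelty and the average Score of novel triplets.

    Each path is a list [e0, r0, e1, ..., r_{T-1}, eT]; kg_triples is a
    set of (head, relation, tail) tuples; score_fn scores one triplet
    in [0, 1] (e.g., a Bilinear AVG model).
    """
    n_novel, novel_scores = 0, []
    for path in paths:
        triples = [(path[i], path[i + 1], path[i + 2])
                   for i in range(0, len(path) - 2, 2)]
        missing = [t for t in triples if t not in kg_triples]
        if missing:  # path contains at least one novel triplet
            n_novel += 1
            novel_scores.append(sum(score_fn(*t) for t in missing)
                                / len(missing))
    novelty = 100.0 * n_novel / len(paths)
    avg_score = 100.0 * sum(novel_scores) / max(len(novel_scores), 1)
    return novelty, avg_score
```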

Human Evaluation. We also conduct a human evaluation on two dimensions of the generated paths: (1) validity (how valid are the paths?) and (2) relevance (how relevant are the paths to the question?). We randomly sample 50 paths from our Global and Scratch generators for different question-choice entity pairs in the test datasets. For each path, we provide the corresponding question and answer choices as the context. We ask three annotators to score each path from 1 (not at all) to 5 (very), resulting in a total of 150 scores for each dimension/generator/dataset. The averages of these scores are reported as H-Valid and H-Relevance in Table 6. For both dimensions, our Global generator achieves higher scores, showing that fine-tuning a pre-trained GPT-2 as our generator learns a path distribution of high quality and relevance to commonsense QA.

Path Interpretability. In Table 7, we compare example paths generated by our Global and Scratch variants to connect the question entities to the gold answer entities. In Q1, our Global generator provides knowledge about the location of an entity with a 2-hop path, which helps with answering such "Where" questions. Although the path from our Scratch generator also contains the AtLocation relation, its first generated hop (IsA) is less informative. In Q2, our Global generator is able to connect complex ideas about harmony and making peace with a 2-hop path, while the path from the Scratch variant contains incorrect information: peace is caused by committing perjury. In Q3, the path from our Global generator is able to predict the relevant property of an entity and realizes that a 1-hop relation suffices in this case. Our Scratch variant, however, predicts a less precise relation (HasContext). These cases show the path generalization ability of the fine-tuned pre-trained GPT-2, owed to its unstructured knowledge. We refer readers to Table 12 in the Appendix for more cases.

Table 7: Paths from question to gold answer entities, with novel and valid triplets in boldface.

Q1: Where would you find magazines along side many other printed works? A: doctor. B*: bookstore. C: market. D: train station. E: mortuary.
PG-Global (2-hop): {magazine, IsA, book, AtLocation, bookstore}
PG-Scratch: {magazine, IsA, magazine, AtLocation, bookstore}

Q2: If you want harmony, what is something you should try to do with the world? A: take time. B: make noise. C: make war. D*: make peace. E: make haste.
PG-Global (2-hop): {harmony, MotivatedByGoal, make better world, HasPrerequisite, make peace}
PG-Scratch: {harmony, UsedFor, committing perjury, Causes, make peace}

Q3: Janet was watching the film because she liked what? A: rejection. B: laughter. C*: being entertained. D: fear. E: bordem.
PG-Global (1-hop): {film, CausesDesire, being entertained}
PG-Scratch: {film, HasContext, being entertained}

5 Related Work

Multi-hop Reasoning on KGs. Recent benchmarks for commonsense QA and related tasks, like open-domain QA (Yang et al., 2018) and reading comprehension (Welbl et al., 2018), require systems to conduct multi-hop reasoning. Existing systems typically employ entity linking to recognize the relevant entities, ground them to a KG, and retrieve the paths from the local graph neighborhood around the entities. The retrieved paths are scored or ranked using graph-based metrics (e.g., PageRank, centrality) (Paul and Frank, 2019; Fadnis et al., 2019; Bauer et al., 2018), handcrafted rules (Kapanipathi et al., 2019), or neural methods (e.g., attention mechanisms) (Kundu et al., 2018; Lin et al., 2019). Rather than relying on a static KG, our PG is able to generate knowledge paths dynamically, even when these are absent from the KG.

Dynamic Knowledge Path Generation. Several methods generate knowledge paths instead of extracting them from static KGs. Asai et al. (2019) learn reasoning paths by forming sequences of evidence documents; however, their approach relies on inter-document hyperlinks to establish the relations in the constructed KG. The extractor of Fu et al. (2019) retrieves missing facts in order to address the sparsity of KGs. Unlike our work, their setting is limited to knowledge graph completion, where both a query entity and a single query relation are given. The most similar existing work to ours is that of Bosselut and Choi (2019), which also leverages GPT-2 to dynamically generate knowledge paths. We see two key differences between this method and ours: (1) they expand their paths gradually by predicting the next entity one at a time, while we generate the paths in an end-to-end manner; (2) their method is restricted to a setting where the context can be treated as a single entity and the question as a query relation, which is not a limitation of our method.

6 Conclusion

In this paper, we propose a generator of multi-hop knowledge paths, which provides structured evidence for answering commonsense questions. The generator, learned by fine-tuning GPT-2 on random walks sampled from ConceptNet, produces a path between each pair of question and answer entities. All generated paths are aggregated into a knowledge embedding and fused with a context embedding given by a text encoder for classification. Our QA framework enhanced with this generator outperforms both pre-trained language models and prior KG-augmented methods on two commonsense QA benchmarks. The accuracy gain increases as the training data decreases. Furthermore, automatic and human evaluations of the generated paths yield high scores for their validity, novelty, and relevance.

Future research should investigate how to optimally fuse the knowledge and the context embeddings. It should also address the ambiguity of the entity mentions in the questions, the answers, and the lexical nodes in ConceptNet.

Acknowledgments

We thank the anonymous reviewers for their insightful comments. This material is based upon work sponsored by the DARPA MCS program under Contract No. N660011924033 with the United States Office Of Naval Research.

References

Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, and Caiming Xiong. 2019. Learning to retrieve reasoning paths over wikipedia graph for question answering. arXiv preprint arXiv:1911.10470.

Pratyay Banerjee and Chitta Baral. 2020. Knowledge fusion and semantic knowledge ranking for open domain question answering. arXiv preprint arXiv:2004.03101.

Pratyay Banerjee, Kuntal Kumar Pal, Arindam Mitra, and Chitta Baral. 2019. Careful selection of knowledge to solve open book question answering. arXiv preprint arXiv:1907.10738.

Lisa Bauer, Yicheng Wang, and Mohit Bansal. 2018. Commonsense for generative multi-hop question answering tasks. arXiv preprint arXiv:1809.06309.

Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, and Yejin Choi. 2019. Abductive commonsense reasoning. arXiv preprint arXiv:1908.05739.

Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, and Yejin Choi. 2020. Piqa: Reasoning about physical commonsense in natural language. In Thirty-Fourth AAAI Conference on Artificial Intelligence.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795.

Antoine Bosselut and Yejin Choi. 2019. Dynamic knowledge graph construction for zero-shot commonsense question answering. arXiv preprint arXiv:1911.03876.

Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi. 2019. Comet: Commonsense transformers for automatic knowledge graph construction. arXiv preprint arXiv:1906.05317.

Peter Clark, Oren Etzioni, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, et al. 2019. From 'f' to 'a' on the NY Regents science exams: An overview of the Aristo project. arXiv preprint arXiv:1909.01958.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Kshitij Fadnis, Kartik Talamadupula, Pavan Kapanipathi, Haque Ishfaq, Salim Roukos, and Achille Fokoue. 2019. Heuristics for interpretable knowledge graph contextualization. arXiv preprint arXiv:1911.02085.

Yanlin Feng, Xinyue Chen, Bill Yuchen Lin, Peifeng Wang, Jun Yan, and Xiang Ren. 2020. Scalable multi-hop relational reasoning for knowledge-aware question answering. arXiv preprint arXiv:2005.00646.

Cong Fu, Tong Chen, Meng Qu, Woojeong Jin, and Xiang Ren. 2019. Collaborative policy learning for open knowledge graph reasoning. arXiv preprint arXiv:1909.00230.

Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2019. Cosmos qa: Machine reading comprehension with contextual commonsense reasoning. arXiv preprint arXiv:1909.00277.

Pavan Kapanipathi, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan, Maria Chang, Kshitij Fadnis, Chulaka Gunasekara, Bassem Makni, et al. 2019. Infusing knowledge into the textual entailment task using graph convolutional networks. arXiv preprint arXiv:1911.02060.

Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. 2020. Unifiedqa: Crossing format boundaries with a single qa system. arXiv preprint arXiv:2005.00700.

Souvik Kundu, Tushar Khot, Ashish Sabharwal, and Peter Clark. 2018. Exploiting explicit paths for multi-hop reading comprehension. arXiv preprint arXiv:1811.01127.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.

Xiang Li, Aynaz Taheri, Lifu Tu, and Kevin Gimpel. 2016. Commonsense knowledge base completion. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1445–1455.

Bill Yuchen Lin, Xinyue Chen, Jamin Chen, and Xiang Ren. 2019. Kagnet: Knowledge-aware graph networks for commonsense reasoning. arXiv preprint arXiv:1909.02151.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, and Songlin Hu. 2019. Graph-based reasoning over heterogeneous external knowledge for commonsense question answering. arXiv preprint arXiv:1909.05311.

Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, and Alessandro Oltramari. 2019. Towards generalizable neuro-symbolic systems for commonsense question answering. arXiv preprint arXiv:1910.14087.

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018. Can a suit of armor conduct electricity? a new dataset for open book question answering. In EMNLP.

Todor Mihaylov and Anette Frank. 2018. Knowledgeable reader: Enhancing cloze-style reading comprehension with external commonsense knowledge. arXiv preprint arXiv:1805.07858.

Arindam Mitra, Pratyay Banerjee, Kuntal Kumar Pal, Swaroop Mishra, and Chitta Baral. 2019. Exploring ways to incorporate additional knowledge to improve natural language commonsense question answering. arXiv preprint arXiv:1909.08855.

Timothy Niven and Hung-Yu Kao. 2019. Probing neural network comprehension of natural language arguments. arXiv preprint arXiv:1907.07355.
Debjit Paul and Anette Frank. 2019. Ranking and selecting multi-hop knowledge paths to better predict human needs. arXiv preprint arXiv:1904.00676.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.

Adam Santoro, David Raposo, David G Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. 2017. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems, pages 4967–4976.

Maarten Sap, Ronan Le Bras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A Smith, and Yejin Choi. 2019. Atomic: An atlas of machine commonsense for if-then reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3027–3035.

Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Robert Speer, Joshua Chin, and Catherine Havasi. 2017. Conceptnet 5.5: An open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.

Shane Storks, Qiaozi Gao, and Joyce Y Chai. 2019. Commonsense reasoning for natural language understanding: A survey of benchmarks, resources, and approaches. arXiv preprint arXiv:1904.01172.

Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2018. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937.

Xiaoyan Wang, Pavan Kapanipathi, Ryan Musa, Mo Yu, Kartik Talamadupula, Ibrahim Abdelaziz, Maria Chang, Achille Fokoue, Bassem Makni, Nicholas Mattei, et al. 2019. Improving natural language inference using external knowledge in the science questions domain. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7208–7215.

Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6:287–302.

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600.

Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. 2018. Swag: A large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326.

Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Thomas Goldstein, and Jingjing Liu. 2019. Freelb: Enhanced adversarial training for language understanding. arXiv preprint arXiv:1909.11764.

A Algorithm for Path Sampling

Algorithm 1: Path Sampling
Input: G = (E, R) and a set of all the question entities {e^q}
Output: A set of triplet paths {p}.
1: repeat
2:   if Do Global Sampling then
3:     current node u ← uniform_sample(E)
4:   else
5:     current node u ← uniform_sample({e^q})
6:   end if
7:   p ← {u}
8:   for t = 1 to T do
9:     N ← Neighbor(u)
10:    next node v ← uniform_sample(N)
11:    M ← All_Relations(u, v)
12:    while TRUE do
13:      r ← uniform_sample(M)
14:      if r not in p then
15:        BREAK
16:      end if
17:    end while
18:    p ← p ∪ {r, v}
19:    u ← v
20:  end for
21: until maximum number of paths achieved

B Discarded Relations

When sampling knowledge paths, we discard some relation types which are regarded as uninformative and offer little help for answering the questions. They include RelatedTo, Synonym, Antonym, DerivedFrom, FormOf, EtymologicallyDerivedFrom and EtymologicallyRelatedTo.

C Datasets Split

Both CommonsenseQA³ and OpenBookQA⁴ have their datasets available on their leaderboard pages. The dataset split used in (Lin et al., 2019) is also available by request, and we have included it as supplementary material.

³https://www.tau-nlp.org/commonsenseqa
⁴https://leaderboard.allenai.org/open_book_qa/submissions/public

Table 8: QA dataset statistics.

Dataset | Train | Dev | Test
CommonsenseQA (official) | 9,741 | 1,221 | 1,140
CommonsenseQA (Lin et al.) | 8,500 | 1,221 | 1,241
OpenBookQA | 4,957 | 500 | 500

Table 9: Learning rate and batch size of different context modules for CommonsenseQA.

Context Module | Learning Rate | Batch Size
BERT-large | 2e-5 | 32
RoBERTa-large | 2e-6 | 16
Albert-xxlarge-v2 | 1e-5 | 16

Table 10: Learning rate and batch size of different context modules for OpenBookQA.

Context Module | Learning Rate | Batch Size
RoBERTa-large | 1e-5 | 32
AristoRoBERTa | 2e-5 | 16
Albert-xxlarge-v2 | 1e-5 | 16

D Implementation Details

Path Generator Training. We employ a pre-trained GPT2-base model (Radford et al., 2019) to initialize our generator. We then fine-tune the generator with an initial learning rate of 1e-5 and a batch size of 64. The learning rate is changed with a warm-up period of 500 mini-batches and then linearly decayed. The training lasts until the loss on the development set no longer decreases for 2 epochs.

Training on the Task Datasets. We search for the optimal hyper-parameters based on the classification accuracy on the development set. The learning rate for the context module is chosen from {2e-6, 5e-6, 1e-5, 2e-5, 5e-5}. The learning rate for the rest of the parameters is set to 1e-3. The batch size is chosen from {8, 16, 32, 64, 128}. A large batch size is achieved by accumulating gradients over several small batches. The training lasts until the accuracy on the development set no longer increases for 2 epochs. The optimal hyper-parameters for both datasets are listed in Tables 9-10.

Model Size. We list the model size of the major modules in our QA framework in Table 11. These include the different pre-trained LMs used as a context module, the backbone of our PG (GPT-2), and the RN used for the static knowledge module.
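The gradient accumulation mentioned under "Training on the Task Datasets" can be sketched as follows; the accumulation factor and the HuggingFace-style loss access are illustrative assumptions:

```python
def train_with_accumulation(model, loader, optimizer, accumulation_steps=8):
    """Emulate a large batch by accumulating gradients over small batches,
    e.g., an effective batch of 128 from mini-batches of 16."""
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = model(**batch).loss / accumulation_steps  # scale to average
        loss.backward()                                  # gradients accumulate
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```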

Table 11: Number of parameters of the major modules in our QA framework.

Module | # Parameters
BERT-large | 340M
RoBERTa-large | 355M
AristoRoBERTa | 355M
Albert-xxlarge-v2 | 223M
GPT2-base | 117M
RN | 399K

Table 12: More Paths from questions to gold answer entities, with novel and valid triplets in boldface.

Q1: He spent all summer in his room playing video games, because of this it wasn't surprising for Mother to find a stack of dirty dishes in her what? A*: son's room. B: party. C: dishwasher. D: restaurant kitchen. E: shoes.
PG-Global: {play video, UsedFor, computer, AtLocation, son's room}
PG-Scratch: {play video, UsedFor, machine, IsA, son's room}

Q2: What do people typically do while playing guitar? A: cry. B: hear sounds. C*: singing. D: arthritis. E: making music.
PG-Global: {guitar, UsedFor, playing music, HasSubevent, singing}
PG-Scratch: {guitar, HasContext, music, Causes, singing}

Q3: Blue read material outside of his comfort zone because he wanted to gain what? A*: new perspective. B: entertained. C: understanding. D: hunger. E: tired eyes.
PG-Global: {reading material, HasPrerequisite, learning about subject, Causes, new perspective}
PG-Scratch: {reading material, HasSubevent, reading, Causes, new perspective}

Q4: Bob the lizard lives in a warm place with lots of water. Where does he probably live? A: rock. B*: tropical rainforest. C: jazz club. D: new mexico. E: rocky places.
PG-Global: {warm place, AtLocation, forest, IsA, tropical rainforest}
PG-Scratch: {warm place, AtLocation, tropical rainforest}

Q5: She was always helping at the senior center, it brought her what? A: satisfaction. B: heart. C: feel better. D: pay. E: happiness.
PG-Global: {help, UsedFor, giving assistance, Causes, happiness}
PG-Scratch: {help, HasSubevent, giving assistance, MotivatedByGoal, happiness}

Q6: What is likely to satisfy someone's curiosity? A*: hear news. B: read book. C: see favorite show. D: comedy show. E: go somewhere.
PG-Global: {curiosity, CausesDesire, find information, HasSubevent, read, HasPrerequisite, hear news}
PG-Scratch: {curiosity, CausesDesire, hear news}

Q7: Where would a person be doing when having to wait their turn? A: have patience. B: get in line. C: sing. D*: stand in line. E: turn left.
PG-Global: {wait, HasPrerequisite, stand in line}
PG-Scratch: {wait, HasPrerequisite, stand in line}

Q8: It's easier for human's to survive in: A: a cave. B: the ocean. C*: a town. D: alone.
PG-Global: {survive, MotivatedByGoal, live, UsedFor, townhouse, AtLocation, town}
PG-Scratch: {survive, HasProperty, town}

Q9: A man wanted to find the United States on a visual, where should he look? A: history book. B*: atlas. C: tv channels. D: northern hemisphere. E: map.
PG-Global: {visual, HasContext, map, AtLocation, atlas}
PG-Scratch: {visual, IsA, atlas}

Q10: What leads to someone going to bed? A: bad dreams. B: lazyness. C: get pregnant. D*: sleepiness. E: rest.
PG-Global: {bed, UsedFor, sleeping, Causes, sleepiness}
PG-Scratch: {bed, UsedFor, sleepiness}
