The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

A Semantic Parsing and Reasoning-Based Approach to Knowledge Base Question Answering

Ibrahim Abdelaziz∗, Srinivas Ravishankar∗, Pavan Kapanipathi∗, Salim Roukos, Alexander Gray
IBM Research, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
{ibrahim.abdelaziz1, srini, alexander.gray}@ibm.com, {kapanipa, roukos}@us.ibm.com

Abstract

Knowledge Base Question Answering (KBQA) is a task where existing techniques have faced significant challenges, such as the need for complex question understanding, reasoning, and large training datasets. In this work, we demonstrate Deep Thinking Question Answering (DTQA), a semantic parsing and reasoning-based KBQA system. DTQA (1) integrates multiple, reusable modules that are trained specifically for their individual tasks (e.g. semantic parsing, entity linking, and relationship linking), eliminating the need for end-to-end KBQA training data; and (2) leverages semantic parsing and a reasoner for improved question understanding. DTQA is a system of systems that achieves state-of-the-art performance on two popular KBQA datasets.

∗Equal contribution.
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

The goal of Knowledge Base Question Answering is to answer natural language questions over Knowledge Bases (KBs) such as DBpedia (Auer et al. 2007) and Wikidata (Vrandečić and Krötzsch 2014). Existing approaches to KBQA are either (i) graph-driven (Diefenbach et al. 2020; Vakulenko et al. 2019) or (ii) end-to-end learning (Sun et al. 2020) approaches. Graph-driven approaches attempt to find a best-match KB sub-graph for the given question, whereas end-to-end learning approaches, which require large amounts of training data, directly predict a query sequence (i.e. SPARQL or SQL) from the input question. Both categories still suffer from a number of challenges, including complex question understanding, scarcity of end-to-end training data, and the need for reasoning.

In this work, we demonstrate DTQA, a semantic parsing and reasoning-based KBQA system. DTQA begins by parsing the natural language question into Abstract Meaning Representation (AMR), a symbolic formalism that captures rich semantic information from natural language. The use of AMR enables task-independent language understanding, which DTQA leverages to answer structurally complex questions. Next, our system uses a triples-based approach that builds on state-of-the-art entity and relation linking modules to align the AMR graph with the underlying KB and produce logic forms. This step accurately maps AMR and its associated PropBank information to a KB such as DBpedia. Finally, the logic forms are used by Logical Neural Networks (LNN) (Riegel et al. 2020), a neuro-symbolic reasoner that enables DTQA to perform different types of reasoning. DTQA comprises several modules which operate within a pipeline and are individually trained for their specific tasks.

This demonstration shows an interpretable methodology for KBQA that achieves state-of-the-art results on two prominent datasets (QALD-9, LC-QuAD 1.0). DTQA is a system of systems that demonstrates: (1) the use of task-general semantic parsing via AMR; (2) an approach that aligns AMR to KB triples; and (3) the use of a neuro-symbolic reasoner for the KBQA task.

DTQA System

Figure 1 shows an overview of DTQA with an example question. The input to DTQA is the question text, which is first parsed using the Abstract Meaning Representation (AMR) parser (Banarescu et al. 2013; Dorr, Habash, and Traum 1998). AMR is a semantic parse representation that resolves the ambiguity of natural language by representing syntactically different sentences with the same underlying meaning in the same way. An AMR parse of a sentence is a rooted, directed, acyclic graph expressing "who is doing what to whom". We use the current state-of-the-art AMR system, which leverages a transition-based parsing approach (Naseem et al. 2019), parameterized with neural networks and trained in an end-to-end fashion. AMR parses are KB-independent; an essential task in solving KBQA with AMR is therefore to align the AMR parses with the knowledge base. In the next three modules, we align the AMR parse to a DBpedia sub-graph that can be transformed into a SPARQL query.

First, DTQA uses an Entity Extraction and Linking (EL) module that extracts entities and types and links them to their corresponding KB URIs. Given a question such as "Which actors starred in Spanish movies produced by Benicio del Toro?", the EL module links the AMR nodes of Benicio del Toro and Spanish to dbr:Benicio_del_Toro and dbr:Spain, respectively. In DTQA, we trained a BERT-based (Devlin et al. 2018) neural mention detection model and used BLINK for disambiguation of named entities.

Aligning the AMR graph to the underlying KB is a challenge, particularly due to the structural and granularity mismatches between AMR and KBs such as DBpedia. The PropBank

[Figure 1: pipeline from the input question through Semantic Parsing (AMR Parser, Entity Linking, AMR to KG Triples, Relation Linking, AMR to Logic) to Reasoning with Logical Neural Networks (LNN). For "Which actors starred in Spanish movies produced by Benicio del Toro?", the pipeline produces the logic form ∃x,z (type(x, dbo:Film) ∧ country(x, dbr:Spain) ∧ producer(x, dbr:Benicio_del_Toro) ∧ starring(x, z) ∧ type(z, Person)) and the SPARQL query SELECT DISTINCT ?uri WHERE { ?film rdf:type dbo:Film ; dbo:country dbr:Spain ; dbo:producer dbr:Benicio_del_Toro ; dbo:starring ?uri . }]

Figure 1: DTQA architecture, illustrated via examples
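The deterministic mapping from the logic form to the SPARQL query in Figure 1 can be illustrated with a small sketch. The helper below is hypothetical illustration code, not DTQA's actual implementation: it simply serializes a conjunction of binary predicates into triple patterns.

```python
# Hypothetical sketch of the deterministic logic-to-SPARQL mapping
# illustrated in Figure 1 (not the actual DTQA code).

def logic_to_sparql(answer_var, predicates):
    """Serialize a conjunction of binary predicates into a SPARQL query.

    `predicates` is a list of (subject, relation, object) terms, where
    SPARQL variables carry a leading '?'.
    """
    patterns = [f"  {s} {p} {o} ." for s, p, o in predicates]
    body = "\n".join(patterns)
    return f"SELECT DISTINCT {answer_var} WHERE {{\n{body}\n}}"

# The logic form for "Which actors starred in Spanish movies
# produced by Benicio del Toro?" from Figure 1:
query = logic_to_sparql("?uri", [
    ("?film", "rdf:type", "dbo:Film"),
    ("?film", "dbo:country", "dbr:Spain"),
    ("?film", "dbo:producer", "dbr:Benicio_del_Toro"),
    ("?film", "dbo:starring", "?uri"),
])
print(query)
```

Each conjunct of the logic form becomes exactly one triple pattern, which is the one-to-one correspondence the triple-generation module is designed to preserve.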

frames in AMR are n-ary predicates, whereas relationships in KBs are predominantly binary. To transform the n-ary predicates into binary ones, we use a novel 'Path-based Triple Generation' module that transforms the AMR graph (with linked entities) into a set of triples, each of which has a one-to-one correspondence with a constraint in the SPARQL query. The main idea of this module is to find paths from the question's amr-unknown node to every linked entity, and hence avoid generating unnecessary triples. For the example in Figure 1, this approach generates the following two paths:

1. amr-unknown → star-01 → movie → country → Spain
2. amr-unknown → star-01 → movie → produce-01 → Benicio del Toro

Note that the predicate act-01 in the AMR graph is irrelevant, since it is not on a path between amr-unknown and the entities, and is hence ignored.

Furthermore, the AMR parse provides relationship information that is generally more fine-grained than the KB. To address this granularity mismatch, the path-based triple generation collapses multiple predicates occurring consecutively into a single edge. For example, in the question "In which ancient empire could you pay with cocoa beans?", the path consists of three predicates between amr-unknown and cocoa bean: (location, pay-01, instrument). These are collapsed into a single edge, and the result is amr-unknown → location | pay-01 | instrument → cocoa-bean.

Next, the (collapsed) AMR predicates from the triples are mapped to relationships in the KB using the Relation Extraction and Linking (REL) module. As seen in Figure 1, the REL module maps star-01 and produce-01 to the DBpedia relations dbo:starring and dbo:producer, respectively. For this task, DTQA uses SLING (Mihindukulasooriya et al. 2020), the state-of-the-art REL system for KBQA. SLING takes as input the AMR graph with linked entities and the extracted paths obtained from the upstream modules. It relies on an ensemble of complementary relation linking approaches to produce a top-k list of KB relations for each AMR predicate. The most relevant and novel approach in SLING is its statistical mapping of an AMR predicate to its semantically equivalent relation from the underlying KB using a distant supervision dataset. Apart from this statistical co-occurrence information, SLING also leverages the rich lexical information in the AMR predicate and the question text. This is captured by a neural model and an unsupervised module that measures the similarity of each AMR predicate to a list of possible relations using GloVe (Pennington, Socher, and Manning 2014) embeddings.

The AMR to Logic module is a rule-based system that transforms the KB-aligned AMR paths into a logical formalism that supports binary predicates and higher-order functional predicates (e.g. for aggregation and manipulation of sets). This logic formalism has a deterministic mapping to SPARQL, introducing SPARQL constructs such as COUNT, ORDER BY, and FILTER.

Some questions in KBQA datasets require reasoning. However, existing KBQA systems (Diefenbach et al. 2020; Vakulenko et al. 2019; Sun et al. 2020) do not have a modular reasoner to perform logical reasoning using the information available in the knowledge base. In DTQA, the final module is a logical reasoner called Logical Neural Networks (LNN) (Riegel et al. 2020). LNN is a neural network architecture in which neurons model a rigorously defined notion of weighted fuzzy or classical first-order logic. LNN takes as input the logic forms from AMR to Logic and has access to the KB to provide an answer. It retrieves predicate groundings via its granular SPARQL integration and performs type-based and geographic reasoning based on the ontological information available in the knowledge base. LNN also allowed us to manually add generic axioms for the geographic reasoning scenarios.

Experimental Evaluation

To evaluate DTQA, we use two prominent KBQA datasets: QALD-9 (Ngomo 2018) and LC-QuAD 1.0 (Trivedi et al. 2017). The latest version of QALD-9 contains 150 test questions, whereas the LC-QuAD test set contains 1,000 questions.
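The path-based triple generation and predicate collapsing described above can be sketched on a toy graph. The data structures below are hypothetical simplifications (a plain adjacency dict, not DTQA's actual AMR representation): a depth-first search finds the path from amr-unknown to each linked entity, and runs of consecutive predicates are merged into a single edge, as in the cocoa-beans example.

```python
# Toy sketch of 'Path-based Triple Generation' (hypothetical data
# structures, not DTQA's implementation). Paths run from amr-unknown
# to each linked entity; nodes off those paths (e.g. act-01) are ignored.

AMR_EDGES = {  # node -> neighbouring nodes, for the Figure 1 question
    "amr-unknown": ["star-01"],
    "star-01": ["movie", "act-01"],
    "movie": ["country", "produce-01"],
    "country": ["Spain"],
    "produce-01": ["Benicio del Toro"],
    "act-01": [],
}
ENTITIES = {"Spain", "Benicio del Toro"}
PREDICATES = {"star-01", "produce-01", "country", "act-01",
              "pay-01", "location", "instrument"}

def find_path(graph, start, goal, path=None):
    """Depth-first search for a simple path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in graph.get(start, []):
        if nxt not in path:
            found = find_path(graph, nxt, goal, path)
            if found:
                return found
    return None

def collapse(path):
    """Merge runs of consecutive predicates into one '|'-joined edge."""
    out, run = [], []
    for node in path:
        if node in PREDICATES:
            run.append(node)
        else:
            if run:
                out.append("|".join(run))
                run = []
            out.append(node)
    if run:
        out.append("|".join(run))
    return out

paths = [collapse(find_path(AMR_EDGES, "amr-unknown", e))
         for e in sorted(ENTITIES)]
# act-01 never appears: it lies on no path from amr-unknown to an entity.
```

On the cocoa-beans question, `collapse` turns the three consecutive predicates (location, pay-01, instrument) into the single edge location|pay-01|instrument, matching the granularity of a single KB relation.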

System     Dataset    P      R      F      Macro-F1-QALD
WDAqua     QALD       26.09  26.70  24.99  28.87
gAnswer    QALD       29.34  32.68  29.81  42.96
DTQA       QALD       31.41  32.16  30.88  45.33
WDAqua     LC-QuAD    22.00  38.00  28.00  –
QAmp       LC-QuAD    25.00  50.00  33.00  –
DTQA       LC-QuAD    33.94  34.99  33.72  –

Table 1: DTQA performance on QALD-9 and LC-QuAD 1.0

We compare DTQA against three KBQA systems: (1) gAnswer (Zou et al. 2014), a graph-driven approach and the state-of-the-art on the QALD dataset; (2) QAmp (Vakulenko et al. 2019), another graph-driven approach based on message passing and the state-of-the-art on the LC-QuAD dataset; and (3) WDAqua-core1 (Diefenbach et al. 2020), which, to the best of our knowledge, is the only approach evaluated on both QALD and LC-QuAD.

Results: Table 1 shows the performance of DTQA on the QALD and LC-QuAD datasets, reporting the standard precision, recall, and F-score metrics for each system. We also report Macro-F1-QALD (Usbeck et al. 2015), a recommended metric for evaluating the QALD dataset. DTQA achieves state-of-the-art performance on all metrics, outperforming existing graph-driven approaches such as gAnswer, WDAqua, and QAmp on both datasets.

By utilizing AMR to obtain the semantic representation of the question, DTQA is able to generalize to sentence structures that come from very different distributions and achieve state-of-the-art performance on both datasets. DTQA also enables opportunities for reasoning by transforming text into a logic form that can be used by LNN (a neuro-symbolic reasoner).

Demonstration and Conclusion

DTQA is a system of systems trained for different sub-problems that has achieved state-of-the-art performance on two prominent KBQA datasets. The system demonstrates an interpretable methodology for understanding the natural language question and reasoning to infer the answer from the KB. To the best of our knowledge, DTQA is the first system that successfully harnesses a generic semantic parser together with a neuro-symbolic reasoner for the KBQA task, along with a novel path-based approach to map AMR to the underlying KG.

References

Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; and Ives, Z. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web, 722–735. Springer.

Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; and Schneider, N. 2013. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, 178–186.

Devlin, J.; Chang, M.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805. URL http://arxiv.org/abs/1810.04805.

Diefenbach, D.; Both, A.; Singh, K.; and Maret, P. 2020. Towards a question answering system over the Semantic Web. Semantic Web (Preprint): 1–19.

Dorr, B.; Habash, N.; and Traum, D. 1998. A thematic hierarchy for efficient generation from lexical-conceptual structure. In Conference of the Association for Machine Translation in the Americas, 333–343. Springer.

Mihindukulasooriya, N.; Rossiello, G.; Kapanipathi, P.; Abdelaziz, I.; Ravishankar, S.; Yu, M.; Gliozzo, A.; Roukos, S.; and Gray, A. 2020. Leveraging Semantic Parsing for Relation Linking over Knowledge Bases. In Proceedings of the 19th International Semantic Web Conference (ISWC 2020).

Naseem, T.; Shah, A.; Wan, H.; Florian, R.; Roukos, S.; and Ballesteros, M. 2019. Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning. arXiv preprint arXiv:1905.13370.

Ngomo, N. 2018. 9th challenge on question answering over linked data (QALD-9). Language 7(1).

Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.

Riegel, R.; Gray, A.; Luus, F.; Khan, N.; Makondo, N.; Akhalwaya, I. Y.; Qian, H.; Fagin, R.; Barahona, F.; Sharma, U.; Ikbal, S.; Karanam, H.; Neelam, S.; Likhyani, A.; and Srivastava, S. 2020. Logical Neural Networks.

Sun, H.; Arnold, A. O.; Bedrax-Weiss, T.; Pereira, F.; and Cohen, W. W. 2020. Faithful Embeddings for Knowledge Base Queries.

Trivedi, P.; Maheshwari, G.; Dubey, M.; and Lehmann, J. 2017. LC-QuAD: A corpus for complex question answering over knowledge graphs. In International Semantic Web Conference, 210–218. Springer.

Usbeck, R.; Röder, M.; Ngonga Ngomo, A.-C.; Baron, C.; Both, A.; Brümmer, M.; Ceccarelli, D.; Cornolti, M.; Cherix, D.; Eickmann, B.; et al. 2015. GERBIL: General entity annotator benchmarking framework. In Proceedings of the 24th International Conference on World Wide Web, 1133–1143.

Vakulenko, S.; Fernandez Garcia, J. D.; Polleres, A.; de Rijke, M.; and Cochez, M. 2019. Message passing for complex question answering over knowledge graphs. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1431–1440.

Vrandečić, D.; and Krötzsch, M. 2014. Wikidata: a free collaborative knowledgebase. Communications of the ACM 57(10): 78–85.

Zou, L.; Huang, R.; Wang, H.; Yu, J. X.; He, W.; and Zhao, D. 2014. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 313–324.
