arXiv:2104.07302v1 [cs.CL] 15 Apr 2021

TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph

Jiaxin Shi (Tsinghua University), Shulin Cao (Tsinghua University), Lei Hou (Tsinghua University), Juanzi Li (Tsinghua University), Hanwang Zhang (Nanyang Technological University)

Abstract

Multi-hop Question Answering (QA) is a challenging task because it requires precise reasoning with entity relations at every step towards the answer. The relations can be represented in terms of labels in a knowledge graph (e.g., spouse) or text in a text corpus (e.g., they have been married for 26 years). Existing models usually infer the answer by predicting the sequential relation path or aggregating the hidden graph features. The former is hard to optimize, and the latter lacks interpretability. In this paper, we propose TransferNet, an effective and transparent model for multi-hop QA, which supports both label and text relations in a unified framework. TransferNet jumps across entities at multiple steps. At each step, it attends to different parts of the question, computes activated scores for relations, and then transfers the previous entity scores along activated relations in a differentiable way. We carry out extensive experiments on three datasets and demonstrate that TransferNet surpasses the state-of-the-art models by a large margin. In particular, on MetaQA, it achieves 100% accuracy in 2-hop and 3-hop questions. By qualitative analysis, we show that TransferNet has transparent and interpretable intermediate results.

[Figure 1 panels. Question: What organization did the wife of Bill Gates found? Relation graph over the entities Bill Gates, Melinda Gates, Bill & Melinda Gates Foundation, and Microsoft Corporation, shown in label form (relations Spouse, Founder, CEO) and in text form (e.g., "<sub> and <obj> have been married for 26 years."; "In 2000, <sub> co-founded the <obj> with her husband Bill Gates."). Answer: Bill & Melinda Gates Foundation.]

Figure 1: Answering a multi-hop question over the relation graph. The relations are constrained predicates in the label form (i.e., knowledge graph) while free texts in the text form. The reasoning process has been marked in the graph, where the correspondence between relations and question words has been highlighted in the same color.

1 Introduction

Question answering (QA) plays a central role in artificial intelligence. It requires machines to understand the free-form questions and infer the answers by analyzing information from a large corpus (Rajpurkar et al., 2016; Joshi et al., 2017; Chen et al., 2017) or structured knowledge base (Bordes et al., 2015; Yih et al., 2015; Jiang et al., 2019). Along with the fast development of deep learning, especially the pretraining technology (Devlin et al., 2018; Lan et al., 2019), state-of-the-art models have been shown to be comparable to human performance on simple questions that only need a single hop (Petrochuk and Zettlemoyer, 2018; Zhang et al., 2020), e.g., Who is the CEO of Microsoft Corporation. However, multi-hop QA, which requires reasoning with the entity relations at multiple steps, is far from resolved (Yang et al., 2018; Dua et al., 2019; Zhang et al., 2017; Talmor and Berant, 2018).

In this paper, we focus on multi-hop QA based on relation graphs, which consist of entities and their relations. As shown in Figure 1, the relations can be represented by two forms:

• Label form, also known as knowledge graph (e.g., Freebase (Bollacker et al., 2008), Wikidata (Vrandečić and Krötzsch, 2014)), whose relations are manually-defined constrained predicates (e.g., Spouse, CEO).

• Text form, whose relations are free texts retrieved from a textual corpus. We can easily build the graph by extracting the co-occurring sentences of two entities. Since the label form is expensive and usually incomplete, the text form is more economical and practical.

In this paper, we aim to tackle multi-hop questions over these two different forms in a unified framework.

Existing methods for multi-hop QA have two main strands. The first is to predict the sequential relation path in a weakly supervised setting (Zhang et al., 2017; Qiu et al., 2020), that is, to learn the intermediate path only based on the final answer. These works suffer from convergence issues due to the huge search space, which heavily hinders their performance. Besides, they are mostly proposed for the label form, so it is not clear how to adapt them to the text form, whose search space is even larger. The second strand is to collect evidence by using graph neural networks (Sun et al., 2018, 2019). They can handle both relation forms and achieve state-of-the-art performance. Although they prevail over the path-based models in performance, they are weak in interpretability since their intermediate reasoning process consists of black-box neural network layers.

In this paper, we propose a novel model for multi-hop QA, dubbed TransferNet, which has the following advantages: 1) Generality. It can deal with the label form, the text form, and their combinations in a unified framework. 2) Effectiveness. TransferNet outperforms previous models significantly, achieving 100% accuracy on the 2-hop and 3-hop questions of the MetaQA dataset. 3) Transparency. TransferNet is fully attention-based, so its intermediate steps can be easily visualized and understood by humans.

Specifically, TransferNet infers the answer by transferring entity scores along relation scores over multiple steps. It starts from the topic entity of the question and maintains an entity score vector, whose elements indicate the probability of an entity being activated. At each step, it attends to some question words (e.g., the wife of) and computes scores for the relations in the graph. Relations relevant to the question words will have high scores (e.g., Spouse). We formulate these relation scores into an adjacency matrix, where each entry indicates the transfer probability of an entity pair. By multiplying the entity score vector with the relation score matrix, we can “hop” along relations in a differentiable manner. After repeating for multiple steps, we can finally arrive at the target entity.

We conduct experiments for the two forms respectively. For the label form, we use MetaQA (Zhang et al., 2017), WebQSP (Yih et al., 2016) and CompWebQ (Talmor and Berant, 2018). TransferNet achieves 100% accuracy in the 2-hop and 3-hop questions of MetaQA. On WebQSP and CompWebQ, we also achieve a significant improvement over state-of-the-art models. For the text form, following Sun et al. (2019), we construct the relation graph of MetaQA from the WikiMovies corpus (Miller et al., 2016). We demonstrate that TransferNet surpasses previous models by a large margin, especially for the 2-hop and 3-hop questions. When we mix the label form and the text form, TransferNet still keeps its superiority. Moreover, by visualizing the intermediate results, we show its strong interpretability.¹

¹ https://github.com/shijx12/TransferNet

2 Related Work

In this paper we focus on multi-hop question answering over a graph structure that is either a knowledge graph or built from a text corpus. In previous works, GraftNet (Sun et al., 2018) and PullNet (Sun et al., 2019) have a similar setting to ours but they mostly aim at the mixed form, which includes both label relations and text relations. They first retrieve a question-specific subgraph and then use graph convolutional networks (Kipf and Welling, 2016) to implicitly infer the answer entity. These GCN-based methods can achieve good performance but are usually weak in interpretability because they cannot produce the intermediate reasoning path, which is necessary in our opinion for the task of multi-hop question answering. Besides, there are many works specifically for only one graph form:

For the label form, which is also known as “KBQA” or “KGQA”, existing methods fall into two categories: information retrieval (Miller et al., 2016; Xu et al., 2019; Zhao et al., 2019b; Saxena et al., 2020) and semantic parsing (Berant et al., 2013; Yih et al., 2015; Liang et al., 2017; Guo et al., 2018; Saha et al., 2019). The former retrieves the answer from the KG by learning representations of the question and the graph, while the latter queries the answer by parsing the question into a logical form. Among these methods, VRN (Zhang et al., 2017) and SRN (Qiu et al., 2020) have good interpretability as they learn an explicit reasoning path with reinforcement learning. However, they suffer from the convergence issue due to the huge search space. IRN (Zhou et al., 2018) and ReifKB (Cohen et al., 2020) learn a soft distribution for intermediate relations and can be optimized using only the final answer. However, it is not clear how to extend them to the text form.

Question answering over text corpus is also known as “reading comprehension”. For simple questions, whose answer can be retrieved directly from the text, pretrained models (Devlin et al., 2018; Lan et al., 2019) have performed better than [...]

[...] of the question to determine the most proper relation. TransferNet maintains a score for each entity to denote its activated probability, which is initialized to 1 for the topic entity and 0 for the others. At each step, TransferNet computes a score for each relation to denote its activated probability in terms of the current query, and then transfers the entity scores across those activated relations.
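
The transfer step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy graph loosely follows Figure 1, and the edge directions, the hand-picked relation scores (which stand in for the attention-based scoring of question words), and the helper names transfer_matrix, hop, and answer are all assumptions made for illustration.

```python
# Toy relation graph loosely following Figure 1 (edge directions are assumed).
ENTITIES = [
    "Bill Gates",
    "Melinda Gates",
    "Bill & Melinda Gates Foundation",
    "Microsoft Corporation",
]
TRIPLES = [
    ("Bill Gates", "Spouse", "Melinda Gates"),
    ("Melinda Gates", "Founder", "Bill & Melinda Gates Foundation"),
    ("Bill Gates", "Founder", "Bill & Melinda Gates Foundation"),
    ("Bill Gates", "CEO", "Microsoft Corporation"),
]


def transfer_matrix(relation_scores):
    """Build W, where W[i][j] is the score of moving from entity i to entity j."""
    n = len(ENTITIES)
    index = {name: i for i, name in enumerate(ENTITIES)}
    w = [[0.0] * n for _ in range(n)]
    for subj, rel, obj in TRIPLES:
        i, j = index[subj], index[obj]
        # Assumption: if several relations link a pair, keep the highest score.
        w[i][j] = max(w[i][j], relation_scores.get(rel, 0.0))
    return w


def hop(entity_scores, w):
    """One step: new[j] = sum over i of entity_scores[i] * W[i][j]."""
    n = len(entity_scores)
    return [sum(entity_scores[i] * w[i][j] for i in range(n)) for j in range(n)]


def answer(topic_entity, relation_scores_per_step):
    """Start with score 1 on the topic entity and 0 elsewhere, then hop once per step."""
    scores = [1.0 if name == topic_entity else 0.0 for name in ENTITIES]
    for relation_scores in relation_scores_per_step:
        scores = hop(scores, transfer_matrix(relation_scores))
    return scores


# Hand-picked relation scores for the two steps (hypothetical values).
steps = [
    {"Spouse": 0.9, "Founder": 0.05, "CEO": 0.05},  # attending to "the wife of"
    {"Spouse": 0.05, "Founder": 0.9, "CEO": 0.05},  # attending to "found"
]
final_scores = answer("Bill Gates", steps)
best = ENTITIES[max(range(len(ENTITIES)), key=final_scores.__getitem__)]
print(best)
```

Because each hop is only a vector-matrix product, the intermediate score vectors can be inspected after every step, which is the kind of transparency the text refers to. In a trained model the relation scores would come from the question rather than from a fixed table.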
