Multi-Paragraph Reasoning with Knowledge-Enhanced Graph Neural Network
Deming Ye1, Yankai Lin2, Zhenghao Liu1, Zhiyuan Liu1, Maosong Sun1
1Department of Computer Science and Technology, Tsinghua University, Beijing, China
Institute for Artificial Intelligence, Tsinghua University, Beijing, China
State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China
2Pattern Recognition Center, WeChat AI, Tencent Inc.
[email protected]

Abstract

Multi-paragraph reasoning is indispensable for open-domain question answering (OpenQA), yet it receives little attention in current OpenQA systems. In this work, we propose a knowledge-enhanced graph neural network (KGNN), which performs reasoning over multiple paragraphs with entities. To explicitly capture the entities' relatedness, KGNN utilizes relational facts in a knowledge graph to build the entity graph. Experimental results show that KGNN outperforms baseline methods in both the distractor and full wiki settings of the HotpotQA dataset, and our further analysis illustrates that KGNN is effective and robust with more retrieved paragraphs.

[Figure 1: An example of multi-hop reasoning. Paragraph [1]: "The 2015 MTV Video Music Awards were held on August 30, 2015. ... Swift's 'Wildest Dreams' music video premiered during the pre-show. ..." Paragraph [2]: "'Wildest Dreams' is a song recorded by American singer-songwriter Taylor Swift for her fifth studio album, '1989'. The song was released to radio by Big Machine Records on August 31, 2015, as the album's fifth single. Swift co-wrote the song with its producers Max Martin and Shellback. ..." Question: Who co-wrote Taylor Swift's song that had its music video premiere during the pre-show of the 2015 MTV Video Music Awards? Answer: Max Martin and Shellback. KGNN extracts entities from each paragraph (Swift [1], Wildest Dreams [1], Wildest Dreams [2], Big Machine Records [2], Taylor Swift [2], Max Martin [2], Shellback [2]) and propagates messages through "equal to" (double-headed arrow), "lyrics by" (solid), and "record label" (dotted) edges. The edges that help reasoning for the question are picked out in red.]

1 Introduction

Open-domain question answering (OpenQA) aims to answer questions based on a large-scale knowledge source, such as an unlabelled corpus. In recent years, OpenQA has aroused the interest of many researchers with the availability of large-scale datasets such as Quasar (Dhingra et al., 2017), SearchQA (Dunn et al., 2017), and TriviaQA (Joshi et al., 2017). Numerous OpenQA models have been proposed (Chen et al., 2017; Clark and Gardner, 2018; Wang et al., 2018a,b; Choi et al., 2017; Lin et al., 2018), achieving promising results on various public benchmarks.

However, most questions in previous OpenQA datasets only require reasoning within a single paragraph or a single hop over paragraphs. The HotpotQA dataset (Yang et al., 2018) was constructed to facilitate the development of OpenQA systems that handle multi-paragraph reasoning.

Multi-paragraph reasoning is an important and practical problem towards more intelligent OpenQA. Nevertheless, existing OpenQA systems have not paid enough attention to it. They generally fall into two categories when dealing with multiple paragraphs: (1) regarding each paragraph as an individual, which cannot reason over paragraphs; (2) concatenating all paragraphs into a single long text, which is time- and memory-consuming.

To achieve a multi-paragraph reasoning system, we propose a knowledge-enhanced graph neural network (KGNN). First, we build an entity graph from all named entities in the paragraphs, adding co-reference edges to the graph if an entity appears in different paragraphs. After that, to explicitly capture the entities' relatedness, we further utilize the relational facts in a knowledge graph (KG) to build the relational entity graph for reasoning, i.e., we add a relation edge to the graph if two entities have a relation in the KG. We believe that the reasoning information can be captured through propagation over this relational entity graph.

Our main contribution is a novel reasoning module combined with knowledge. The experiments show that reasoning over entities helps our model surpass all baseline models significantly on the HotpotQA dataset, and our analysis demonstrates that KGNN is robust and has a strong ability to handle massive texts.

2 Model architecture

In this section, we introduce the framework of the knowledge-enhanced graph neural network (KGNN) for multi-paragraph reasoning. As shown in Figure 2, KGNN consists of three parts: an encoding module, a reasoning module, and a prediction module. An additional component is used for supporting fact prediction.

[Figure 2: Overview of KGNN. KGNN propagates information among paragraphs over a relational entity graph. The question Q and contexts P_1, ..., P_N are encoded into P_1^(0), ..., P_N^(0), refined by T steps of reasoning over the entity graph (with nodes such as Taylor Swift[1], Wildest Dreams[1], and Wildest Dreams[2]), and the final P_1^(T), ..., P_N^(T) feed the prediction module, which outputs Yes/No/Start/End.]

2.1 Encoder module

Without loss of generality, we use the encoding components described in Clark and Gardner (2018), which include a character-level encoder, a self-attention layer, and a bi-attention layer, to embed the question and paragraphs into low-dimensional representations. The question $Q$ and paragraphs $\{P_1, P_2, \cdots, P_N\}$ are first encoded as

$Q = \text{Self-Att}(\text{Char-Enc}(Q))$,  (1)
$P_i = \text{Self-Att}(\text{Char-Enc}(P_i))$,  (2)

and then we compute the question-related paragraph representations through a bi-attention operation:

$\bar{P}_i = \text{Bi-Att}(Q, P_i)$,  (3)

where Char-Enc(·) denotes the character-level encoder, Self-Att(·) denotes the self-attention layer, and Bi-Att(·) denotes the bi-attention layer.
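To make this encoding pipeline concrete, the following is a minimal PyTorch sketch of Eqs. (1-3). The CharEncoder, SelfAttention, and BiAttention classes are simplified stand-ins for the components of Clark and Gardner (2018), which actually use convolutional character embeddings and BiDAF-style attention; every internal detail here is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of the KGNN encoding module (Eqs. 1-3), assuming PyTorch.
# All module internals are simplified stand-ins, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharEncoder(nn.Module):
    """Toy character-level encoder: embed characters, max-pool per word."""
    def __init__(self, n_chars, dim):
        super().__init__()
        self.emb = nn.Embedding(n_chars, dim)

    def forward(self, char_ids):                     # (batch, n_words, n_chars)
        return self.emb(char_ids).max(dim=2).values  # (batch, n_words, dim)

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention with a residual link."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                            # (batch, seq, dim)
        scores = torch.bmm(self.proj(x), x.transpose(1, 2)) / x.size(-1) ** 0.5
        return x + torch.bmm(F.softmax(scores, dim=-1), x)

class BiAttention(nn.Module):
    """BiDAF-style bi-attention fusing question info into paragraph words."""
    def __init__(self, dim):
        super().__init__()
        self.out = nn.Linear(4 * dim, dim)

    def forward(self, q, p):                         # q: (b, m, d), p: (b, n, d)
        sim = torch.bmm(p, q.transpose(1, 2))        # (b, n, m) word similarities
        p2q = torch.bmm(F.softmax(sim, dim=-1), q)   # paragraph-to-question
        q2p = torch.bmm(F.softmax(sim.max(dim=-1).values, dim=-1)
                        .unsqueeze(1), p).expand_as(p)  # question-to-paragraph
        return self.out(torch.cat([p, p2q, p * p2q, p * q2p], dim=-1))

def encode(q_chars, paragraph_chars, char_enc, self_att, bi_att):
    Q = self_att(char_enc(q_chars))                  # Eq. (1)
    P = [self_att(char_enc(pc)) for pc in paragraph_chars]  # Eq. (2)
    return Q, [bi_att(Q, Pi) for Pi in P]            # Eq. (3): question-aware paragraphs

# Toy usage: one question of 6 words and two paragraphs of 10 words each,
# every word padded to 8 characters over a 100-symbol character vocabulary.
dim = 64
enc, att, bi = CharEncoder(100, dim), SelfAttention(dim), BiAttention(dim)
Q, p_bar = encode(torch.randint(100, (1, 6, 8)),
                  [torch.randint(100, (1, 10, 8)) for _ in range(2)],
                  enc, att, bi)
```

Note that each paragraph is encoded independently against the question, which is what allows KGNN to avoid concatenating all paragraphs into one long sequence.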
2.2 Reasoning module

Most existing OpenQA models (Chen et al., 2017; Clark and Gardner, 2018) directly regard all paragraphs as individuals, which raises obstacles for multi-paragraph reasoning when answering questions. Different from existing OpenQA models, KGNN propagates information among paragraphs with entities through a knowledge-enhanced graph neural network. As in the example in Figure 1, for the given entity Wildest Dreams, we require two kinds of one-hop reasoning to obtain the answer: one is that Wildest Dreams appears in multiple paragraphs, and the other is reasoning based on the relational fact (Wildest Dreams, lyrics by, Max Martin and Shellback).

2.2.1 Build entity graph

We regard all entities¹ from the paragraphs as nodes in the graph for multi-paragraph reasoning, denoted by $V$, and we build the entity graph according to two types of edges among paragraphs. Co-reference of an entity is the most common relation across paragraphs: for two nodes $v_i, v_j \in V$, an edge $\hat{e}_{ij} = (v_i, v_j)$ is added to the graph if the two nodes indicate the same entity. Furthermore, we adopt relational facts from the knowledge graph to enhance our model's reasoning ability: for two nodes $v_i, v_j \in V$, a relational edge $\bar{e}^r_{ij} = (v_i, v_j)$ is added to the graph if the two entities have a relation $r$, which helps the relational reasoning.

¹We link the named entities to Wikidata with spaCy (https://spacy.io/).
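As a concrete illustration, this graph construction can be sketched in a few lines of Python. The mention list and the kg_relation lookup are assumed inputs (for example, produced by spaCy NER plus Wikidata linking as in the footnote), and the entity ids and fact table in the usage example are placeholders, not real Wikidata identifiers.

```python
# Minimal sketch of entity-graph construction (Sec. 2.2.1). `kg_relation` is a
# hypothetical lookup returning the KG relation between two entities, or None.
from collections import namedtuple
from itertools import combinations

Node = namedtuple("Node", ["entity_id", "paragraph"])  # one node per (entity, paragraph)
COREF = "co-reference"                                  # e-hat edges: same entity

def build_entity_graph(mentions, kg_relation):
    """mentions: iterable of (entity_id, paragraph_index) pairs extracted
    from the paragraphs by NER + entity linking."""
    nodes = sorted({Node(e, p) for e, p in mentions})
    edges = []                                          # (i, j, relation) triples
    for (i, u), (j, v) in combinations(enumerate(nodes), 2):
        if u.entity_id == v.entity_id:                  # co-reference across paragraphs
            edges.append((i, j, COREF))
        else:
            r = kg_relation(u.entity_id, v.entity_id)   # relational fact in the KG
            if r is not None:                           # e-bar-r edges
                edges.append((i, j, r))
    return nodes, edges

# Toy usage mirroring Figure 1 (placeholder ids, not real Wikidata ones):
facts = {("WildestDreams", "MaxMartin"): "lyrics by"}
nodes, edges = build_entity_graph(
    [("TaylorSwift", 0), ("WildestDreams", 0), ("WildestDreams", 1), ("MaxMartin", 1)],
    lambda a, b: facts.get((a, b)) or facts.get((b, a)),
)
# -> a co-reference edge between the two WildestDreams nodes, and a "lyrics by"
#    edge between MaxMartin[1] and each WildestDreams node.
```

The two edge types give the model exactly the two one-hop reasoning patterns of the Figure 1 example: hopping between paragraphs via co-reference, and hopping between entities via a KG relation.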
2.2.2 Relational Reasoning

Having built the graph $G = (V, E)$, we leverage KGNN to perform reasoning. The reasoning process is as follows.

Graph Representation. For each node $v_i \in V$, we obtain its initial representation from the contextual word representations. Assuming the corresponding entity has $k$ mentions in the paragraph, the initial node representation $v_i$ is defined as

$v_i = \text{Max-Pool}(m_1, m_2, \ldots, m_k)$,  (4)

where Max-Pool(·) denotes a max-pooling layer and $m_i$ denotes the representation of mention $m_i$. Here, if a mention $m_i$ ranges from the $s$-th to the $t$-th word in paragraph $P_j$, its representation is defined as the mean of the word representations, $m_i = \frac{1}{t-s+1}\sum_{l=s}^{t}\text{FFN}(\bar{P}_{jl})$, where FFN(·) indicates a fully-connected feed-forward layer.

Message Propagation. As we want to reason over paragraphs with entities, we propagate messages from each node to its neighbors to help perform reasoning. Since different kinds of edges play different roles in reasoning, we use a relation-specific network in the message propagation. Formally, we define the following propagation function for calculating the update of a node:

$v_i^u = \sum_r \frac{\alpha_r}{|N_r(v_i)|} \sum_{v_j \in N_r(v_i)} \phi_r(v_j)$,  (5)

where $N_r(v_i)$ is the neighbor set of $v_i$ under relation $r$, and $\alpha_r$ and $\phi_r(\cdot)$ are the relation-specific attention weight and feed-forward network respectively, defined as follows. To measure the relevance between the question and a relation, we utilize the entire question representation $\bar{Q}$ to compute the relation-specific attention weight $\alpha_r = \text{softmax}(\text{FFN}(\bar{Q}))_r$ for relation $r$. And with a translating relation embedding $E_r$, we design our relation-specific network as $\phi_r(v_j) = \ldots$ (a code sketch of this propagation step is given at the end of this document).

The propagated entity information is then fused back into the word representations $\hat{U}_{ij}$ of each paragraph, and the output of the reasoning part is added to the input as a residual:

$\bar{P}'_{ij} = \bar{P}_{ij} + \text{Self-Att}(\hat{U}_{ij})$.  (9)

We denote the initial paragraph representations as $\bar{P}^{(0)}$, and denote the entire one-step reasoning process, i.e., Eqs. (4-9), as a single function:

$\bar{P}^{(t)} = \text{Reason}(\bar{P}^{(t-1)})$,  (10)

where $t \geq 1$. Hence, T-step reasoning can be divided into T applications of one-step reasoning.

2.3 Prediction module

After T-step reasoning on the relational entity graph, we predict the answer according to the final question-aware paragraph representations $\bar{P}^{(T)}$. Furthermore, to make answer probabilities comparable among multiple paragraphs, we utilize shared-normalization (Clark and Gardner, 2018) in the answer prediction.

3 Experiment

3.1 Dataset

We use HotpotQA to conduct our experiments. HotpotQA is an OpenQA dataset with complex reasoning, which contains more than 113k question-answer pairs, and all questions in the development and testing sets require complex multi-hop reasoning over paragraphs.
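As referenced in Section 2.2.2, the sketch below illustrates one KGNN reasoning step in PyTorch: node initialization by max-pooling over mention representations (Eq. 4) and question-conditioned, relation-specific message propagation (Eq. 5). It is a sketch under stated assumptions rather than the authors' implementation: edges are treated as undirected, phi_r is approximated by adding a per-relation translating embedding E_r before a per-relation linear layer (the exact form of phi_r is not given above), and the fusion back into paragraph words (Eqs. 6-9) is only indicated in the closing comment.

```python
# Minimal PyTorch sketch of one KGNN reasoning step (Sec. 2.2.2). Assumptions:
# undirected edges; phi_r(v_j) = Linear_r(v_j + E_r) stands in for the paper's
# relation-specific network. Illustration only, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReasoningStep(nn.Module):
    def __init__(self, dim, n_relations):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, dim)    # translating embeddings E_r
        self.phi = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_relations))
        self.rel_att = nn.Linear(dim, n_relations)       # scores relations against Q-bar

    @staticmethod
    def node_init(mention_reps):
        """Eq. (4): max-pool over the k mention representations of one entity."""
        return torch.stack(mention_reps).max(dim=0).values

    def forward(self, nodes, edges, q_bar):
        """Eq. (5). nodes: (|V|, dim); edges: (i, j, r) index triples; q_bar: (dim,)."""
        n_rel = len(self.phi)
        alpha = F.softmax(self.rel_att(q_bar), dim=-1)   # alpha_r, question-specific
        msgs = nodes.new_zeros(nodes.size(0), n_rel, nodes.size(1))
        counts = nodes.new_zeros(nodes.size(0), n_rel)   # |N_r(v_i)| per node/relation
        for i, j, r in edges:
            for src, dst in ((j, i), (i, j)):            # undirected (assumption)
                msgs[dst, r] += self.phi[r](nodes[src] + self.rel_emb.weight[r])
                counts[dst, r] += 1
        update = (alpha / counts.clamp(min=1)).unsqueeze(-1) * msgs
        return update.sum(dim=1)                         # v_i^u for every node

# Toy usage: 3 entity nodes, 2 relation types (0 = co-reference, 1 = "lyrics by").
step = ReasoningStep(dim=64, n_relations=2)
v_update = step(torch.randn(3, 64),
                edges=[(0, 1, 0), (1, 2, 1)],
                q_bar=torch.randn(64))
# The updated nodes would then be scattered back to their mentions, fused into
# the paragraph words with a residual self-attention (Eq. 9), and the whole
# step repeated T times (Eq. 10).
```

Conditioning alpha_r on the question representation lets the model softly select which edge types (co-reference, lyrics by, record label, and so on) matter for a given question, which is what the red edges in Figure 1 depict.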