Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

Deming Ye1, Yankai Lin2, Zhenghao Liu1, Zhiyuan Liu1, Maosong Sun1
1Department of Computer Science and Technology, Institute for Artificial Intelligence, and State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China
2Pattern Recognition Center, WeChat AI, Tencent Inc.
[email protected]

arXiv:1911.02170v1 [cs.CL] 6 Nov 2019

Abstract

Multi-paragraph reasoning is indispensable for open-domain question answering (OpenQA), yet it receives little attention in current OpenQA systems. In this work, we propose a knowledge-enhanced graph neural network (KGNN), which performs reasoning over multiple paragraphs with entities. To explicitly capture the entities' relatedness, KGNN utilizes relational facts in a knowledge graph to build the entity graph. Experimental results show that KGNN outperforms baseline methods in both the distractor and full wiki settings of the HotpotQA dataset, and our further analysis illustrates that KGNN remains effective and robust as more paragraphs are retrieved.

[Figure 1: An example of multi-hop reasoning. Paragraph [1]: "The 2015 MTV Video Music Awards were held on August 30, 2015. … Swift's 'Wildest Dreams' music video premiered during the pre-show. …" Paragraph [2]: "'Wildest Dreams' is a song recorded by American singer-songwriter Taylor Swift for her fifth studio album, '1989'. The song was released to radio by Big Machine Records on August 31, 2015, as the album's fifth single. Swift co-wrote the song with its producers Max Martin and Shellback. …" Extracted entities: Swift [1], Wildest Dreams [1], Wildest Dreams [2], Big Machine Records [2], Taylor Swift [2], Max Martin [2], Shellback [2]. Question: Who co-wrote Taylor Swift's song that had its music video premiere during the pre-show of the 2015 MTV Video Music Awards? Answer: Max Martin and Shellback. Caption: KGNN extracts entities from each paragraph and propagates messages along equal-to (double-headed arrow), lyrics-by (solid), and record-label (dotted) edges; the edges that support reasoning for this question are highlighted in red.]

1 Introduction

Open-domain question answering (OpenQA) aims to answer questions based on a large-scale knowledge source, such as an unlabelled corpus. In recent years, OpenQA has aroused the interest of many researchers, thanks to the availability of large-scale datasets such as Quasar (Dhingra et al., 2017), SearchQA (Dunn et al., 2017), and TriviaQA (Joshi et al., 2017). Many OpenQA models have been proposed (Chen et al., 2017; Clark and Gardner, 2018; Wang et al., 2018a,b; Choi et al., 2017; Lin et al., 2018) and have achieved promising results on various public benchmarks.

However, most questions in previous OpenQA datasets only require reasoning within a single paragraph or a single hop over paragraphs. The HotpotQA dataset (Yang et al., 2018) was constructed to facilitate the development of OpenQA systems that handle multi-paragraph reasoning, an important and practical step towards more intelligent OpenQA. Nevertheless, existing OpenQA systems have not paid enough attention to multi-paragraph reasoning. They generally fall into two categories when dealing with multiple paragraphs: (1) treating each paragraph as an individual unit, which prevents reasoning across paragraphs; or (2) concatenating all paragraphs into a single long text, which is costly in both time and memory.

To achieve a multi-paragraph reasoning system, we propose a knowledge-enhanced graph neural network (KGNN). First, we build an entity graph from all named entities in the paragraphs, adding co-reference edges between mentions of the same entity in different paragraphs. After that, to explicitly capture the entities' relatedness, we further utilize relational facts from a knowledge graph (KG) to build a relational entity graph for reasoning, i.e., we add a relation edge to the graph if two entities hold a relation in the KG. We believe the reasoning information can be captured by propagating messages among entities through a knowledge-enhanced graph neural network.

As the example in Figure 1 shows, for the given entity Wildest Dreams we require two kinds of one-hop reasoning to obtain the answer: one follows Wildest Dreams as it appears in multiple paragraphs, and the other reasons over the relational fact (Wildest Dreams, lyrics by, Max Martin and Shellback).

Our main contribution is a novel reasoning module combined with knowledge. Experiments show that reasoning over entities helps our model surpass all baseline models significantly on the HotpotQA dataset, and our analysis demonstrates that KGNN is robust and has a strong ability to handle massive texts.

[Figure 2: Overview of KGNN. The question Q and contexts P1, P2, ..., PN pass through the encoding module; T steps of reasoning over the relational entity graph (e.g., Taylor Swift [1], Wildest Dreams [1], Wildest Dreams [2]) turn P1(0), ..., PN(0) into P1(T), ..., PN(T); and the prediction module outputs Yes/No or an answer span (start/end). Caption: KGNN propagates information among paragraphs over a relational entity graph.]

2 Model architecture

In this section, we introduce the framework of the knowledge-enhanced graph neural network (KGNN) for multi-paragraph reasoning. As shown in Figure 2, KGNN consists of three parts: an encoding module, a reasoning module, and a prediction module. An additional component is used for supporting fact prediction.

2.1 Encoding module

Without loss of generality, we use the encoding components described in Clark and Gardner (2018), which include a character-level encoder, a self-attention layer, and a bi-attention layer, to embed the question and paragraphs into low-dimensional representations. The question Q and paragraphs {P1, P2, ..., PN} are first encoded as

    Q = Self-Att(Char-Enc(Q)),    (1)
    Pi = Self-Att(Char-Enc(Pi)),    (2)

and then we compute the question-related paragraph representations through a bi-attention operation:

    P̄i = Bi-Att(Q, Pi),    (3)

where Char-Enc(·) denotes the character-level encoder, Self-Att(·) the self-attention layer, and Bi-Att(·) the bi-attention layer.

2.2 Reasoning module

Most existing OpenQA models (Chen et al., 2017; Clark and Gardner, 2018) directly regard all paragraphs as individuals, which hampers multi-paragraph reasoning when answering questions. Different from these models, KGNN reasons with entities through a knowledge-enhanced graph neural network.

2.2.1 Building the entity graph

We regard all entities from the paragraphs as nodes in the graph for multi-paragraph reasoning, denoted by V.¹ We then build the entity graph according to two types of edges among paragraphs. Co-reference of an entity is the most common relation across paragraphs: for two nodes vi, vj ∈ V, an edge êij = (vi, vj) is added to the graph if the two nodes refer to the same entity. Furthermore, we adopt relational facts from the knowledge graph to enhance our model's reasoning ability: for two nodes vi, vj ∈ V, a relational edge ēʳij = (vi, vj) is added to the graph if the two entities hold a relation r, which supports relational reasoning.

¹We link the named entities to Wikidata with spaCy (https://spacy.io/).
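The two edge types of Section 2.2.1 can be assembled with simple nested loops once entity mentions have been linked. The following is a minimal sketch, assuming entities are given as already-linked surface forms and the KG is a list of (head, relation, tail) triples; the function and variable names are illustrative, not from the paper.

```python
from collections import defaultdict

def build_entity_graph(paragraph_entities, kg_triples):
    """Build the relational entity graph of Section 2.2.1.

    paragraph_entities: one list of linked entity names per paragraph.
    kg_triples: iterable of (head, relation, tail) facts from the KG.
    Returns nodes as (paragraph_idx, entity) pairs and typed edge lists.
    """
    nodes = [(p, e) for p, ents in enumerate(paragraph_entities) for e in ents]
    edges = defaultdict(list)  # relation name -> list of (node_i, node_j)

    # Co-reference edges: the same entity mentioned in different paragraphs.
    for i, (pi, ei) in enumerate(nodes):
        for j, (pj, ej) in enumerate(nodes):
            if i < j and ei == ej and pi != pj:
                edges["equal_to"].append((i, j))

    # Relational edges: two entities hold a relation r in the KG.
    index = {(p, e): n for n, (p, e) in enumerate(nodes)}
    for head, rel, tail in kg_triples:
        for (p1, e1), n1 in index.items():
            for (p2, e2), n2 in index.items():
                if e1 == head and e2 == tail:
                    edges[rel].append((n1, n2))
    return nodes, dict(edges)
```

On the Figure 1 example, the two mentions of Wildest Dreams receive an equal-to edge, and the fact (Wildest Dreams, lyrics by, Max Martin) adds lyrics-by edges from both mentions.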
2.2.2 Relational reasoning

Having built the graph G = (V, E), we leverage KGNN to perform reasoning as follows.

Graph representation. For each node vi ∈ V, we obtain its initial representation from the contextual word representations. Assuming the corresponding entity has k mentions in the paragraph, the initial node representation vi is defined as

    vi = Max-Pool(m1, m2, ..., mk),    (4)

where Max-Pool(·) denotes a max-pooling layer and mi denotes the representation of the i-th mention. If a mention mi spans the s-th to the t-th word of paragraph Pj, its representation is the mean of the transformed word representations, mi = (1 / (t − s + 1)) Σ_{l=s}^{t} FFN(P̄jl), where FFN(·) denotes a fully-connected feed-forward layer.
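Eq. (4) and the mention definition below it can be sketched in a few lines of NumPy. This is a simplified illustration: the FFN is instantiated here as a single ReLU layer with weight W and bias b, which is an assumption — the paper only specifies a fully-connected feed-forward layer.

```python
import numpy as np

def mention_repr(paragraph_repr, s, t, W, b):
    """Mean of FFN'ed word representations for a mention spanning
    words s..t (inclusive), as defined below Eq. (4)."""
    words = paragraph_repr[s:t + 1]           # shape (t - s + 1, d)
    ffn = np.maximum(words @ W + b, 0.0)      # FFN as one ReLU layer (assumed)
    return ffn.mean(axis=0)

def node_repr(paragraph_repr, mention_spans, W, b):
    """Eq. (4): max-pool over the k mention representations of one entity."""
    mentions = [mention_repr(paragraph_repr, s, t, W, b)
                for s, t in mention_spans]
    return np.max(np.stack(mentions), axis=0)
```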
Message propagation. Since we want to reason over paragraphs with entities, we propagate messages from each node to its neighbors to help perform reasoning. Because different kinds of edges play different roles in reasoning, we use a relation-specific network in the message propagation. Formally, we define the following propagation function for calculating the update of a node:

    v^u_i = Σ_r (α_r / |N_r(vi)|) Σ_{vj ∈ N_r(vi)} φ_r(vj),    (5)

where N_r(vi) is the set of neighbors of vi under relation r, and α_r and φ_r(·) are the relation-specific attention weight and feed-forward network, respectively, defined as follows. To measure the relevance between the question and each relation, we use an entire-question representation Q̄ to compute the relation-specific attention weight, α_r = softmax(FFN(Q̄)) for relation r. With a translating relation embedding E_r, we design the relation-specific network as φ_r(vj) = FFN(vj + E_r).
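One round of Eq. (5) can be sketched as follows. Several simplifications here are assumptions, not the paper's exact design: the FFN inside φ_r is reduced to a plain ReLU of vj + E_r, the question-conditioned attention scores use one illustrative weight vector Wq[r] per relation, and edges are treated as undirected.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def propagate(node_reprs, edges, rel_emb, question_repr, Wq):
    """One round of Eq. (5): each node aggregates relation-specific
    messages phi_r(v_j) = ReLU(v_j + E_r) from its neighbors, averaged
    per relation and weighted by a question-conditioned alpha_r."""
    relations = sorted(edges)
    # alpha_r: relevance of each relation to the question (assumed form).
    alpha = softmax(np.array([question_repr @ Wq[r] for r in relations]))
    updates = np.zeros_like(node_reprs)
    for a_r, r in zip(alpha, relations):
        # Collect neighbors per node under relation r (undirected, assumed).
        neighbors = {}
        for i, j in edges[r]:
            neighbors.setdefault(i, []).append(j)
            neighbors.setdefault(j, []).append(i)
        for i, nbrs in neighbors.items():
            msgs = [np.maximum(node_reprs[j] + rel_emb[r], 0.0) for j in nbrs]
            updates[i] += a_r * np.mean(msgs, axis=0)
    return updates
```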
Paragraph update. After message propagation over entities among paragraphs, we update the question-aware paragraph representations. We first define the graph-aware paragraph representation U_ij as

    U_ij = v^u_{Idx(ij)} if an entity appears at P_ij, and 0 otherwise,    (6)

where Idx(·) indicates the node index of P_ij in the constructed entity graph.

Further, we utilize a reset gate to decide how much information to keep from the graph-aware paragraph representation:

    r_ij = σ(W^p P̄_ij + W^u U_ij),    (7)
    Û_ij = r_ij ∗ U_ij + (1 − r_ij) ∗ P̄_ij,    (8)

where W^p and W^u are trainable matrices and σ denotes the sigmoid function.

Finally, we apply a self-attention mechanism to share global information from the entities with the whole paragraph, and the output of the reasoning step is added to its input as a residual:

    P̄'_ij = P̄_ij + Self-Att(Û_ij).    (9)

We denote the initial paragraph representations as P̄^(0) and the entire one-step reasoning process, i.e., Eqs. (4)-(9), as a single function:

    P̄^(t) = Reason(P̄^(t−1)),    (10)

where t ≥ 1. Hence, T-step reasoning amounts to applying the one-step reasoning function T times.
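The gated update of Eqs. (7)-(9) and the T-step loop of Eq. (10) can be sketched together. Two simplifications are assumptions for brevity: Self-Att in Eq. (9) is stubbed out as the identity, and the graph-aware representation U is held fixed across steps, whereas the full model recomputes it from the entity graph at every step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def paragraph_update(P_bar, U, Wp, Wu):
    """Eqs. (7)-(9): gate the graph-aware representation U into the
    question-aware paragraph representation P_bar, then add a residual."""
    r = sigmoid(P_bar @ Wp + U @ Wu)      # Eq. (7): reset gate
    U_hat = r * U + (1.0 - r) * P_bar     # Eq. (8): gated mixture
    return P_bar + U_hat                  # Eq. (9), with Self-Att = identity

def reason(P_bar, U, Wp, Wu, T=2):
    """Eq. (10): T-step reasoning = T applications of the one-step update."""
    for _ in range(T):
        P_bar = paragraph_update(P_bar, U, Wp, Wu)
    return P_bar
```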

2.3 Prediction module

After T-step reasoning over the relational entity graph, we predict the answer according to the final question-aware paragraph representations P̄^(T). Furthermore, to make answer probabilities comparable across multiple paragraphs, we utilize shared-normalization (Clark and Gardner, 2018) in answer prediction.

3 Experiment

3.1 Dataset

We conduct our experiments on HotpotQA, an OpenQA dataset requiring complex reasoning. It contains more than 11k question-answer pairs, and all questions in the development and test sets require complex multi-hop reasoning over paragraphs.

We evaluate KGNN in the distractor and full wiki settings of HotpotQA. In the distractor setting, the 2 golden paragraphs are given as inputs together with 8 distractor paragraphs. In the full wiki setting, only the questions are given and the answer must be extracted from the whole corpus, so we employ an information retrieval system to retrieve paragraphs from the entire Wikipedia for the following experiments.

For the relational entity graph construction, we align entities to Wikidata items and add 5 relations that are common in question answering for message propagation: {director, position held, record label, lyrics by, adapted from}.

3.2 Baseline Methods

To verify the effectiveness of our reasoning module, we compare KGNN with Yang et al. (2018)² and a modified version, named Yang et al. (2018)-split, which regards all paragraphs individually and applies a shared-normalization function over paragraphs to obtain the answer.

²We use the Adam optimizer to train the official code and gain a 10% joint F1 improvement.

                                Ans            Sup Fact       Joint
Model               Setting     EM     F1      EM     F1      EM     F1
Yang et al. (2018)  distractor  45.60  59.02   20.32  64.49   10.83  40.16
KGNN                distractor  50.81  65.75   38.74  76.79   22.40  52.82
Yang et al. (2018)  full wiki   23.95  32.89    3.86  37.71    1.85  16.15
KGNN                full wiki   27.65  37.19   12.65  47.19    7.03  24.66

Table 1: Results on HotpotQA test set for distractor and full wiki settings.

Model                     EM     F1
Yang et al. (2018)-split  11.87  41.87
Yang et al. (2018)        18.14  50.72
KGNN (#Layer=1)           22.26  53.50
KGNN (#Layer=2)           22.41  54.05
KGNN (#Layer=3)           22.24  53.49

Table 2: Effect of layer number on joint metrics.

Model               10     20     30
Yang et al. (2018)  20.77  20.88  21.52
KGNN                22.20  25.02  25.12

Table 3: Effect of paragraph number on joint F1.

3.3 Overall Results

Table 1 shows the experimental results on the HotpotQA dataset. Our model outperforms Yang et al. (2018) in both the distractor and full wiki settings, which indicates the effectiveness of the KGNN model for multi-paragraph reasoning with external knowledge.

3.4 Effect of Layer Number

We additionally perform experiments to understand the effect of the number of layers on KGNN. From Table 2, we observe that the 2-layer KGNN model achieves the best performance. Carefully analyzing the data, we find that most questions in the HotpotQA dataset require only 2-hop reasoning to obtain the answer, and a 2-layer KGNN model can capture enough reasoning flows in this scenario.

3.5 Effect of Paragraph Number

In the real world (i.e., the full wiki setting), an OpenQA system usually covers enough paragraphs to provide useful information for answering questions. Nevertheless, the noise in the paragraphs misleads models and remains a challenge for OpenQA. For each question, we use 30 retrieved paragraphs to investigate the denoising ability of the system. As shown in Table 3, the KGNN model handles and acquires more information from the extra paragraphs: its performance increases more significantly than that of Yang et al. (2018). The results demonstrate that reasoning across paragraphs through entities and their relations is a robust and flexible way of encoding multiple paragraphs.

4 Related Work

Open-domain question answering (OpenQA) aims to answer a given question by leveraging knowledge sources, and was first proposed by Green Jr et al. (1961). Recently, Chen et al. (2017) introduced machine reading comprehension techniques to answer open-domain questions by reading multiple retrieved texts. Wang et al. (2018a) select the most informative paragraph to answer the question, while Wang et al. (2018b), Lin et al. (2018), and Clark and Gardner (2018) focus on how to aggregate evidence from multiple paragraphs. However, all existing OpenQA models simply regard each paragraph as an individual or concatenate all paragraphs into a single long text.

Sun et al. (2018), De Cao et al. (2019), and Cao et al. (2019) introduce entity graph neural networks to reason over entities for knowledge base QA (KBQA). Different from KBQA, our work regards entities as bridges for reasoning over paragraphs and gives a span-style answer instead of an entity-style answer.

5 Conclusion
Multi-paragraph reasoning is crucial for answering open-domain questions in practice, yet it is not considered in most existing OpenQA systems. In this work, we propose a novel OpenQA model, KGNN, which performs reasoning over paragraphs via a knowledge-enhanced graph neural network. Experimental results show that KGNN outperforms strong baselines by a large margin on the HotpotQA dataset and can also handle larger amounts of retrieved text. We hope our work sheds some light on combining knowledge graphs and text for OpenQA.

References

Yu Cao, Meng Fang, and Dacheng Tao. 2019. BAG: Bi-directional attention entity graph convolutional network for multi-hop reasoning question answering. In Proceedings of NAACL-HLT.

Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In Proceedings of ACL.

Eunsol Choi, Daniel Hewlett, Jakob Uszkoreit, Illia Polosukhin, Alexandre Lacoste, and Jonathan Berant. 2017. Coarse-to-fine question answering for long documents. In Proceedings of ACL.

Christopher Clark and Matt Gardner. 2018. Simple and effective multi-paragraph reading comprehension. In Proceedings of ACL.

Nicola De Cao, Wilker Aziz, and Ivan Titov. 2019. Question answering by reasoning across documents with graph convolutional networks. In Proceedings of NAACL-HLT.

Bhuwan Dhingra, Kathryn Mazaitis, and William W Cohen. 2017. Quasar: Datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904.

Matthew Dunn, Levent Sagun, Mike Higgins, Ugur Guney, Volkan Cirik, and Kyunghyun Cho. 2017. SearchQA: A new Q&A dataset augmented with context from a search engine. arXiv preprint arXiv:1704.05179.

Bert F Green Jr, Alice K Wolf, Carol Chomsky, and Kenneth Laughery. 1961. Baseball: An automatic question-answerer. In Proceedings of IRE-AIEE-ACM.

Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of ACL.

Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. 2018. Denoising distantly supervised open-domain question answering. In Proceedings of ACL.

Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, and William W. Cohen. 2018. Open domain question answering using early fusion of knowledge bases and text. In Proceedings of ACL.

Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, and Jing Jiang. 2018a. R3: Reinforced ranker-reader for open-domain question answering. In Proceedings of AAAI.

Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. 2018b. Evidence aggregation for answer re-ranking in open-domain question answering. In Proceedings of ICLR.

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of EMNLP.