Multi-Hop Question Generation with Graph Convolutional Network

Multi-hop Question Generation with Graph Convolutional Network Dan Su, Yan Xu, Wenliang Dai, Ziwei Ji, Tiezheng Yu, Pascale Fung Center for Artificial Intelligence Research (CAiRE) Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong fdsu, yxucb, wdaiai, zjiad, [email protected], [email protected] Abstract Paragraph A: Marine Tactical Air Command Squadron 28 (Location T) is a United States Marine Multi-hop Question Generation (QG) aims to Corps aviation command and control unit based generate answer-related questions by aggre- at Marine Corps Air Station Cherry Point (Location gating and reasoning over multiple scattered evidence from different paragraphs. It is a C) ... more challenging yet under-explored task com- Paragraph B: Marine Corps Air Station Cherry pared to conventional single-hop QG, where Point (Location C) ... is a United States Marine the questions are generated from the sentence Corps airfield located in Havelock, North Car- containing the answer or nearby sentences in olina (Location H), USA ... the same paragraph without complex reason- Answer: Havelock, North Carolina (Location H) ing. To address the additional challenges Question: What city is the Marine Air Control in multi-hop QG, we propose Multi-Hop En- coding Fusion Network for Question Genera- Group 28 (Location T) located in? tion (MulQG), which does context encoding in multiple hops with Graph Convolutional Table 1: An example of multi-hop QG in the Hot- Network and encoding fusion via an Encoder potQA (Yang et al., 2018) dataset. Given the answer Reasoning Gate. To the best of our knowl- is Location H, to ask where is T located, the model edge, we are the first to tackle the challenge needs a bridging evidence to know that T is located in of multi-hop reasoning over paragraphs with- C, and C is located in H (T ! C ! H). This is done out any sentence-level information. Empiri- by multi-hop reasoning. cal results on HotpotQA dataset demonstrate the effectiveness of our method, in comparison with baselines on automatic evaluation metrics. single-hop reasoning (Zhou et al., 2017; Zhao et al., Moreover, from the human evaluation, our pro- 2018). Little effort has been put in multi-hop QG, posed model is able to generate fluent ques- which is a more challenging task. Multi-hop QG re- tions with high completeness and outperforms quires aggregating several scattered evidence spans the strongest baseline by 20.8% in the multi- from multiple paragraphs, and reasoning over them hop evaluation. The code is publicly available to generate answer-related, factual-coherent ques- at https://github.com/HLTCHKUST/MulQG. tions. It can serve as an essential component in 1 Introduction education systems (Heilman and Smith, 2010; Lind- berg et al., 2013; Yao et al., 2018), or be applied Question Generation (QG) is a task to automati- in intelligent virtual assistant systems (Shum et al., cally generate a question from a given context and, 2018; Pan et al., 2019). It can also combine with optionally, an answer. Recently, we have observed question answering (QA) models as dual tasks to an increasing interest in text-based QG (Du et al., boost QA systems with reasoning ability (Tang 2017; Zhao et al., 2018; Scialom et al., 2019; Nema et al., 2017). et al., 2019; Zhang and Bansal, 2019). Intuitively, there are two main additional chal- Most of the existing works on text-based QG lenges needed to be addressed for multi-hop QG. focus on generating SQuAD-style (Rajpurkar et al., The first challenge is how to effectively iden- 2016; Puri et al., 2020) questions, which are gen- tify scattered pieces of evidence that can con- erated from the sentence containing the answer nect the reasoning path of the answer and ques- or nearby sentences in the same paragraph, via tion (Chauhan et al., 2020). As the example shown 4636 Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4636–4647 November 16 - 20, 2020. c 2020 Association for Computational Linguistics in Table1, to generate a question asking about “Ma- • To the best of our knowledge, we are the first rine Air Control Group 28” given only the answer to tackle the challenge of multi-hop reasoning “Havelock, North Carolina”, we need the bridging over paragraphs without any sentence-level evidence like “Marine Corps Air Station Cherry information in QG tasks. Point”. The second challenge is how to reason over • We propose a new and effective framework for multiple pieces of scattered evidence to generate Multi-hop QG, to do context encoding in mul- factual-coherent questions. tiple hops(steps) with Graph Convolutional Previous works mainly focus on single-hop QG, Network (GCN). which use neural network based approaches with • We show the effectiveness of our method on the sequence-to-sequence (Seq2Seq) framework. both automatic evaluation and human evalu- Different architectures of encoder and decoder have ation, and we make the first step to evaluate been designed (Nema et al., 2019; Zhao et al., 2018) the model performance in multi-hop aspect. to incorporate the information of answer and context to do single-hop reasoning. To the best of 2 Methodology our knowledge, none of the previous works address the two challenges we mentioned above for The intuition is drawn from human’s multi-hop multi-hop QG task. The only work on multi-hop question generation process (Davey and McBride, QG (Chauhan et al., 2020) uses multi-task learning 1986). Firstly, given the answer and context, we with an auxiliary loss for sentence-level supporting skim to establish a general understanding of the fact prediction, requiring supporting fact sentences texts. Then, we find the mentions of entities in in different paragraphs being labeled in the training or correlated to the answer from the context, and data. While labeling those supporting facts requires analyse nearby sentences to extract useful evidence. heavy human labor and is time-consuming, their Besides, we may also search for linked information method cannot be applied to general multi-hop QG in other paragraphs to gain a further understanding cases without supporting facts. of the entities. Finally, we coherently fuse our knowledge learned from the previous steps and In this paper, we propose a novel architecture start to generate questions. named Multi-Hop Encoding Fusion Network for Question Generation (MulQG) to address the afore- To mimic this process, we develop our MulQG mentioned challenges for multi-hop QG. First of framework. The encoding stage is achieved by a all, it extends the Seq2Seq QG framework from novel Multi-hop Encoder. At the decoding stage, sing-hop to multi-hop for context encoding. Ad- we use maxout pointer decoder as proposed in Zhao ditionally, it leverages a Graph Convolutional Net- et al.(2018). The overview of the framework is work (GCN) on an answer-aware dynamic entity shown in Figure1. graph, which is constructed from entity mentions in answer and input paragraphs, to aggregate the po- 2.1 Multi-hop Encoder tential evidence related to the questions. Moreover, Our Multi-hop Encoder includes three modules: we use different attention mechanisms to imitate (1) Answer-aware context encoder (2) GCN-based the reasoning procedures of human beings in multi- entity-aware answer encoder (3) Gated encoder hop generation process, the details are explained in reasoning layer. Section2. The context and answer are split into word-level We conduct the experiments on the multi-hop tokens and denoted as c = fc1; c2; :::; cng and QA dataset HotpotQA (Yang et al., 2018) with a = fa1; a2; :::; amg, respectively. Each word our model and the baselines. The proposed model is represented by the pre-trained GloVe embed- outperforms the baselines with a significant im- ding (Pennington et al., 2014). Furthermore, for the provement on automatic evaluation results, such as words in context, we also append the answer tag- BLEU (Papineni et al., 2002). The human evalua- ging embeddings as described in Zhao et al.(2018). tion results further validate that our proposed model The context and answer embeddings are fed into is more likely to generate multi-hop questions with two bidirectional LSTM-RNNs separately to obtain Fluency Answerability d×n high quality in terms of , and their initial contextual representations C0 2 R Completeness d×m scores. and A0 2 R , in which d is the hidden state Our contributions are summarized as follows: dimension in LSTM. 4637 Multi-hop Encoder Decoder Encoder Reasoning Gate Answer-aware Context Encoder Attention Layer Entity-aware Answer Encoder residual Bi-Attn Maxout Pointer Generator Entity Graph Answer-aware Context Encoder LSTM LSTM LSTM Encoder Encoder Decoder Word Embeddings + Answer Tag Answer Context Figure 1: Overview of our MulQG framework. In the encoding stage, we pass the initial context encoding C0 and answer encoding A0 to the Answer-aware Context Encoder to obtain the first context encoding C1, then C1 and A0 will be used to update a multi-hop answer encoding A1 via the GCN-based Entity-aware Answer Encoder, and y we use A1 and C1 back to the Answer-aware Context Encoder to obtain C2. The final context encoding Cfinal are obtained from the Encoder Reasoning Gate which operates over C1 and C2, and will be used in the max-out based decoding stage. mask d×n C1 = BiLSTM([C~1; C0]) 2 R (6) GCN Firstly, we compute an alignment matrix S (Eq.1), and normalize it column-wise and row- wise to get two attention matrices S0 (Eq.2) and S00 ... (Eq.3). S0 represents the relevance of each answer token over the context, and S00 represents the relevance of each context token over the answer.

Load more