A Deep Reinforcement Learning Based Multi-Step Coarse to Fine Question Answering (MSCQA) System
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Yu Wang, Hongxia Jin
Samsung Research America
yu.wang1, [email protected]

Abstract

In this paper, we present a multi-step coarse to fine question answering (MSCQA) system which can efficiently process documents of different lengths by choosing appropriate actions. The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models targeting datasets that mainly contain either short or long documents, our multi-step coarse to fine model takes the merits of multiple system modules and can handle both short and long documents. The system hence obtains much better accuracy and faster training speed than the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to baselines built on state-of-the-art models.

Introduction

Machine comprehension based question answering (QA) tasks have drawn a lot of interest from the natural language understanding research community. During the past several years, much progress has been made on building deep learning based QA models that answer questions over large scale datasets of unstructured documents (Hermann et al. 2015; Hill et al. 2015; Onishi et al. 2016; Trischler et al. 2016; Nguyen et al. 2016). Currently, most of the state-of-the-art models for different QA datasets are based on recurrent neural networks that process the sequential inputs, together with a (co-)attention structure to capture the long-term interactions between question and document context (Xiong, Zhong, and Socher 2016; Chen, Bolton, and Manning 2016; Hermann et al. 2015; Kadlec et al. 2016). One disadvantage of these models is that their training and inference speeds are relatively slow due to their recurrent nature; another weakness is that they are still not good at dealing with very long documents, so most of these models use truncated documents as their inputs (Miller et al. 2016; Hewlett et al. 2016).

(Choi et al. 2017) introduces a coarse-to-fine question answering model that first selects the related sentences in a long document and then finds the answers from the selected sentences, which helps reduce the computational workload. The fast (coarse) model can be implemented using bag-of-words (BoW) features or a convolutional neural network, while the slow (fine) model is RNN based. The model gives decent performance on long documents in the WIKIREADING dataset and shows a significant speed-up compared to earlier models.
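To make the two-stage idea concrete, here is a minimal sketch of such a coarse-to-fine pipeline, assuming a simple bag-of-words overlap scorer for the coarse stage and an arbitrary slow_reader callable for the fine stage; the function names and the scoring scheme are illustrative assumptions, not the implementation of (Choi et al. 2017).

```python
from collections import Counter

def bow_score(question, sentence):
    # Coarse stage: cheap bag-of-words overlap between question and sentence.
    q = Counter(question.lower().split())
    s = Counter(sentence.lower().split())
    return sum(min(cnt, s[w]) for w, cnt in q.items())

def coarse_to_fine_answer(question, document, slow_reader, k=3):
    # Keep only the top-k sentences under the coarse scorer, then run the
    # expensive fine reader (e.g. an RNN-based model) on that sub-document.
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    ranked = sorted(sentences, key=lambda s: bow_score(question, s), reverse=True)
    sub_document = ". ".join(ranked[:k])
    return slow_reader(question, sub_document)
```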
This coarse-to-fine model, however, still has several disadvantages:

1. The model does not perform as well as baseline models on the full WIKIREADING dataset, which contains both short and long documents (Hewlett et al. 2016). One reason is that conventional RNN based models are better at generating answers from the first few sentences of a document, and hence obtain correct results on short documents more accurately than a coarse-to-fine model.

2. Many wrong answers actually come from the same sentence that contains the correct answer. A coarse-to-fine model that operates at the sentence level may still extract a wrong answer while ignoring the correct one in the same sentence.

3. Some of the contextual information in the document is still useful and necessary in order to find the correct answers. However, by selecting only the related sentence(s), the system may ignore important information that is needed to make the correct judgment.

On the other hand, compared to the coarse-to-fine model, which gives decent results on long documents, most question answering models focus on improving performance on relatively short documents. QANet (Yu et al. 2018) uses convolutions and self-attention to boost the training speed and gives one of the state-of-the-art results on the SQuAD1.1 dataset (Rajpurkar et al. 2016), which contains mainly short documents (122 tokens per document on average).

Due to these reasons, in this paper we introduce a novel multi-step coarse to fine question answering (MSCQA) system which can handle both long and short documents. The new structure takes advantage of both the coarse-to-fine and QANet structures, depending on the lengths of the documents. Furthermore, a false-positive detection mechanism is designed to further improve the model performance.

In order to guide the model to learn the best action to perform at each step, three types of states are defined:

1. Generate answers directly from the answer generation module (Terminal State).

2. Select the related sentences and use them for the next round (Non-terminal State).

3. Remove the inaccurate answer generated by the answer generation module, and use the rest of the document for the next round (Non-terminal State).

The first two states correspond to the QANet answer generation module and a coarse sentence selector as in the coarse-to-fine model. The last state is designed to handle the case where the predicted answer given by the answer generator is a false positive, which should be removed from the current document context to avoid confusion.

At each round of the decision making process, the system performs an action and reaches one of the three states. If it is a non-terminal state, the system continues to search for the correct answer in the current context; otherwise it generates the answer from the answer generation module, which is a terminal state (a sketch of this loop is given below). A detailed system description is given in the next section.
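The following is a minimal sketch of that decision loop, assuming the action selector and the three modules are available as callables; the names (action_selector, sentence_selector, answer_generator), the string-based answer removal, and the step budget are placeholders for illustration, not the paper's actual interfaces.

```python
ANSWER, SELECT_SENTENCES, REMOVE_ANSWER = 0, 1, 2  # the three possible actions

def multi_step_qa(document, question, action_selector, sentence_selector,
                  answer_generator, max_steps=5):
    # Repeat non-terminal (coarse) steps until the terminal answer action is
    # chosen or the step budget runs out.
    context = document
    for _ in range(max_steps):
        action = action_selector(context, question)
        if action == ANSWER:                       # terminal state: emit the answer
            return answer_generator(context, question)
        if action == SELECT_SENTENCES:             # non-terminal: keep related sentences
            context = sentence_selector(context, question)
        else:                                      # non-terminal: drop a suspected false positive
            false_positive = answer_generator(context, question)
            context = context.replace(false_positive, "")
    return answer_generator(context, question)     # fall back to answering at the budget
```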
In this paper, we propose a DRL based multi-step coarse-to-fine QA (MSCQA) system, which makes three main contributions:

1. It is the first multi-step QA system with the capability to decide whether to generate a fine result or a coarse one based on the current context. By using this technique, the model can handle both short and long documents in a faster and more robust manner. To the best of our knowledge, we are the first to do so.

2. Besides generating answers from coarse or fine context, the system also learns to reject false-positive answers given by the answer generator. This self-correction ability makes the system "smarter".

3. The system achieves significantly better QA accuracy than our baseline models on four QA datasets, and a 1.5x-3.4x training speed-up due to the improved efficiency of reallocating training data across the three actions.

The System and Model Structure

In this section, the structure of our multi-step coarse-to-fine QA system is given. The system contains four parts: the first is a DRL based action selector, the second is a sentence selection module (M1), the third is an answer generation module (M2), and the last is the sub-context generation module (M3). Both the sub-context and sentence selection modules correspond to non-terminal states, and the answer generation module corresponds to the terminal state. Our system diagram is given in Figure 1.

[Figure 1: System Diagram. The action selector takes the current document D_i and question q and routes to sentence selection (non-terminal, D_{i+1} = D_s), answer generation (terminal, producing answer A_i), or the sub-context excluding A_i (non-terminal, D_{i+1} = D_c).]

Action Selector

The action selector is designed to generate one action at each time step, picking one of the three modules. In order to model the process of selecting an action, we choose a deep reinforcement learning (DRL) based algorithm to learn it automatically. The target of a DRL based model is to seek the best policy π*, and hence the corresponding action a*, for an agent to perform. There are mainly three types of reinforcement learning algorithms: value-based, policy gradient, and actor-critic. We choose an actor-critic based DRL model in order to obtain a relatively stable training result for a state space as large as ours.

The definitions of the states s_t and the actions a_t in our system are given in this section. We also discuss how to train the action selection module using our selected deep reinforcement learning model, i.e. an actor-critic based DRL model.

States (s_t). The design of the state is given in Figure 2; the state is the input of the action selection module. It mainly contains two parts: one is the encoded document information, and the other is the question information.

[Figure 2: Design of State. The state concatenates the document embedding D̂_i and the question embedding q.]

The encoded document D̂_i at step i has three different varieties:

Type 1: It contains the full document information at step i, i.e. D̂_i = D_i. At step 0, this document is the same as the original given document D, i.e. D_0 = D.

Type 2: It contains the selected sentences' information, i.e. D̂_i = D_s = ∪_{k=1}^{K} s_k, where s_k is the k-th sentence selected from the current document D_i, and K is the total number of selected sentences.

Type 3: It contains the current document D_i excluding the answer A_i predicted from D_i, i.e. D̂_i = D_c = D_i \ A_i.

Again, the third type of state is defined based on the fact that most of the correct answers are within the top K possible answers, if the one with the highest probability is not
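As a rough illustration of these three varieties, the sketch below builds the text for each D̂_i case and concatenates document and question encodings into the state vector; the encoder is a stand-in and all function names are assumptions, not the paper's architecture.

```python
import numpy as np

def document_variety(sentences, selected_ids=None, predicted_answer=None):
    # Type 2: union of the K selected sentences (D_s).
    if selected_ids is not None:
        return " ".join(sentences[j] for j in selected_ids)
    text = " ".join(sentences)
    # Type 3: current document with the predicted answer removed (D_c = D_i \ A_i).
    if predicted_answer is not None:
        return text.replace(predicted_answer, "")
    # Type 1: the full current document (D_i).
    return text

def build_state(encoder, doc_text, question):
    # State s_t = [encoded document D_hat_i ; encoded question q],
    # assuming encoder(text) returns a fixed-size NumPy vector.
    return np.concatenate([encoder(doc_text), encoder(question)])
```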