Arxiv:1805.06145V1 [Cs.CL] 16 May 2018 Their Information to Select the final Answer
Total Page:16
File Type:pdf, Size:1020Kb
Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension Zhen Wang Jiachen Liu Xinyan Xiao Yajuan Lyu Tian Wu Baidu Inc., Beijing, China fwangzhen24, liujiachen, xiaoxinyan, lvyajuan, [email protected] Abstract Q Cocktails: Rum, lime, and cola drink make a . While sophisticated neural-based tech- A Cuba Libre P1 Daiquiri, the custom of mixing lime with rum for a niques have been developed in reading cooling drink on a hot Cuban day, has been around comprehension, most approaches model a long time. the answer in an independent manner, ig- P2 Cocktail recipe for a Daiquiri, a classic rum and noring its relations with other answer can- lime drink that every bartender should know. didates. This problem can be even worse P3 Hemingway Special Daiquiri: Daiquiris are a fam- ily of cocktails whose main ingredients are rum and in open-domain scenarios, where candi- lime juice. dates from multiple passages should be P4 A homemade Cuba Libre Preparation To make a combined to answer a single question. In Cuba Libre properly, fill a highball glass with ice and half fill with cola. this paper, we formulate reading com- P5 The difference between the Cuba Libre and Rum is prehension as an extract-then-select two- a lime wedge at the end. stage procedure. We first extract answer candidates from passages, then select the Table 1: The answer candidates are in a bold font. final answer by combining information The key information is marked in italic, which from all the candidates. Furthermore, we should be combined from different text pieces to regard candidate extraction as a latent vari- select the correct answer ”Cuba Libre”. able and train the two-stage process jointly with reinforcement learning. As a result, our approach has improved the state-of- Most existing approaches mainly focus on mod- the-art performance significantly on two eling the interactions between questions and pas- challenging open-domain reading compre- sages (Dhingra et al., 2017a; Seo et al., 2017; hension datasets. Further analysis demon- Wang et al., 2017), paying less attention to infor- strates the effectiveness of our model com- mation concerning answer candidates. However, ponents, especially the information fusion when human solve this problem, we often first of all the candidates and the joint training read each piece of text, collect some answer candi- of the extract-then-select procedure. dates, then focus on these candidates and combine arXiv:1805.06145v1 [cs.CL] 16 May 2018 their information to select the final answer. This 1 Introduction collect-then-select process can be more significant Teaching machines to read and comprehend hu- in open-domain scenarios, which require the com- man languages is a long-standing objective in nat- bination of candidates from multiple passages to ural language processing. In order to evaluate this answer one single question. This phenomenon is ability, reading comprehension (RC) is designed illustrated by the example in Table1. to answer questions through reading relevant pas- With this motivation, we formulate an extract- sages. In recent years, RC has attracted intense in- then-select two-stage architecture to simulate the terest. Various advanced neural models have been above procedure. The architecture contains two proposed along with newly released datasets (Her- components: (1) an extraction model, which gen- mann et al., 2015; Rajpurkar et al., 2016; Dunn erates answer candidates, (2) a selection model, et al., 2017; Dhingra et al., 2017b; He et al., 2017). which combines all these candidates and finds out the final answer. However, answer candidates to and a reader for producing the answer from it. be focused on are often unobservable, as most RC However, this approach only depended on one pas- datasets only provide golden answers. Therefore, sage when producing the answer, hence put great we treat candidate extraction as a latent variable demands on the precisions of both components. and train these two stages jointly with reinforce- Worse still, this framework cannot handle the sit- ment learning (RL). uation where multiple passages are needed to an- In conclusion, our work makes the following swer correctly. In consideration of evidence aggre- contributions: gation, Wang et al.(2018b) proposed a re-ranking 1. We formulate open-domain reading compre- method to resolve the above issue. However, their hension as a two-stage procedure, which first ex- re-ranking stage was totally isolated from the can- tracts answer candidates and then selects the final didate extraction procedure. Being different from answer. With joint training, we optimize these two the re-ranking perspective, we propose a novel se- correlated stages as a whole. lection model to combine the information from 2. We propose a novel answer selection model, all the extracted candidates. Moreover, with rein- which combines the information from all the ex- forcement learning, our candidate extraction and tracted candidates using an attention-based corre- answer selection models can be learned in a joint lation matrix. As shown in experiments, the infor- manner. Trischler et al.(2016) also proposed a mation fusion is greatly helpful for answer selec- two-step extractor-reasoner model, which first ex- tion. tracted K most probable single-token answer can- 3. With the two-stage framework and the joint didates and then compared the hypotheses with training strategy, our method significantly sur- all the sentences in the passage. However, in passes the state-of-the-art performance on two their work, each candidate was considered isolat- challenging public RC datasets Quasar-T (Dhingra edly, and their objective only took into account the et al., 2017b) and SearchQA (Dunn et al., 2017). ground truths compared with our RL treatment. The training strategy employed in our paper 2 Related Work is reinforcement learning, which is inspired by recent work exploiting it into question answer- In recent years, reading comprehension has made ing problem. The above mentioned coarse-to-fine remarkable progress in methodology and dataset framework (Choi et al., 2017; Wang et al., 2018a) construction. Most existing approaches mainly treated sentence selection as a latent variable and focus on modeling sophisticated interactions be- jointly trained the sentence selection module with tween questions and passages, then use the pointer the answer generation module via RL. Shen et al. networks (Vinyals et al., 2015) to directly model (2017) modeled the multi-hop reasoning proce- the answers (Dhingra et al., 2017a; Wang and dure with a termination state to decide when it is Jiang, 2017; Seo et al., 2017; Wang et al., 2017). adequate to produce an answer. RL is suitable to These methods prove to be effective in existing capture this stochastic behavior. Hu et al.(2018) close-domain datasets (Hermann et al., 2015; Hill merely modeled the extraction process, using F1 et al., 2015; Rajpurkar et al., 2016). as rewards in addition to maximum likelihood es- More recently, open-domain RC has attracted timation. RL was utilized in their training process, increasing attention (Nguyen et al., 2016; Dunn as the F1 measure is not differentiable. et al., 2017; Dhingra et al., 2017b; He et al., 2017) and raised new challenges for question answer- 3 Two-stage RC Framework ing techniques. In these scenarios, a question is paired with multiple passages, which are often In this work, we mainly consider the open-domain collected by exploiting unstructured documents or extractive reading comprehension. In this sce- web data. Aforementioned approaches often rely nario, a given question Q is paired with mul- on recurrent neural networks and sophisticated at- tiple passages P = fP1;P2; :::; PN g, based on tentions, which are prohibitively time-consuming which we aim to find out the answer A. Moreover, if passages are concatenated altogether. There- the golden answers are almost subspans shown in fore, some work tried to alleviate this problem in some passages in P. Our main framework con- a coarse-to-fine schema. Wang et al.(2018a) com- sists of two parts, which are: (1) extracting answer bined a ranker for selecting the relevant passage candidates C = fC1;C2; :::; CM g from passages b e P1 P P Candidate Start End Scoring GP P2 . C Q A . Question & Passage ~ . Interaction HP … Attention PN H Candidate Extraction Answer Selection Question & Passage Q HP Representation … Q P … 1 1 l Figure 1: Two-stage RC Framework. The first part x … lQ x 2 x P Q xQ P xP … P extracts candidates (denoted with circles) from all Question Passage the passages. The second part establishes interac- tions among all these candidates to select the final Figure 2: Candidate Extraction Model Architec- answer. The different gray scales of dashed lines ture. between candidates represent different intensities of interactions. Question & Passage Representation Firstly, k lQ we embed the question Q = fxQgk=1 and its rele- t lP P and (2) selecting the final answer A from candi- vant passage P = fxP gt=1 2 P with word vectors d ×l d ×l dates C. This process is illustrated in Figure1. We to form Q 2 R w Q and P 2 R w P respec- design different models for each part and optimize tively, where dw is the dimension of word embed- them as a whole with joint reinforcement learning. dings, lQ and lP are the length of Q and P . We then feed Q and P to a bidirectional LSTM 3.1 Candidate Extraction to form their contextual representations HQ 2 dh×lQ and H 2 dh×lP : We build candidate set C by independently ex- R P R tracting K candidates from each passage Pi ac- HQ = BiLSTM(Q) cording to the following distribution: (3) HP = BiLSTM(P) N Y Question & Passage Interaction Modeling the p( jQ; ) = p(fC gK jQ; P ) C P ij j=1 i interactions between questions and passages is a i (1) critical step in reading comprehension.