Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension
Zhen Wang, Jiachen Liu, Xinyan Xiao, Yajuan Lyu, Tian Wu
Baidu Inc., Beijing, China
{wangzhen24, liujiachen, xiaoxinyan, lvyajuan, [email protected]

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pages 1715–1724, Melbourne, Australia, July 15–20, 2018. © 2018 Association for Computational Linguistics

Abstract

While sophisticated neural-based techniques have been developed in reading comprehension, most approaches model the answer in an independent manner, ignoring its relations with other answer candidates. This problem can be even worse in open-domain scenarios, where candidates from multiple passages should be combined to answer a single question. In this paper, we formulate reading comprehension as an extract-then-select two-stage procedure. We first extract answer candidates from passages, then select the final answer by combining information from all the candidates. Furthermore, we regard candidate extraction as a latent variable and train the two-stage process jointly with reinforcement learning. As a result, our approach has significantly improved the state-of-the-art performance on two challenging open-domain reading comprehension datasets. Further analysis demonstrates the effectiveness of our model components, especially the information fusion of all the candidates and the joint training of the extract-then-select procedure.

Q:  Cocktails: Rum, lime, and cola drink make a ___.
A:  Cuba Libre
P1: Daiquiri, the custom of mixing lime with rum for a cooling drink on a hot Cuban day, has been around a long time.
P2: Cocktail recipe for a Daiquiri, a classic rum and lime drink that every bartender should know.
P3: Hemingway Special Daiquiri: Daiquiris are a family of cocktails whose main ingredients are rum and lime juice.
P4: A homemade Cuba Libre. Preparation: To make a Cuba Libre properly, fill a highball glass with ice and half fill with cola.
P5: The difference between the Cuba Libre and Rum is a lime wedge at the end.

Table 1: The answer candidates are in a bold font. The key information is marked in italic, which should be combined from different text pieces to select the correct answer "Cuba Libre".

1 Introduction

Teaching machines to read and comprehend human languages is a long-standing objective in natural language processing. In order to evaluate this ability, reading comprehension (RC) is designed to answer questions through reading relevant passages. In recent years, RC has attracted intense interest. Various advanced neural models have been proposed along with newly released datasets (Hermann et al., 2015; Rajpurkar et al., 2016; Dunn et al., 2017; Dhingra et al., 2017b; He et al., 2017).

Most existing approaches mainly focus on modeling the interactions between questions and passages (Dhingra et al., 2017a; Seo et al., 2017; Wang et al., 2017), paying less attention to information concerning answer candidates. However, when humans solve this problem, they often first read each piece of text, collect some answer candidates, then focus on these candidates and combine their information to select the final answer. This collect-then-select process can be even more important in open-domain scenarios, which require the combination of candidates from multiple passages to answer one single question. This phenomenon is illustrated by the example in Table 1.

With this motivation, we formulate an extract-then-select two-stage architecture to simulate the above procedure. The architecture contains two components: (1) an extraction model, which generates answer candidates, and (2) a selection model, which combines all these candidates and finds the final answer. However, the answer candidates to be focused on are often unobservable, as most RC datasets only provide golden answers. Therefore, we treat candidate extraction as a latent variable and train these two stages jointly with reinforcement learning (RL).

In conclusion, our work makes the following contributions:

1. We formulate open-domain reading comprehension as a two-stage procedure, which first extracts answer candidates and then selects the final answer. With joint training, we optimize these two correlated stages as a whole.

2. We propose a novel answer selection model, which combines the information from all the extracted candidates using an attention-based correlation matrix. As shown in experiments, the information fusion is greatly helpful for answer selection.

3. With the two-stage framework and the joint training strategy, our method significantly surpasses the state-of-the-art performance on two challenging public RC datasets, Quasar-T (Dhingra et al., 2017b) and SearchQA (Dunn et al., 2017).

2 Related Work

In recent years, reading comprehension has made remarkable progress in methodology and dataset construction. Most existing approaches mainly focus on modeling sophisticated interactions between questions and passages, then use pointer networks (Vinyals et al., 2015) to directly model the answers (Dhingra et al., 2017a; Wang and Jiang, 2017; Seo et al., 2017; Wang et al., 2017). These methods prove to be effective on existing close-domain datasets (Hermann et al., 2015; Hill et al., 2015; Rajpurkar et al., 2016).

More recently, open-domain RC has attracted increasing attention (Nguyen et al., 2016; Dunn et al., 2017; Dhingra et al., 2017b; He et al., 2017) and raised new challenges for question answering techniques. In these scenarios, a question is paired with multiple passages, which are often collected by exploiting unstructured documents or web data. The aforementioned approaches often rely on recurrent neural networks and sophisticated attentions, which are prohibitively time-consuming if passages are concatenated altogether. Therefore, some work tried to alleviate this problem in a coarse-to-fine schema. Wang et al. (2018a) combined a ranker for selecting the relevant passage and a reader for producing the answer from it. However, this approach depended on only one passage when producing the answer, and hence put great demands on the precision of both components. Worse still, this framework cannot handle the situation where multiple passages are needed to answer correctly. In consideration of evidence aggregation, Wang et al. (2018b) proposed a re-ranking method to resolve the above issue. However, their re-ranking stage was totally isolated from the candidate extraction procedure. Differing from the re-ranking perspective, we propose a novel selection model to combine the information from all the extracted candidates. Moreover, with reinforcement learning, our candidate extraction and answer selection models can be learned in a joint manner. Trischler et al. (2016) also proposed a two-step extractor-reasoner model, which first extracted the K most probable single-token answer candidates and then compared the hypotheses with all the sentences in the passage. However, in their work, each candidate was considered in isolation, and their objective only took into account the ground truths, in contrast to our RL treatment.

The training strategy employed in our paper is reinforcement learning, inspired by recent work applying it to the question answering problem. The above-mentioned coarse-to-fine framework (Choi et al., 2017; Wang et al., 2018a) treated sentence selection as a latent variable and jointly trained the sentence selection module with the answer generation module via RL. Shen et al. (2017) modeled the multi-hop reasoning procedure with a termination state to decide when it is adequate to produce an answer; RL is suitable to capture this stochastic behavior. Hu et al. (2018) merely modeled the extraction process, using F1 as reward in addition to maximum likelihood estimation; RL was utilized in their training process, as the F1 measure is not differentiable.

3 Two-stage RC Framework

In this work, we mainly consider open-domain extractive reading comprehension. In this scenario, a given question Q is paired with multiple passages P = {P_1, P_2, ..., P_N}, based on which we aim to find the answer A. Moreover, the golden answers almost always appear as subspans of some passages in P. Our main framework consists of two parts: (1) extracting answer candidates C = {C_1, C_2, ..., C_M} from the passages P, and (2) selecting the final answer A from the candidates C. This process is illustrated in Figure 1.

Figure 1: Two-stage RC Framework. The first part extracts candidates (denoted with circles) from all the passages. The second part establishes interactions among all these candidates to select the final answer. The different gray scales of dashed lines between candidates represent different intensities of interactions.

Figure 2: Candidate Extraction Model Architecture.
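The two-part framework above can be sketched in code. This is only a schematic, assuming toy stand-ins for both stages: `score_spans` replaces the neural extraction model, and stage 2 simply pools the scores of identical spans across passages, a crude proxy for the paper's attention-based fusion; the names `Candidate`, `extract_candidates`, and `select_answer` are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str          # candidate answer span
    passage_idx: int   # index of the passage it was extracted from
    score: float       # extraction confidence, standing in for p(C_ij | Q, P_i)

def extract_candidates(passages, score_spans, k=2):
    """Stage 1: independently keep the top-K scored spans from each passage."""
    candidates = []
    for i, passage in enumerate(passages):
        scored = sorted(score_spans(passage), key=lambda s: s[1], reverse=True)
        candidates.extend(Candidate(text, i, score) for text, score in scored[:k])
    return candidates

def select_answer(candidates):
    """Stage 2: fuse evidence from all candidates before deciding.
    Identical spans pool their scores across passages (a toy stand-in
    for the attention-based information fusion)."""
    pooled = {}
    for c in candidates:
        pooled[c.text] = pooled.get(c.text, 0.0) + c.score
    return max(pooled, key=pooled.get)

# Toy run mirroring Table 1: "Daiquiri" is the strongest single-passage
# candidate, but evidence pooled across passages favors "Cuba Libre".
passages = ["P1", "P4", "P5"]
toy_scores = {
    "P1": [("Daiquiri", 0.9)],
    "P4": [("Cuba Libre", 0.8)],
    "P5": [("Cuba Libre", 0.7)],
}
answer = select_answer(extract_candidates(passages, toy_scores.get, k=2))
print(answer)  # Cuba Libre
```

The point of the sketch is the interface: an answer that is never the top candidate of any single passage can still win once candidate information is combined, which is exactly the situation the selection model is designed for.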
We design different models for each part and optimize them as a whole with joint reinforcement learning.

3.1 Candidate Extraction

We build the candidate set C by independently extracting K candidates from each passage P_i according to the following distribution:

p(C | Q, P) = \prod_{i=1}^{N} p(\{C_{ij}\}_{j=1}^{K} | Q, P_i)    (1)

The model architecture concerning candidate extraction is displayed in Figure 2.

Question & Passage Representation. Firstly, we embed the question Q = \{x_Q^k\}_{k=1}^{l_Q} and its relevant passage P = \{x_P^t\}_{t=1}^{l_P} with word vectors to form Q \in R^{d_w \times l_Q} and P \in R^{d_w \times l_P} respectively, where d_w is the dimension of the word embeddings, and l_Q and l_P are the lengths of Q and P. We then feed Q and P to a bidirectional LSTM to form their contextual representations H_Q \in R^{d_h \times l_Q} and H_P \in R^{d_h \times l_P}:

H_Q = BiLSTM(Q)
H_P = BiLSTM(P)    (3)

Question & Passage Interaction. Modeling the interactions between questions and passages is a critical step in reading comprehension.
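Given the contextual representations, the extractor in Figure 2 scores candidate start and end positions to produce the per-passage top-K candidates of Equation (1). As a minimal sketch, assume each span (b, e) is scored by the product of independent start and end probabilities — this factorization and the function name `topk_spans` are illustrative assumptions, not the paper's exact scoring:

```python
def topk_spans(start_probs, end_probs, k=3, max_len=4):
    """Enumerate spans (b, e) with e - b < max_len and score each as
    start_probs[b] * end_probs[e], assuming independent start/end
    distributions over passage tokens. Returns the K best spans."""
    spans = []
    for b in range(len(start_probs)):
        for e in range(b, min(b + max_len, len(end_probs))):
            spans.append(((b, e), start_probs[b] * end_probs[e]))
    spans.sort(key=lambda item: -item[1])
    return spans[:k]

# Token-level start/end probabilities for a 4-token passage.
start = [0.1, 0.6, 0.2, 0.1]
end   = [0.1, 0.1, 0.7, 0.1]
print(topk_spans(start, end, k=2))
```

Running this per passage and keeping K spans from each yields the candidate set C that the selection stage then reasons over jointly.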