Multilingual Extractive Reading Comprehension by Runtime Machine Translation
Akari Asai†, Akiko Eriguchi†, Kazuma Hashimoto‡, and Yoshimasa Tsuruoka†
†The University of Tokyo   ‡Salesforce Research

Abstract

Despite recent work in Reading Comprehension (RC), progress has been mostly limited to English due to the lack of large-scale datasets in other languages. In this work, we introduce the first RC system for languages without RC training data. Given a target language without RC training data and a pivot language with RC training data (e.g. English), our method leverages existing RC resources in the pivot language by combining a competitive RC model in the pivot language with an attentive Neural Machine Translation (NMT) model. We first translate the data from the target to the pivot language, and then obtain an answer using the RC model in the pivot language. Finally, we recover the corresponding answer in the original language using soft-alignment attention scores from the NMT model. We create evaluation sets of RC data in two non-English languages, namely Japanese and French, to evaluate our method. Experimental results on these datasets show that our method significantly outperforms a back-translation baseline of a state-of-the-art product-level machine translation system.

1 Introduction

Extractive Reading Comprehension (RC), in which a model identifies the answer to a given question from a document context by “extracting” the correct answer, has a variety of downstream applications such as search, automated FAQs, and dialogue systems. Recent years have seen rapid progress in the development of RC models (Seo et al., 2017; Wang et al., 2017; Xiong et al., 2018; Yu et al., 2018; Hu et al., 2018) due to the availability of large-scale annotated corpora (Hermann et al., 2015; Rajpurkar et al., 2016; Joshi et al., 2017). However, these large-scale annotated datasets are often exclusive to English. Consequently, progress in RC has been largely limited to English.

To alleviate the scarcity of training data in non-English languages, previous work creates a new large-scale dataset for a language of interest (He et al., 2017) or combines a medium-scale dataset in the language with an existing dataset translated from English (Lee et al., 2018). These efforts in RC data creation are often costly and must be repeated for each new language of interest. In addition, they do not leverage existing resources in English RC, such as the wealth of large-scale datasets and state-of-the-art models.

In this paper, we propose a multilingual extractive RC method by runtime Machine Translation (MT), a new method for building RC systems for languages without RC training data. Our method combines an RC model with a Neural Machine Translation (NMT) model. Given a language L of interest with no RC data and a pivot language P with large-scale RC training data, we first translate a document context and question from the language L to the language P using an attentive NMT model. Next, we obtain an answer from the RC model in language P. Finally, we recover the answer in the context in language L using soft alignments from the NMT model.

To our knowledge, our work is the first method that requires no RC training data in the target language to build an RC model for the language of interest. In contrast to existing work on RC in non-English languages, our method leverages existing work in English RC. More importantly, our method requires no additional annotation effort to acquire RC data in the target language.
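To make the translate-extract-align pipeline concrete, the sketch below shows the runtime steps end to end. It is a minimal illustration under assumed interfaces: the nmt and rc objects, their translate and predict_span methods, and the returned attention matrix are hypothetical stand-ins for an attentive NMT model and a pivot-language RC model, not the authors' implementation.

```python
def answer_in_language_L(context_L, question_L, nmt, rc):
    """Answer a question posed in target language L with a pivot-language RC model.

    context_L, question_L: lists of tokens in the target language L.
    nmt: attentive NMT model for L -> P; assumed to return its attention
         weights alpha, where alpha[i][j] is the weight on source token j
         when generating output token i.
    rc:  extractive RC model trained on pivot-language data (e.g. SQuAD).
    """
    # Step 1: translate the context and the question into the pivot language P.
    context_P, alpha = nmt.translate(context_L, return_attention=True)
    question_P, _ = nmt.translate(question_L, return_attention=True)

    # Step 2: extract an answer span (s, e) over the translated context.
    s, e = rc.predict_span(context_P, question_P)

    # Step 3: recover the span (s_L, e_L) in the original context from the
    # soft alignments: map each answer token to its most-attended source token.
    aligned = [max(range(len(context_L)), key=lambda j: alpha[i][j])
               for i in range(s, e + 1)]
    s_L, e_L = min(aligned), max(aligned)
    return context_L[s_L : e_L + 1]
```

Step 3 here aggregates the attention weights with a simple per-token argmax; other ways of reading the soft alignments are equally plausible.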
To demonstrate the effectiveness of our method, we focus on SQuAD, one of the most widely-used large-scale English datasets for extractive RC, and create SQuAD test data in Japanese and French. On Japanese and French SQuAD, our method significantly outperforms a back-translation baseline that first translates from the target language L to the pivot language P, produces an answer in language P, and back-translates the answer into the language L. Our method achieves this result despite using much smaller translation data than that of a state-of-the-art MT system used in the back-translation baseline.

Figure 1: Overview of our method (1. translation into pivot language; 2. extractive RC in pivot language; 3. attention-based answer alignment). α_ij are the attention weights (attention distribution) in the NMT model. (s, e) and (s_L, e_L) are the answer spans in the pivot language (e.g. English) and the target language L, respectively.

Analysis of our experiments shows that the ability to correctly translate questions is crucial for the end task of RC. In particular, over-sampling a small set of high-quality question translations in training the NMT model results in significant accuracy gains in RC.
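The over-sampling itself is mechanically simple; one plausible way to build such a training mix is sketched below. The function name, the corpus arguments, and the over-sampling factor are illustrative assumptions, not details taken from the paper.

```python
import random

def build_nmt_training_data(general_pairs, question_pairs, oversample_factor=5):
    """Mix a general parallel corpus with over-sampled question translations.

    general_pairs:     large list of (source, target) sentence pairs.
    question_pairs:    small set of high-quality question translation pairs.
    oversample_factor: how many times to repeat the question pairs
                       (a hypothetical value, not taken from the paper).
    """
    data = list(general_pairs) + list(question_pairs) * oversample_factor
    random.shuffle(data)  # spread the question pairs across training batches
    return data
```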
Moreover, our error analysis shows that under-translation and paraphrasing in translation significantly degrade the downstream RC accuracy, although [...]

[...] language P with which to train the RC model.

2.1 Translation to Pivot Language

To translate the context and question from the target language L into the pivot language P, one possible approach is to use a web service or a software package for MT (e.g. Google Translate) as a black-box MT system (Hartrumpf et al., 2009; Esplà-Gomis et al., 2012; Á. García Cumbreras et al., 2006). However, this approach does not allow us to access the internal intermediate information that is potentially useful for bridging the MT and RC systems.

To overcome these limitations, we instead train an attention-based NMT model (Luong et al., 2015) as a white-box MT system. Our attention-based