Explore, Propose, and Assemble: an Interpretable Model for Multi-Hop Reading Comprehension
Yichen Jiang∗  Nitish Joshi∗  Yen-Chun Chen  Mohit Bansal
UNC Chapel Hill
{yichenj, nitish, yenchun, [email protected]

arXiv:1906.05210v1 [cs.CL] 12 Jun 2019

Abstract

Multi-hop reading comprehension requires the model to explore and connect relevant information from multiple sentences/documents in order to answer the question about the context. To achieve this, we propose an interpretable 3-module system called Explore-Propose-Assemble reader (EPAr). First, the Document Explorer iteratively selects relevant documents and represents divergent reasoning chains in a tree structure so as to allow assimilating information from all chains. The Answer Proposer then proposes an answer from every root-to-leaf path in the reasoning tree. Finally, the Evidence Assembler extracts a key sentence containing the proposed answer from every path and combines them to predict the final answer. Intuitively, EPAr approximates the coarse-to-fine-grained comprehension behavior of human readers when facing multiple long documents. We jointly optimize our 3 modules by minimizing the sum of losses from each stage conditioned on the previous stage's output. On two multi-hop reading comprehension datasets, WikiHop and MedHop, our EPAr model achieves significant improvements over the baseline and competitive results compared to the state-of-the-art model. We also present multiple reasoning-chain-recovery tests and ablation studies to demonstrate our system's ability to perform interpretable and accurate reasoning.1

1 Introduction

The task of machine reading comprehension and question answering (MRC-QA) requires the model to answer a natural language question by finding relevant information and knowledge in a given natural language context. Most MRC datasets require single-hop reasoning only, which means that the evidence necessary to answer the question is concentrated in a single sentence or located closely in a single paragraph. Such datasets emphasize the role of locating, matching, and aligning information between the question and the context. However, some recent multi-document, multi-hop reading comprehension datasets, such as WikiHop and MedHop (Welbl et al., 2017), have been proposed to further assess MRC systems' ability to perform multi-hop reasoning, where the required evidence is scattered in a set of supporting documents.

These multi-hop tasks are much more challenging than previous single-hop MRC tasks (Rajpurkar et al., 2016, 2018; Hermann et al., 2015; Nguyen et al., 2016; Yang et al., 2015) for three primary reasons. First, the given context contains a large number of documents (e.g., 14 on average and 64 at maximum for WikiHop). Most existing QA models cannot scale to a context of such length, and it is challenging to retrieve a reasoning chain of documents with the complete information required to connect the question to the answer in a logical way. Second, given a reasoning chain of documents, it is still necessary for the model to consider evidence loosely distributed across all these documents in order to predict the final answer. Third, there could be more than one logical way to connect the scattered evidence (i.e., more than one possible reasoning chain), and hence models must assemble and weigh information collected from every reasoning chain before making a unified prediction.

To overcome the three difficulties elaborated above, we develop our interpretable 3-module system based on examining how a human reader would approach a question, as shown in Fig. 1a and Fig. 1b.

∗ Equal contribution; part of this work was done during the second author's internship at UNC (from IIT Bombay).
1 Our code is publicly available at: https://github.com/jiangycTarheel/EPAr
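As a concrete illustration of this setting, the following toy sketch (our own, not the paper's model) encodes the three documents from the Fig. 1a example. It shows that no single document links the query subject to the answer, and that greedily following word overlap between documents (a crude stand-in for the learned document selection described below) recovers a subject-to-answer chain:

```python
# Toy multi-hop example from Fig. 1a (document texts paraphrased).
docs = [
    "The Haunted Castle (Dutch: Spookslot) is a haunted attraction "
    "in the amusement park Efteling in the Netherlands.",
    "Efteling is a fantasy-themed amusement park in Kaatsheuvel "
    "in the Netherlands.",
    "Kaatsheuvel is a village in the Dutch province of North Brabant; "
    "it is the largest village and the capital of the municipality "
    "of Loon op Zand.",
]
subject, answer = "The Haunted Castle", "Loon op Zand"

# Single-hop matching fails: no document mentions both subject and answer.
assert not any(subject in d and answer in d for d in docs)

def greedy_chain(docs, subject, max_hops=3):
    """Start from the document mentioning the query subject, then
    repeatedly hop to the unused document sharing the most words
    with the current one."""
    chain = [next(d for d in docs if subject in d)]
    remaining = [d for d in docs if d is not chain[0]]
    for _ in range(max_hops - 1):
        if not remaining:
            break
        last_words = set(chain[-1].lower().split())
        nxt = max(remaining,
                  key=lambda d: len(last_words & set(d.lower().split())))
        chain.append(nxt)
        remaining.remove(nxt)
    return chain

chain = greedy_chain(docs, subject)
assert answer in chain[-1]  # the leaf document contains the answer
```

Raw word overlap is, of course, far too brittle for real WikiHop instances (which average 14 documents and allow divergent chains); it only serves to make the "follow the bridging entity" intuition concrete.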
[Figure 1: Two examples from the QAngaroo WikiHop dataset where it is necessary to combine information spread across multiple documents to infer the correct answer. (a): The hidden reasoning chain of 3 out of a total of 37 documents for a single query (query subject: "The Haunted Castle"; query body: located_in_the_administrative_territorial_entity; answer: "Loon op Zand"). (b): Two possible reasoning chains that lead to different answers, "Upper Harz" and "Lower Saxony", where the latter (green solid arrow) fits better with the query body "administrative territorial entity" (query subject: "Polsterberg Pumphouse"; query body: located_in_the_administrative_territorial_entity; answer: "Lower Saxony").]

For the 1st example, instead of reading the entire set of supporting documents sequentially, she would start from the document that is directly related to the query subject (e.g., "The Haunted Castle"). She could then read the second and third documents by following the connecting entities "park Efteling" and "Kaatsheuvel", and uncover the answer "Loon op Zand" by comparing phrases in the final document to the query. In this way, the reader accumulates knowledge about the query subject by exploring inter-connected documents, and eventually uncovers the entire reasoning chain that leads to the answer. Drawing inspiration from this coarse (document-level) plus fine-grained (word-level) comprehension behavior, we first construct a T-hop Document Explorer model, a hierarchical memory network, which at each recurrent hop selects one document to read, updates the memory cell, and iteratively selects the next related document, overall constructing a reasoning chain of the most relevant documents. We next introduce an Answer Proposer that performs query-context reasoning at the word level on the retrieved chain and predicts an answer. Specifically, it encodes the leaf document of the reasoning chain while attending to its ancestral documents, and outputs ancestor-aware word representations for this leaf document, which are compared to the query to propose a candidate answer.

However, these two components alone cannot handle questions that allow multiple possible reasoning chains leading to different answers, as shown in Fig. 1b. After the Document Explorer selects the 1st document, it finds that both the 2nd and 3rd documents are connected to the 1st document via the entities "the Dyke Ditch" and "Upper Harz" respectively. A single chain of documents thus cannot cover all possible reasoning chains/paths. Hence, to be able to weigh and combine information from multiple reasoning branches, the Document Explorer is rolled out multiple times to represent all the divergent reasoning chains in a 'reasoning tree' structure, so as to allow our third component, the Evidence Assembler, to assimilate important evidence identified in every reasoning chain of the tree to make one final, unified prediction. To do so, the Assembler selects key sentences from each root-to-leaf document path in the 'reasoning tree' and forms a new condensed, salient context, which is then bidirectionally matched with the query representation to output the final prediction. Via this procedure, evidence that was originally scattered widely across several documents is now collected in one place, hence transforming the task into a scenario where previous standard phrase-matching-style QA models (Seo et al., 2017; Xiong et al., 2017; Dhingra et al., 2017) can be effective.

Overall, our 3-module, multi-hop, reasoning-tree-based EPAr (Explore-Propose-Assemble reader) closely mimics the coarse-to-fine-grained reading and reasoning behavior of human readers. We jointly optimize this 3-module system by having each component work on the outputs of the previous component and minimizing the sum of the losses from all 3 modules. The Answer Proposer and Evidence Assembler are trained with maximum likelihood using ground-truth answers as labels, while the Document Explorer is weakly supervised by heuristic reasoning chains constructed via TF-IDF and documents containing the ground-truth answer. On WikiHop, our system achieves the highest-