
Unsupervised Question Decomposition for Question Answering

Ethan Perez 1 2 Patrick Lewis 1 3 Wen-tau Yih 1 Kyunghyun Cho 1 2 4 Douwe Kiela 1

Abstract

We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions. Specifically, by leveraging >10M questions from Common Crawl, we learn to map from the distribution of multi-hop questions to the distribution of single-hop sub-questions. We answer sub-questions with an off-the-shelf QA model and incorporate the resulting answers in a downstream, multi-hop QA system. On a popular multi-hop QA dataset, HOTPOTQA, we show large improvements over a strong baseline, especially on adversarial and out-of-domain questions. Our method is generally applicable and automatically learns to decompose questions of different classes, while matching the performance of decomposition methods that rely heavily on hand-engineering and annotation.

Figure 1. Overview: Using unsupervised learning, we decompose a multi-hop question into single-hop sub-questions, whose predicted answers are given to a downstream question answering model.

1. Introduction

Question answering (QA) systems have become remarkably good at answering simple, single-hop questions but still struggle with compositional, multi-hop questions (Yang et al., 2018; Hudson & Manning, 2019). In this work, we examine if we can answer hard questions by leveraging our ability to answer simple questions. Specifically, we approach QA by breaking a hard question into a series of sub-questions that can be answered by a simple, single-hop QA system. The system's answers can then be given as input to a downstream QA system to answer the hard question, as shown in Fig. 1. Our approach thus answers the hard question in multiple, smaller steps, which can be easier than answering the hard question all at once. For example, it may be easier to answer "What profession do H. L. Mencken and Albert Camus have in common?" when given the answers to the sub-questions "What profession does H. L. Mencken have?" and "Who was Albert Camus?"

Prior work in learning to decompose questions into sub-questions has relied on extractive heuristics, which generalize poorly to different domains and question types and require human annotation (Talmor & Berant, 2018; Min et al., 2019b). Scaling to arbitrary questions would instead require sophisticated natural language generation capabilities, which often rely on large quantities of high-quality supervised data. We find, however, that it is possible to learn to decompose questions without supervision.

Specifically, we learn to map from the distribution of hard questions to the distribution of simpler questions. First, we automatically construct a noisy, "pseudo-decomposition" for each hard question by retrieving relevant sub-question candidates based on their similarity to the given hard question. We retrieve candidates from a corpus of 10M simple questions that we extracted from Common Crawl. Second, we train neural text generation models on that data with (1) standard sequence-to-sequence learning and (2) unsupervised sequence-to-sequence learning. The latter has the advantage that it can go beyond the noisy pairing between questions and pseudo-decompositions. Fig. 2 overviews our decomposition approach.

[1] Facebook AI Research  [2] New York University  [3] University College London  [4] CIFAR Azrieli Global Scholar. Correspondence to: Ethan Perez.

Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 108, 2020. Copyright 2020 by the author(s).

We use decompositions to improve multi-hop QA. We first use an off-the-shelf single-hop QA model to answer decomposed sub-questions. We then give each sub-question and its answer as additional input to a multi-hop QA model. We test our method on HOTPOTQA (Yang et al., 2018), a popular multi-hop QA benchmark.

Our contributions are as follows. First, QA models relying on decompositions improve accuracy over a strong baseline by 3.1 F1 on the original dev set, 11 F1 on the multi-hop dev set from Jiang & Bansal (2019a), and 10 F1 on the out-of-domain dev set from Min et al. (2019b). Our most effective decomposition model is a 12-block transformer encoder-decoder (Vaswani et al., 2017) trained using unsupervised sequence-to-sequence learning, involving masked language modeling, denoising, and back-translation objectives (Lample & Conneau, 2019). Second, our method is competitive with state-of-the-art methods SAE (Tu et al., 2020) and HGN (Fang et al., 2019), which leverage strong supervision. Third, we show that our approach automatically learns to generate useful decompositions for all 4 question types in HOTPOTQA, highlighting the general nature of our approach. In our analysis, we explore how sub-questions improve multi-hop QA, and we provide qualitative examples that highlight how question decomposition adds a form of interpretability to black-box QA models. Our ablations show that each component of our pipeline contributes to QA performance. Overall, we find that it is possible to successfully decompose questions without any supervision and that doing so improves QA.

Figure 2. Unsupervised Decomposition: Step 1: We create a corpus of pseudo-decompositions D by finding candidate sub-questions from a simple question corpus S which are similar to a multi-hop question in Q. Step 2: We learn to map multi-hop questions to decompositions using Q and D as training data, via either standard or unsupervised sequence-to-sequence learning.

2. Method

We now formulate the problem and overview our high-level approach, with details in the following section. We aim to leverage a QA model that is accurate on simple questions to answer hard questions, without using supervised question decompositions. Here, we consider simple questions to be "single-hop" questions that require reasoning over one paragraph or piece of evidence, and we consider hard questions to be "multi-hop." Our aim is then to train a multi-hop QA model M to provide the correct answer a to a multi-hop question q about a given context c (e.g., several paragraphs). Normally, we would train M to maximize log p_M(a | c, q). To help M, we leverage a single-hop QA model that may be queried with sub-questions s_1, ..., s_N, whose "sub-answers" a_1, ..., a_N may be provided to the multi-hop QA model. M may then instead maximize the (potentially easier) objective log p_M(a | c, q, [s_1, a_1], ..., [s_N, a_N]).

Supervised decomposition models learn to map each question q ∈ Q to a decomposition d = [s_1; ...; s_N] of N sub-questions s_n ∈ S using annotated (q, d) examples. In this work, we do not assume access to strong (q, d) supervision. To leverage the single-hop QA model without supervision, we follow a three-stage approach: 1) map a question q into sub-questions s_1, ..., s_N via unsupervised techniques, 2) find sub-answers a_1, ..., a_N with the single-hop QA model, and 3) provide s_1, ..., s_N and a_1, ..., a_N to help predict a. A minimal sketch of this pipeline is given below.
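To make the three-stage approach concrete, the following is a minimal sketch of the pipeline. The three callables are hypothetical stand-ins for the models described in Section 3, not the authors' actual interfaces.

```python
from typing import Callable, List, Tuple

def answer_with_decompositions(
    question: str,
    context: str,
    decompose: Callable[[str], List[str]],           # stage 1: decomposition model
    single_hop_qa: Callable[[str, str], str],        # stage 2: off-the-shelf single-hop QA
    multi_hop_qa: Callable[[str, str, List[Tuple[str, str]]], str],  # stage 3
) -> str:
    """Three-stage pipeline: decompose, answer sub-questions, recompose."""
    sub_questions = decompose(question)                                  # stage 1
    sub_answers = [single_hop_qa(sq, context) for sq in sub_questions]   # stage 2
    # Stage 3: the multi-hop model conditions on q, c, and all (s_i, a_i) pairs.
    return multi_hop_qa(question, context, list(zip(sub_questions, sub_answers)))
```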
2.1. Unsupervised Question Decomposition

To train a decomposition model, we need appropriate training data. We assume access to a hard question corpus Q and a simple question corpus S. Instead of using supervised (q, d) training examples, we design an algorithm that constructs pseudo-decompositions d' to form (q, d') pairs from Q and S using an unsupervised approach (§2.1.1). We then train a model to map q to a decomposition. We explore learning to decompose with standard and unsupervised sequence-to-sequence learning (§2.1.2).

2.1.1. CREATING PSEUDO-DECOMPOSITIONS

For each q ∈ Q, we construct a pseudo-decomposition set d' = {s_1; ...; s_N} by retrieving simple questions s from S. We concatenate all N simple questions in d' to form the pseudo-decomposition used downstream. N may be chosen based on the task or vary based on q. To retrieve useful simple questions for answering q, we face a joint optimization problem. We want sub-questions that are both (i) similar to q according to some metric f and (ii) maximally diverse:

    d'^* = \arg\max_{d' \subset S} \sum_{s_i \in d'} f(q, s_i) \;-\; \sum_{s_i, s_j \in d',\, s_i \neq s_j} f(s_i, s_j)    (1)
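As a minimal illustration of Eq. (1), the sketch below exhaustively scores size-N subsets of a small candidate pool, taking f to be cosine similarity; exhaustive search is only feasible over a pruned candidate set (§3.2.2 describes the top-K approximation actually used). Variable names are ours.

```python
import itertools
import numpy as np

def pseudo_decomposition(q_vec: np.ndarray, cand_vecs: np.ndarray, n: int = 2):
    """Exhaustively maximize Eq. (1) over size-n subsets of a candidate pool."""
    # Normalize so that dot products are cosine similarities (the metric f).
    q = q_vec / np.linalg.norm(q_vec)
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sim_to_q = c @ q      # f(q, s_i) for every candidate
    sim_pair = c @ c.T    # f(s_i, s_j) for every candidate pair
    best_score, best_subset = -np.inf, None
    for subset in itertools.combinations(range(len(c)), n):
        # Reward similarity to the hard question, penalize similarity
        # between the chosen sub-questions (the diversity term).
        score = sum(sim_to_q[i] for i in subset) - sum(
            sim_pair[i, j] for i, j in itertools.combinations(subset, 2))
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset
```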

2.1.2. LEARNING TO DECOMPOSE

Having now retrieved relevant pseudo-decompositions, we examine different ways to learn to decompose (with implementation details in the following section):

No Learning. We use pseudo-decompositions directly, employing retrieved sub-questions in downstream QA.

Sequence-to-Sequence (Seq2Seq). We train a Seq2Seq model with parameters θ to maximize log p_θ(d' | q).

Unsupervised Sequence-to-Sequence (USeq2Seq). We start with paired (q, d') examples but do not learn from the pairing, because the pairing is noisy. We instead use unsupervised sequence-to-sequence learning to learn a q → d mapping.

2.2. Answering Sub-Questions

To answer the generated sub-questions, we use an off-the-shelf QA model. The QA model may answer sub-questions using any free-form text (i.e., a word, phrase, sentence, etc.). Any QA model is suitable, so long as it can accurately answer simple questions in S. We thus leverage good accuracy on questions in S to help QA models on questions in Q.

2.3. QA using Decompositions

Downstream QA systems may use sub-questions and sub-answers in various ways. We add sub-questions and sub-answers as auxiliary input for a downstream QA model to incorporate in its processing. We now describe the implementation details of our approach outlined above.

3. Experimental Setup

3.1. Question Answering Task

We test unsupervised decompositions on HOTPOTQA (Yang et al., 2018), a standard benchmark for multi-hop QA. We use HOTPOTQA's "Distractor Setting," which provides 10 context paragraphs from Wikipedia. Two (or more) paragraphs contain question-relevant sentences called "supporting facts," and the remaining paragraphs are irrelevant "distractor paragraphs." Answers in HOTPOTQA are either yes, no, or a span of text in an input paragraph. Accuracy is measured with F1 and Exact Match (EM) scores between predicted and gold spans.

3.2. Unsupervised Decomposition

3.2.1. QUESTION DATA

We use HOTPOTQA questions as our initial multi-hop, hard question corpus Q. We use SQUAD 2 questions as our initial single-hop, simple question corpus S. However, our pseudo-decomposition corpus should be large, as the corpus will be used to train neural Seq2Seq models, which are data hungry. A larger |S| will also improve the relevance of retrieved simple questions to the hard question. Thus, we take inspiration from work in machine translation on parallel corpus mining (Xu & Koehn, 2017; Artetxe & Schwenk, 2019) and in unsupervised QA (Lewis et al., 2019). We augment Q and S by mining more questions from Common Crawl. We choose sentences which start with common "wh"-words and end with "?". Next, we train a FastText classifier (Joulin et al., 2017) to classify between 60K questions sampled from Common Crawl, SQUAD 2, and HOTPOTQA. Then, we classify Common Crawl questions, adding questions classified as SQUAD 2 questions to S and questions classified as HOTPOTQA questions to Q. Question mining greatly increases the number of single-hop questions (130K → 10.1M) and multi-hop questions (90K → 2.4M). Thus, our unsupervised approach allows us to make use of far more data than supervised counterparts. A sketch of this mining step follows.
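The sketch below illustrates the mining step under stated assumptions: a plain-text corpus, the fastText Python bindings, and illustrative label/file names, which the paper does not specify.

```python
import fasttext  # pip install fasttext

WH_WORDS = ("what", "who", "where", "when", "which", "why", "how", "whose", "whom")

def mine_questions(lines):
    """Keep sentences that look like questions: start with a wh-word, end in '?'."""
    for line in lines:
        s = line.strip()
        if s.lower().startswith(WH_WORDS) and s.endswith("?"):
            yield s

# Train a 3-way source classifier on questions labeled by origin (Common Crawl,
# SQuAD 2, or HotpotQA), written in fastText's "__label__<source> <text>" format.
classifier = fasttext.train_supervised("question_sources.train")  # hypothetical file

def route(question: str) -> str:
    """Add a mined question to S or Q based on its predicted source corpus."""
    label = classifier.predict(question)[0][0]
    if label == "__label__squad":   # single-hop-like: add to S
        return "S"
    if label == "__label__hotpot":  # multi-hop-like: add to Q
        return "Q"
    return "discard"                # generic Common Crawl question
```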
3.2.2. CREATING PSEUDO-DECOMPOSITIONS

To create pseudo-decompositions, we set the number N of sub-questions per question to 2, as questions in HOTPOTQA usually involve two reasoning hops. In the appendix, we discuss how our approach works when N varies per question.

Similarity-based Retrieval. To retrieve question-relevant sub-questions, we embed any text t into a vector v_t by summing the FastText vectors (Bojanowski et al., 2017)[1] for words in t.[2] We use cosine similarity as our similarity metric f. Let q be a multi-hop question used to retrieve pseudo-decomposition (s_1^*, s_2^*), and let v̂ be the unit vector of v. Since N = 2, Eq. 1 reduces to:

    (s_1^*, s_2^*) = \arg\max_{\{s_1, s_2\} \subset S} \left[ \hat{v}_q^\top \hat{v}_{s_1} + \hat{v}_q^\top \hat{v}_{s_2} - \hat{v}_{s_1}^\top \hat{v}_{s_2} \right]    (2)

The last term requires O(|S|^2) comparisons, which is expensive as |S| is large (>10M). Instead of solving Eq. (2) exactly, we find an approximate pseudo-decomposition (s_1', s_2') by computing Eq. (2) over S' = \mathrm{topK}_{s \in S}\left[\hat{v}_q^\top \hat{v}_s\right], using K = 1000. We use FAISS (Johnson et al., 2017a) to efficiently build S'; a sketch of this retrieval step appears after the footnotes below.

Random Retrieval. For comparison, we test random pseudo-decompositions, where we randomly retrieve s_1, ..., s_N by sampling from S. USeq2Seq trained on random d' = [s_1; ...; s_N] should at minimum learn to map q to multiple simple questions.

[1] We use 300-dim. vectors trained on English Common Crawl: fasttext.cc/docs/en/english-vectors.html
[2] We also tried TFIDF and BERT representations but did not see significant improvements over FastText (see the appendix).
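The following sketch of the approximate retrieval above uses FAISS to prune S to the top-K candidates by inner product, then scores Eq. (2) exactly over that pool. We assume pre-computed, unit-normalized FastText question embeddings; variable names are ours.

```python
import faiss  # pip install faiss-cpu
import numpy as np

def retrieve_pseudo_decomposition(q_vec: np.ndarray, s_vecs: np.ndarray, k: int = 1000):
    """Approximate Eq. (2): top-K pruning with FAISS, then exact pair scoring."""
    index = faiss.IndexFlatIP(s_vecs.shape[1])  # inner product == cosine on unit vectors
    index.add(s_vecs.astype(np.float32))
    _, ids = index.search(q_vec.reshape(1, -1).astype(np.float32), k)
    cand = s_vecs[ids[0]]                  # S': the K nearest simple questions
    sim_q = cand @ q_vec                   # similarity of each candidate to q
    sim_pair = cand @ cand.T               # similarity between candidate pairs
    # Score every pair: similarity to q minus similarity to each other (Eq. 2).
    scores = sim_q[:, None] + sim_q[None, :] - sim_pair
    np.fill_diagonal(scores, -np.inf)      # forbid picking the same question twice
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return ids[0][i], ids[0][j]            # indices into S of (s1*, s2*)
```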

Editing Pseudo-Decompositions. Since the sub-questions are retrieval-based, the sub-questions are often not about the same entities as q. As a post-processing step, we replace entities in (s_1', s_2') with entities from q. We find all entities in (s_1', s_2') that do not appear in q using spaCy (Honnibal & Montani, 2017). We replace these entities with a random entity from q with the same type (e.g., "Date" or "Location") if and only if one exists. We use entity replacement on pseudo-decompositions from both random and similarity-based retrieval.
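A sketch of this entity-replacement step, assuming spaCy's small English NER model; the exact model choice and the random tie-breaking behavior are illustrative, not confirmed by the paper.

```python
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # any English NER model would do

def edit_sub_question(sub_q: str, multi_hop_q: str) -> str:
    """Swap entities that do not appear in q for same-type entities from q."""
    ents_by_type = {}
    for ent in nlp(multi_hop_q).ents:
        ents_by_type.setdefault(ent.label_, []).append(ent.text)
    edited = sub_q
    for ent in nlp(sub_q).ents:
        # Replace only if the entity is absent from q AND q contains an
        # entity of the same type (e.g., "DATE" or "GPE"); otherwise keep it.
        if ent.text not in multi_hop_q and ents_by_type.get(ent.label_):
            edited = edited.replace(ent.text, random.choice(ents_by_type[ent.label_]))
    return edited
```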
SQUAD is a single-paragraph QA task, so we adapt two stages: (1) MLM pre-training on the training corpora SQUAD to the multi-paragraph setting by retrieving dis- (described above), followed by (2) training simultaneously tractor paragraphs from Wikipedia for each question. We with denoising and back-translation objectives. For denois- use the TFIDF retriever from DrQA (Chen et al., 2017) to ing, we produce a noisy input dˆ by randomly masking, retrieve 2 distractor paragraphs, which we add to the input dropping, and locally shuffling tokens in d ∼ D, and we for one model in the ensemble. We drop words from the ˆ question with a 5% probability to help the model handle train a model with parameters θ to maximize log pθ(d|d). any ill-formed sub-questions. We use the single-hop QA We likewise maximize log pθ(q|qˆ). For back-translation, we generate a multi-hop question qˆ for a decomposition ensemble as a black-box model once trained, never training the model on multi-hop questions. d ∼ D, and we maximize log pθ(d|qˆ). Similarly, we maxi- ˆ mize log pθ(q|d). To stop training without supervision, we use a modified version of round-trip BLEU (Lample et al., Returned Text We have the single-hop QA model return 2018) (see the appendix for details). We train with denois- the sentence containing the model’s predicted answer span, ing and back-translation on smaller corpora of HOTPOTQA alongside the sub-questions. Later, we compare against al- questions (Q) and their pseudo-decompositions (D).4 ternatives, i.e., returning the predicted answer span without its context or not returning sub-questions. 3We use github.com/facebookresearch/XLM 4Using the augmented corpora here did not improve QA. 5Our code is based on transformers (Wolf et al., 2019) Unsupervised Question Decomposition for Question Answering

Table 1. Unsupervised decompositions significantly improve the F1 on HOTPOTQA over the baseline. We achieve comparable F1 to methods which use supporting fact supervision (†). (*) We use supervised and heuristic decompositions from Min et al. (2019b). (‡) Scores are approximate due to mismatched Wikipedia dumps.

Decomp. Method | Pseudo-Decomps. | Orig F1 | MultiHop F1 | OOD F1
✗ (1hop)       | ✗        | 66.7    | 63.7    | 66.5
✗ (Baseline)   | ✗        | 77.0±.2 | 65.2±.2 | 67.1±.5
No Learn       | Random   | 78.4±.2 | 70.9±.2 | 70.7±.4
No Learn       | FastText | 78.9±.2 | 72.4±.1 | 72.0±.1
Seq2Seq        | Random   | 77.7±.2 | 69.4±.3 | 70.0±.7
Seq2Seq        | FastText | 78.9±.2 | 73.1±.2 | 73.0±.3
USeq2Seq       | Random   | 79.8±.1 | 76.0±.2 | 76.5±.2
USeq2Seq       | FastText | 80.1±.2 | 76.2±.1 | 77.1±.1
DecompRC*      |          | 79.8±.2 | 76.3±.4 | 77.7±.2
SAE (Tu et al., 2020)†   |  | 80.2 | 61.1  | 62.6
HGN (Fang et al., 2019)† |  | 82.2 | 78.9‡ | 76.1‡

Test (EM/F1): Ours 66.33/79.34 | SAE† 66.92/79.62 | HGN† 69.22/82.19

3.4. Multi-hop Question Answering Model

Our multi-hop QA architecture is identical to the single-hop QA model, but the multi-hop QA model also uses sub-questions and sub-answers as input. We append each (sub-question, sub-answer) pair in order to the multi-hop question along with separator tokens, as sketched below. We train one multi-hop QA model on all of HOTPOTQA, also including the SQUAD 2 examples used to train the single-hop QA model. Later, we experiment with using BERT_LARGE and BERT_BASE instead of ROBERTA_LARGE as the multi-hop QA model. All reported error margins show the mean and std. dev. across 5 multi-hop QA training runs using the same decompositions.
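A sketch of how this input could be assembled; the separator string is a placeholder, since the paper does not spell out the exact token.

```python
def build_multi_hop_input(question, sub_qa_pairs, sep=" [SEP] "):
    """Append each (sub-question, sub-answer sentence) pair, in order, to the
    multi-hop question, delimited by a separator (placeholder token here)."""
    parts = [question]
    for sub_q, sub_a in sub_qa_pairs:
        parts.extend([sub_q, sub_a])
    return sep.join(parts)

# Example (sub-answers are the sentences returned by the single-hop model):
# build_multi_hop_input(
#     "What profession do H. L. Mencken and Albert Camus have in common?",
#     [("What profession does H. L. Mencken have?",
#       "Henry Louis Mencken ... was an American journalist, critic and scholar ..."),
#      ("Who was Albert Camus?",
#       "Albert Camus ... was a French philosopher, author, and journalist.")])
```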
both approaches We compare variants of our approach that use different learn- achieve 78.9 F1 on the original dev set for FastText pseudo- ing methods and different pseudo-aligned training sets. As decompositions. The results are similar perhaps since super- a baseline, we compare ROBERTA with decompositions to vised learning is directly trained to place high probability a ROBERTA model that does not use decompositions but on pseudo-decompositions.7 USeq2Seq may improve over is identical in all other respects. We train the baseline for 2 Seq2Seq by learning to align hard questions and pseudo- epochs, sweeping over batch size ∈ {64, 128}, learning rate decompositions while ignoring the noisy pairing. ∈ {1 × 10−5, 1.5 × 10−5, 2 × 10−5, 3 × 10−5}, and weight decay ∈ {0, 0.1, 0.01, 0.001}; we choose the hyperparame- After our experimentation, we chose USeq2Seq trained on ters that perform best on our dev set. We then use the best FastText pseudo-decompositions as the final model, and hyperparameters for the baseline to train our ROBERTA we submitted the model for hidden test evaluation. Our models with decompositions. approach achieved a test F1 of 79.34 and Exact Match (EM) of 66.33. Our approach is competitive with concurrent, state- We report results on 3 versions of the dev set: (1) the origi- of-the-art systems SAE (Tu et al., 2020) and HGN (Fang nal version,6 (2) the multi-hop version from Jiang & Bansal (2019a) which created some distractor paragraphs adversari- 7We also tried using the Seq2Seq model to initialize USeq2Seq. Seq2Seq initialization resulted in comparable or worse downstream 6The test set is private, so we randomly halve the dev set to QA accuracy, suggesting that pre-training on noisy decompositions form validation and held-out dev sets. We will release our splits. did not help bootstrap USeq2Seq (see appendix for details). Unsupervised Question Decomposition for Question Answering

Table 2. F1 scores on 4 types of questions in HOTPOTQA. Unsupervised decompositions improve QA for all types.

Decomps. | Bridge  | Comp.   | Intersec. | Single-hop
✗        | 80.7±.2 | 73.8±.4 | 78.1±.6   | 73.8±.6
✓        | 82.3±.4 | 80.1±.3 | 81.2±.4   | 76.7±.6

Table 3. Ablation Study: QA model F1 when trained with different sub-answers: the sentence containing the predicted sub-answer, the predicted sub-answer span, and a random entity from the context. We also train QA models with (✓) or without (✗) sub-questions and sub-answers.

SubQs | SubAs         | QA F1
✗     | ✗             | 77.0±.2
✓     | Sentence      | 80.1±.2
✓     | Span          | 77.8±.3
✓     | Random Entity | 76.9±.2
✓     | ✗             | 76.9±.2
✗     | Sentence      | 80.2±.1

Figure 3. Multi-hop QA is better when the single-hop QA model answers with the ground truth "supporting fact" sentences. We plot mean and std. across 5 random QA training runs. (Answer F1 vs. evaluation set sorted by increasing Supporting Fact F1, for the original, multi-hop, and out-of-domain dev sets.)

4.1. Question Type Breakdown

To understand where decompositions help, we break down QA performance across 4 question types from Min et al. (2019b). "Bridge" questions ask about an entity not explicitly mentioned in the question ("When was Erik Watts' father born?"). "Intersection" questions ask to find an entity that satisfies multiple separate conditions ("Who was on CNBC and Fox News?"). "Comparison" questions ask to compare a property of two entities ("Which is taller, Momhil Sar or K2?"). "Single-hop" questions are likely answerable using single-hop shortcuts or single-paragraph reasoning ("Where is Electric Six from?"). We split the original dev set into the 4 types using the supervised type classifier from Min et al. (2019b). Table 2 shows F1 scores for ROBERTA with and without decompositions across the 4 types.

Unsupervised decompositions improve QA across all question types. Our single decomposition model generates useful sub-questions for all question types without special case handling, unlike earlier work from Min et al. (2019b), which handled each question type separately. For single-hop questions, our QA approach does not require falling back to a single-hop QA model and instead learns to leverage decompositions to better answer questions with single-hop shortcuts (76.7 vs. 73.8 F1 without decompositions).

4.2. Answers to Sub-Questions are Crucial

To measure the usefulness of sub-questions and sub-answers, we train the multi-hop QA model with various, ablated inputs, as shown in Table 3. Sub-answers are crucial to improving QA, as sub-questions with no answers or random answers do not help (76.9 vs. 77.0 F1 for the baseline). Only when sub-answers are provided do we see improved QA, with or without sub-questions (80.1 and 80.2 F1, respectively). It is important to provide the sentence containing the predicted answer span instead of the answer span alone (80.1 vs. 77.8 F1, respectively), though the answer span alone still improves over the baseline (77.0 F1).

4.3. How Do Decompositions Help?

Decompositions help to answer questions by retrieving important supporting evidence. Fig. 3 shows that multi-hop QA accuracy increases when the sub-answer sentences are the "supporting facts," i.e., the sentences needed to answer the question, as annotated by HOTPOTQA. We retrieve supporting facts without learning to predict them with strong supervision, unlike many state-of-the-art models (Tu et al., 2020; Fang et al., 2019; Nie et al., 2019).

4.4. Example Decompositions

To illustrate how decompositions help QA, Table 4 shows example sub-questions from our best decomposition model with predicted sub-answers. Sub-questions are single-hop questions relevant to the multi-hop question. The single-hop QA model returns relevant sub-answers, sometimes in spite of grammatical errors (Q1, SQ1) or under-specified questions (Q2, SQ1). The multi-hop QA model then returns an answer consistent with the predicted sub-answers. The decomposition model is largely extractive, copying from the multi-hop question rather than hallucinating new entities, which helps generate relevant sub-questions. To better understand our system, we analyze the model for each stage: decomposition, single-hop QA, and multi-hop QA.

Table 4. Example sub-questions generated by our model, along with predicted sub-answer sentences (answer span underlined) and final predicted answer.

Q1: Are both Coldplay and Pierre Bouvier from the same country?
SQ1: Where are Coldplay and Coldplay from?
→ Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
SQ2: What country is Pierre Bouvier from?
→ Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band.
Â: No

Q2: How many copies of Roald Dahl's variation on a popular anecdote sold?
SQ1: How many copies of Roald Dahl's?
→ His books have sold more than 250 million copies worldwide.
SQ2: What is the name of the variation on a popular anecdote?
→ "Mrs. Bixby and the Colonel's Coat" is a short story by Roald Dahl that first appeared in the 1959 issue of Nugget.
Â: more than 250 million

Q3: Who is older, Annie Morton or Terry Richardson?
SQ1: Who is Annie Morton?
→ Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
SQ2: When was Terry Richardson born?
→ Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.
Â: Annie Morton

Table 5. Analysis of sub-questions produced by our method vs. the supervised+heuristic method of Min et al. (2019b). From left to right: Negative Log-Likelihood (NLL) according to GPT2 (lower is better), % Well-Formed according to a classifier, Edit Distance between decomposition and multi-hop question, and token-wise Length Ratio between decomposition and multi-hop question.

Decomp. Method | GPT2 NLL | % Well-Formed | Edit Dist. | Length Ratio
USeq2Seq       | 5.56     | 60.9          | 5.96       | 1.08
DecompRC       | 6.04     | 32.6          | 7.08       | 1.22

Figure 4. Left: We decode from the decomposition model with beam search and use the nth-ranked hypothesis as a question decomposition. We plot the F1 of a multi-hop QA model trained to use the nth-ranked decomposition. Right: Multi-hop QA is better when the single-hop QA model places high probability on its sub-answer.

5. Analysis

5.1. Unsupervised Decomposition Model

Intrinsic Evaluation of Decompositions. We evaluate the quality of decompositions on other metrics aside from downstream QA. To measure the fluency of decompositions, we compute the likelihood of decompositions using the pre-trained GPT-2 language model (Radford et al., 2019). We train a classifier on the question-wellformedness dataset of Faruqui & Das (2018), and we use the classifier to estimate the proportion of sub-questions that are well-formed. We measure how abstractive decompositions are by computing (i) the token Levenshtein distance between the multi-hop question and its generated decomposition and (ii) the ratio between the length of the decomposition and the length of the multi-hop question. We compare our best decomposition model against the supervised+heuristic decompositions from DECOMPRC (Min et al., 2019b) in Table 5.

Unsupervised decompositions are both more natural and more well-formed than decompositions from DECOMPRC. Unsupervised decompositions are also closer in edit distance and length to the multi-hop question, consistent with our observation that our decomposition model is largely extractive.

Quality of Decomposition Model. Another way to test the quality of the decomposition model is to test if the model places higher probability on decompositions that are more helpful for downstream QA. We generate N = 5 hypotheses from our best decomposition model using beam search, and we train a multi-hop QA model to use the nth-ranked hypothesis as a question decomposition (Fig. 4, left). QA accuracy decreases as we use lower probability decompositions, but accuracy remains relatively robust, at most decreasing from 80.1 to 79.3 F1. The limited drop suggests that decompositions are still useful if they are among the model's top hypotheses, another indication that our model is trained well for decomposition.

5.2. Single-hop Question Answering Model

Sub-Answer Confidence. Figure 4 (right) shows that the model's sub-answer confidence correlates with downstream multi-hop QA performance for all HOTPOTQA dev sets. A low confidence sub-answer may be indicative of (i) an unanswerable or ill-formed sub-question or (ii) a sub-answer that is more likely to be incorrect. In both cases, the single-hop QA model is less likely to retrieve the useful supporting evidence needed to answer the multi-hop question.
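The GPT-2 fluency metric from Table 5 (§5.1) can be reproduced in spirit with the sketch below, which scores a decomposition's negative log-likelihood via the transformers library; averaging per token is our assumption, as the paper does not state the normalization.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def decomposition_nll(text: str) -> float:
    """Per-token negative log-likelihood of a decomposition under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels == input_ids, the model returns the mean cross-entropy loss,
    # i.e., the average per-token NLL (lower is more fluent).
    return model(ids, labels=ids).loss.item()

# Example:
# decomposition_nll("Where are Coldplay from? What country is Pierre Bouvier from?")
```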

Changing the Single-hop QA Model. We find that our approach is robust to the single-hop QA model that answers sub-questions. We use the BERT_BASE ensemble from Min et al. (2019b) as the single-hop QA model. The model performs much worse compared to our ROBERTA_LARGE single-hop ensemble when used directly on HOTPOTQA (56.3 vs. 66.7 F1). However, the model results in comparable QA when used to answer single-hop sub-questions within our larger system (79.9 vs. 80.1 F1 for our ROBERTA_LARGE ensemble).

Table 6. Stronger QA models benefit more from decompositions.

Multi-hop QA Model | QA F1 (w/o → w/ Decomps.)
BERT_BASE          | 71.8±.4 → 73.0±.4
BERT_LARGE         | 76.4±.2 → 79.0±.1
ROBERTA_LARGE      | 77.0±.3 → 80.1±.2

5.3. Multi-hop Question Answering Model

Varying the Base Model. To understand how decompositions impact performance as the multi-hop QA model gets stronger, we vary the base pre-trained model. Table 6 shows the impact of adding decompositions to BERT_BASE, BERT_LARGE, and finally ROBERTA_LARGE (see the appendix for hyperparameters). The gain from using decompositions grows with the strength of the multi-hop QA model. Decompositions improve QA by 1.2 F1 for a BERT_BASE model, by 2.6 F1 for the stronger BERT_LARGE model, and by 3.1 F1 for our best ROBERTA_LARGE model.

6. Related Work

Answering complicated questions has been a long-standing challenge in natural language processing. To this end, prior work has explored decomposing questions with supervision or heuristic algorithms. IBM Watson (Ferrucci et al., 2010) decomposes questions into sub-questions in multiple ways or not at all. DECOMPRC (Min et al., 2019b) largely frames sub-questions as extractive spans of a multi-hop question, learning to predict span-based sub-questions via supervised learning on human annotations. In other cases, DECOMPRC decomposes a multi-hop question using a heuristic algorithm, or it does not decompose at all. Watson and DECOMPRC use special case handling to decompose different questions, while our algorithm is fully automated and requires minimal hand-engineering.

More traditional, semantic parsing methods map questions to compositional programs, whose sub-programs can be viewed as question decompositions in a formal language (Talmor & Berant, 2018; Wolfson et al., 2020). Examples include classical QA systems like SHRDLU (Winograd, 1972) and LUNAR (Woods et al., 1974), as well as neural Seq2Seq semantic parsers (Dong & Lapata, 2016) and neural module networks (Andreas et al., 2015; 2016). Such methods usually require strong, program-level supervision to generate programs, as in visual QA (Johnson et al., 2017b) and on HOTPOTQA (Jiang & Bansal, 2019b). Some models use other forms of strong supervision, e.g., predicting the "supporting evidence" to answer a question, annotated by HOTPOTQA. Such an approach is taken by SAE (Tu et al., 2020) and HGN (Fang et al., 2019), whose methods may be combined with our approach.

Unsupervised decomposition complements strongly and weakly supervised decomposition approaches. Our unsupervised approach enables methods to leverage millions of otherwise unusable questions, similar to work on unsupervised QA (Lewis et al., 2019). When decomposition supervision exists, supervised and unsupervised learning can be used in tandem to learn from both labeled and unlabeled examples. Such semi-supervised methods outperform supervised learning for tasks like machine translation (Sennrich et al., 2016). Other work on weakly supervised question generation uses a downstream QA model's accuracy as a signal for learning to generate useful questions. Weakly supervised question generation often uses reinforcement learning (Nogueira & Cho, 2017; Wang & Lake, 2019; Strub et al., 2017; Das et al., 2017; Liang et al., 2018), where an unsupervised initialization can greatly mitigate the issues of exploring from scratch (Jaderberg et al., 2017).

7. Conclusion

We proposed an algorithm that decomposes questions without supervision, using 3 stages: (1) learning to decompose using pseudo-decompositions without supervision, (2) answering sub-questions with an off-the-shelf QA system, and (3) answering hard questions more accurately using sub-questions and their answers as additional input. When evaluated on HOTPOTQA, a standard benchmark for multi-hop QA, our approach significantly improved accuracy over an equivalent model that did not use decompositions. Our approach relies only on the final answer as supervision but works as effectively as state-of-the-art methods that rely on strong supervision, such as supporting fact labels or example decompositions. Qualitatively, we found that unsupervised decomposition resulted in fluent sub-questions whose answers often match the annotated supporting facts in HOTPOTQA. Our unsupervised decompositions are largely extractive, which is effective for compositional, multi-hop questions but not all complex questions, showing room for future work. Overall, this work opens up exciting avenues for leveraging methods in unsupervised learning and natural language generation to improve the interpretability and generalization of machine learning systems.

References

Andreas, J., Rohrbach, M., Darrell, T., and Klein, D. Neural module networks. In CVPR, pp. 39–48, 2016. URL https://arxiv.org/abs/1511.02799.

Andreas, J., Rohrbach, M., Darrell, T., and Klein, D. Learning to compose neural networks for question answering. In NAACL, pp. 1545–1554, 2016. URL https://www.aclweb.org/anthology/N16-1181.

Artetxe, M. and Schwenk, H. Margin-based parallel corpus mining with multilingual sentence embeddings. In ACL, pp. 3197–3203, 2019. URL https://www.aclweb.org/anthology/P19-1309.

Artetxe, M., Labaka, G., Agirre, E., and Cho, K. Unsupervised neural machine translation. In ICLR, 2018. URL https://openreview.net/forum?id=Sy2ogebAW.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017. URL https://www.aclweb.org/anthology/Q17-1010.

Chen, D., Fisch, A., Weston, J., and Bordes, A. Reading Wikipedia to answer open-domain questions. In ACL, pp. 1870–1879, 2017. URL https://www.aclweb.org/anthology/P17-1171.

Clark, C. and Gardner, M. Simple and effective multi-paragraph reading comprehension. In ACL, pp. 845–855, 2018. URL https://www.aclweb.org/anthology/P18-1078.

Das, A., Kottur, S., Moura, J. M., Lee, S., and Batra, D. Learning cooperative visual dialog agents with deep reinforcement learning. In ICCV, 2017.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pp. 4171–4186, 2019. URL https://www.aclweb.org/anthology/N19-1423.

Dong, L. and Lapata, M. Language to logical form with neural attention. In ACL, pp. 33–43, 2016. URL https://www.aclweb.org/anthology/P16-1004.

Fang, Y., Sun, S., Gan, Z., Pillai, R., Wang, S., and Liu, J. Hierarchical graph network for multi-hop question answering. arXiv preprint arXiv:1911.03631, 2019.

Faruqui, M. and Das, D. Identifying well-formed natural language questions. In EMNLP, pp. 798–803, 2018. URL https://www.aclweb.org/anthology/D18-1091.

Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., Lally, A., Murdock, J. W., Nyberg, E., Prager, J., Schlaefer, N., and Welty, C. Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):59–79, 2010.

Honnibal, M. and Montani, I. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. 2017. URL https://github.com/explosion/spaCy.

Hudson, D. A. and Manning, C. D. GQA: A new dataset for real-world visual reasoning and compositional question answering. In CVPR, 2019. URL https://arxiv.org/abs/1902.09506.

Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. In ICLR, 2017. URL https://openreview.net/forum?id=SJ6yPD5xg.

Jiang, Y. and Bansal, M. Avoiding reasoning shortcuts: Adversarial evaluation, training, and model development for multi-hop QA. In ACL, pp. 2726–2736, 2019a. URL https://www.aclweb.org/anthology/P19-1262.

Jiang, Y. and Bansal, M. Self-assembling modular networks for interpretable multi-hop reasoning. In EMNLP, 2019b. URL https://arxiv.org/abs/1909.05803.

Johnson, J., Douze, M., and Jégou, H. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734, 2017a.

Johnson, J., Hariharan, B., van der Maaten, L., Hoffman, J., Fei-Fei, L., Zitnick, C. L., and Girshick, R. Inferring and executing programs for visual reasoning. In ICCV, 2017b. URL https://arxiv.org/abs/1705.03633.

Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. Bag of tricks for efficient text classification. In EACL, pp. 427–431, 2017. URL https://www.aclweb.org/anthology/E17-2068.

Lample, G. and Conneau, A. Cross-lingual language model pretraining. In NeurIPS, 2019. URL https://arxiv.org/abs/1901.07291.

Lample, G., Conneau, A., Denoyer, L., and Ranzato, M. Unsupervised machine translation using monolingual corpora only. In ICLR, 2018. URL https://openreview.net/forum?id=rkYTTf-AZ.

Lewis, P., Denoyer, L., and Riedel, S. Unsupervised question answering by cloze translation. In ACL, pp. 4896–4910, 2019. URL https://www.aclweb.org/anthology/P19-1484.

Liang, C., Norouzi, M., Berant, J., Le, Q. V., and Lao, N. Memory augmented policy optimization for program synthesis and semantic parsing. In NeurIPS, pp. 9994–10006, 2018. URL https://arxiv.org/abs/1807.02322.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.

Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., and Wu, H. Mixed precision training. In ICLR, 2018. URL https://openreview.net/forum?id=r1gs9JgRZ.

Min, S., Wallace, E., Singh, S., Gardner, M., Hajishirzi, H., and Zettlemoyer, L. Compositional questions do not necessitate multi-hop reasoning. In ACL, pp. 4249–4257, 2019a. URL https://www.aclweb.org/anthology/P19-1416.

Min, S., Zhong, V., Zettlemoyer, L., and Hajishirzi, H. Multi-hop reading comprehension through question decomposition and rescoring. In ACL, pp. 6097–6109, 2019b. URL https://www.aclweb.org/anthology/P19-1613.

Nie, Y., Wang, S., and Bansal, M. Revealing the importance of semantic retrieval for machine reading at scale. In EMNLP-IJCNLP, 2019. URL https://www.aclweb.org/anthology/D19-1258/.

Nogueira, R. and Cho, K. Task-oriented query reformulation with reinforcement learning. In EMNLP, pp. 574–583, 2017. URL https://www.aclweb.org/anthology/D17-1061.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In ACL, pp. 311–318, 2002. URL https://www.aclweb.org/anthology/P02-1040.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. 2019.

Sennrich, R., Haddow, B., and Birch, A. Improving neural machine translation models with monolingual data. In ACL, pp. 86–96, 2016. URL https://www.aclweb.org/anthology/P16-1009.

Strub, F., de Vries, H., Mary, J., Piot, B., Courville, A., and Pietquin, O. End-to-end optimization of goal-driven and visually grounded dialogue systems. In IJCAI, pp. 2765–2771, 2017. URL https://doi.org/10.24963/ijcai.2017/385.

Talmor, A. and Berant, J. The web as a knowledge-base for answering complex questions. In NAACL, pp. 641–651, 2018. URL https://www.aclweb.org/anthology/N18-1059.

Tu, M., Huang, K., Wang, G., Huang, J., He, X., and Zhou, B. Select, answer and explain: Interpretable multi-hop reading comprehension over multiple documents. In AAAI, 2020. URL https://arxiv.org/abs/1911.00484.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In NeurIPS, pp. 5998–6008, 2017.

Wang, Z. and Lake, B. M. Modeling question asking using neural program generation. arXiv preprint arXiv:1907.09899, 2019.

Winograd, T. Understanding Natural Language. Academic Press, Inc., USA, 1972.

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., and Brew, J. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.

Wolfson, T., Geva, M., Gupta, A., Gardner, M., Goldberg, Y., Deutch, D., and Berant, J. Break it down: A question understanding benchmark. Transactions of the Association for Computational Linguistics, 2020. URL https://arxiv.org/abs/2001.11770.

Woods, W., Kaplan, R., and Nash-Webber, B. The lunar sciences natural language information system. Final Report 2378, Bolt, Beranek and Newman, Inc., Cambridge, MA, 1974.

Xu, H. and Koehn, P. Zipporah: a fast and scalable data cleaning system for noisy web-crawled parallel corpora. In EMNLP, pp. 2945–2950, 2017. URL https://www.aclweb.org/anthology/D17-1319.

Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., and Manning, C. D. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In EMNLP, pp. 2369–2380, 2018. URL https://www.aclweb.org/anthology/D18-1259.

Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., and Fidler, S. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In ICCV, 2015. URL https://doi.org/10.1109/ICCV.2015.11.

A. Pseudo-Decompositions

Tables 8–13 show examples of pseudo-decompositions and learned decompositions from various models.

Table 7. QA F1 scores for all combinations of learning methods and pseudo-decomposition retrieval methods that we tried.

Decomp. Method | Pseudo-Decomps. | Dev | Advers. | OOD
✗ (1hop)     | ✗        | 66.7    | 63.7    | 66.5
✗ (Baseline) | ✗        | 77.0±.2 | 65.2±.2 | 67.1±.5
No Learn     | Random   | 78.4±.2 | 70.9±.2 | 70.7±.4
No Learn     | BERT     | 78.9±.4 | 71.5±.3 | 71.5±.2
No Learn     | TFIDF    | 79.2±.3 | 72.2±.3 | 72.0±.5
No Learn     | FastText | 78.9±.2 | 72.4±.1 | 72.0±.1
Seq2Seq      | Random   | 77.7±.2 | 69.4±.3 | 70.0±.7
Seq2Seq      | BERT     | 79.1±.3 | 72.6±.3 | 73.1±.3
Seq2Seq      | TFIDF    | 79.2±.1 | 73.0±.3 | 72.9±.3
Seq2Seq      | FastText | 78.9±.2 | 73.1±.2 | 73.0±.3
CSeq2Seq     | Random   | 79.4±.2 | 75.1±.2 | 75.2±.4
CSeq2Seq     | BERT     | 78.9±.2 | 74.9±.1 | 75.2±.2
CSeq2Seq     | TFIDF    | 78.6±.3 | 72.4±.4 | 72.8±.2
CSeq2Seq     | FastText | 79.9±.2 | 76.0±.1 | 76.9±.1
USeq2Seq     | Random   | 79.8±.1 | 76.0±.2 | 76.5±.2
USeq2Seq     | BERT     | 79.8±.3 | 76.2±.3 | 76.7±.3
USeq2Seq     | TFIDF    | 79.6±.2 | 75.5±.2 | 76.0±.2
USeq2Seq     | FastText | 80.1±.2 | 76.2±.1 | 77.1±.1
DecompRC     |          | 79.8±.2 | 76.3±.4 | 77.7±.2
SAE (Tu et al., 2020)   |  | 80.2 | 61.1 | 62.6
HGN (Fang et al., 2019) |  | 82.2 | 78.9 | 76.1

A.1. Variable Length Pseudo-Decompositions

In §3.2.2, we leveraged domain knowledge about the task to fix the pseudo-decomposition length N = 2. A general algorithm for creating pseudo-decompositions should find a suitable N for each question. We find that Eq. 1 in §2.1.1 always results in decompositions of length N = 2, as the regularization term grows quickly with N. Thus, we test another formulation based on Euclidean distance:

    d'^* = \arg\min_{d' \subset S} \left\| v_q - \sum_{s \in d'} v_s \right\|_2    (4)

We create pseudo-decompositions in a similar way as before, first finding a set of candidate sub-questions S' ⊂ S with high cosine similarity to v_q, then performing beam search up to a maximum value of N. We test pseudo-decomposition formulations by creating synthetic compositional questions by combining 2–3 single-hop questions with "and." We then measure the ranking of the correct decomposition (a concatenation of the single-hop questions). For N = 2, both methods perform well, but Eq. 1 does not work for decompositions where N = 3, whereas Eq. 4 does, achieving a mean reciprocal rank of 30%. However, Eq. 1 outperforms Eq. 4 on HOTPOTQA, and Eq. 1 is faster to compute and easier to scale. Moreover, Eq. 4 requires an embedding space where summing sub-question representations is meaningful, whereas Eq. 1 only requires embeddings that encode semantic similarity. Thus, we adopt Eq. 1 for our main experiments. Table 8 contains an example where the variable length decomposition method produces a three-sub-question decomposition, whereas the other methods are fixed to two sub-questions.
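A sketch of the Eq. (4) objective with greedy (beam-width-1) search over a candidate pool; the paper uses beam search up to a maximum N, so this is a simplification.

```python
import numpy as np

def variable_length_decomposition(q_vec: np.ndarray, cand_vecs: np.ndarray,
                                  max_n: int = 3):
    """Greedily pick sub-questions whose summed embeddings approach v_q (Eq. 4),
    stopping once adding another sub-question no longer reduces the distance."""
    chosen = []
    total = np.zeros_like(q_vec)
    best_dist = np.linalg.norm(q_vec)   # distance of the empty decomposition
    for _ in range(max_n):
        dists = np.linalg.norm(q_vec - (total + cand_vecs), axis=1)
        dists[chosen] = np.inf          # each candidate may be used at most once
        i = int(np.argmin(dists))
        if dists[i] >= best_dist:       # early stop: no candidate helps anymore
            break
        chosen.append(i)
        total = total + cand_vecs[i]
        best_dist = dists[i]
    return chosen
```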
Ran- sub-questions (question marks), (2) no sub-question which dom pseudo-decompositions, we found it important to use contains all words in the multi-hop question, and (3) no a large question corpus to create pseudo-decompositions. sub-question longer than the multi-hop question. Without QA F1 increased from 79.2 to 80.1 when we trained decom- scaling, decomposition models achieve perfect round-trip position models on pseudo-decompositions comprised of BLEU by copying the multi-hop question as the decomposi- questions retrieved from Common Crawl (>10M questions) tion. We measure scaled BLEU across multi-hop questions rather than only SQUAD 2 (∼130K questions), using an in HOTPOTQA dev, and we stop training when the metric appropriately larger beam size (100 → 1000). does not increase for 3 consecutive epochs. It is possible to stop training the decomposition model A.3. Pseudo-Decomposition Retrieval Method based on downstream QA accuracy. However, training a Table7 shows QA results with pseudo-decompositions re- QA model on each decomposition model checkpoint (1) is trieved using sum-bag-of-word representations from Fast- computationally expensive and (2) ties decompositions to Text, TFIDF, BERTLARGE first layer hidden states. We also a specific, downstream QA model. In Figure5, we show Unsupervised Question Decomposition for Question Answering

Figure 5. How multi-hop QA accuracy varies over the course of decomposition model training, for one training run of USeq2Seq on FastText pseudo-decompositions. Our unsupervised stopping criterion selects the epoch 3 checkpoint, which performs roughly as well as the best checkpoint (epoch 5).

Figure 6. Performance of the downstream, multi-hop QA model, with and without decompositions, when varying the amount of training data. We also assess the impact of removing single-hop training data (SQUAD 2.0 and HOTPOTQA "easy" questions).

B.2. Training Hyperparameters

MLM Pre-training. We pre-train our encoder-decoder with distributed training across 8 DGX-1 machines, each with eight 32GB NVIDIA V100 GPUs interconnected by InfiniBand. We pre-train using the largest possible batch size (1536), and we choose the best learning rate (3 × 10⁻⁵) based on training loss after a small number of iterations. We chose a maximum sequence length of 128. We keep other hyperparameters identical to those from Lample & Conneau (2019) used in unsupervised translation.

USeq2Seq. We train each decomposition model with distributed training across eight 32GB NVIDIA V100 GPUs. We chose the largest possible batch size (256) and then the largest learning rate which resulted in stable training (3 × 10⁻⁵). Other hyperparameters are the same as in Lample & Conneau (2019).

Seq2Seq. We use a large batch size (1024) and chose the largest learning rate which resulted in stable training across many pseudo-decomposition training corpora (1 × 10⁻⁴). We keep other training settings and hyperparameters the same as for USeq2Seq.

C. Multi-hop QA Model

C.1. Varying the Number of Training Examples

To understand how decompositions impact performance given different amounts of training data, we vary the number of multi-hop training examples. We use the "medium" and "hard" level labels in HOTPOTQA to determine which examples are multi-hop. We consider training setups where the multi-hop QA model does or does not use data augmentation via training on HOTPOTQA "easy"/single-hop questions and SQUAD 2 questions. Fig. 6 shows the results. Decompositions improve QA, so long as the multi-hop QA model has enough training data, either via single-hop QA examples or enough multi-hop QA examples.

C.2. Training Hyperparameters

To train RoBERTa-Large, we fix the number of training epochs to 2, as training longer did not help. We sweep over batch size ∈ {64, 128}, learning rate ∈ {1 × 10⁻⁵, 1.5 × 10⁻⁵, 2 × 10⁻⁵, 3 × 10⁻⁵}, and weight decay ∈ {0, 0.1, 0.01, 0.001}, similar to the ranges used in the original paper (Liu et al., 2019). We chose the hyperparameters that did best for the baseline QA model (without decompositions) on our validation set: batch size 64, learning rate 1.5 × 10⁻⁵, and weight decay 0.01. Similarly, for the experiments with BERT, we fix the number of epochs to 2 and choose hyperparameters by sweeping over the recommended ranges from Devlin et al. (2019) for learning rate ({2 × 10⁻⁵, 3 × 10⁻⁵, 5 × 10⁻⁵}) and batch size ({16, 32}). For BERT-Base, we thus choose learning rate 2 × 10⁻⁵ and batch size 16, and for BERT-Large, we use the whole-word masking model with learning rate 2 × 10⁻⁵ and batch size 32. We train all QA models with mixed-precision floating point arithmetic (Micikevicius et al., 2018), distributing training across eight 32GB NVIDIA V100 GPUs.
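The sweep above amounts to a plain grid search over the stated ranges; the sketch below is illustrative only, and the train_and_evaluate callback is a hypothetical stand-in for one fine-tuning-plus-validation run, not code from this work.

```python
from itertools import product

BATCH_SIZES    = [64, 128]
LEARNING_RATES = [1e-5, 1.5e-5, 2e-5, 3e-5]
WEIGHT_DECAYS  = [0, 0.1, 0.01, 0.001]

def grid_search(train_and_evaluate):
    """Pick the config with the best validation F1 for the baseline QA model."""
    best_cfg, best_f1 = None, float("-inf")
    for bs, lr, wd in product(BATCH_SIZES, LEARNING_RATES, WEIGHT_DECAYS):
        f1 = train_and_evaluate(batch_size=bs, learning_rate=lr,
                                weight_decay=wd, epochs=2)  # epochs fixed at 2
        if f1 > best_f1:
            best_cfg = {"batch_size": bs, "learning_rate": lr, "weight_decay": wd}
            best_f1 = f1
    return best_cfg  # selected here: batch size 64, lr 1.5e-5, weight decay 0.01
```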

[Figures 7 and 9: bar charts of F1-score differences on the Original, Multi-hop, and Out-of-Domain dev sets; Figure 7 spans answer entity types (MISC, ORG, GPE, LOC, PERSON, NUMERIC, TEMPORAL, Not Entity, All Entities), and Figure 9 spans answer types (span, yes/no).]

Figure 7. Performance difference for various answer entity types when the QA model does vs. does not use decompositions. We see the largest, consistent gains for entity-centric answers.

Figure 9. Performance difference for yes/no and span answer types for comparison questions when the QA model does vs. does not use decompositions. Decompositions are roughly as helpful for yes/no questions as for span-based questions.

[Figures 8 and 10: bar charts of F1-score differences on the Original, Multi-hop, and Out-of-Domain dev sets; Figure 8 spans bridge vs. comparison questions, and Figure 10 spans multi-hop "wh"-words (who, what, which, where, when|how long, how much|how many).]

Figure 8. Performance difference for bridge and comparison questions when the QA model does vs. does not use decompositions. Here, we use the original bridge/comparison splits from HOTPOTQA, which does not have a one-hop category and categorizes intersection questions as bridge. For the original dev set, the improvement with decompositions is greater for comparison questions than bridge questions. The multi-hop set does not alter comparison questions from the original version, so these scores do not change much.

Figure 10. Performance difference for various multi-hop "wh"-words when the QA model does vs. does not use decompositions. Improvements by question word vary across dev sets.

C.3. Improvements across Detailed Question Types

To better understand where decompositions improve QA, we show the improvement across various fine-grained splits of the evaluation sets in Figures 7-11.

[Figure 11: bar chart of F1-score differences on the Original, Multi-hop, and Out-of-Domain dev sets, titled "Gold Answer is in subanswer sentence," with strata ans_in_suba, ans_in_suba1, ans_in_suba2, ans_not_in_suba, ans_not_in_suba1, ans_not_in_suba2.]

Figure 11. Performance difference when the QA model does vs. does not use decompositions, stratified by whether the gold final answer is in a sub-answer sentence. We find a larger improvement over the baseline when the gold answer is contained in a sub-answer sentence.
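For reference, the stratification in Figure 11 can be read as six overlapping containment checks on the gold answer. The sketch below is one plausible reading, since the exact matching rule is not spelled out here; case-insensitive substring containment is an assumption made for illustration.

```python
def figure11_strata(gold_answer, suba1_sentence, suba2_sentence):
    """Assign an example to Figure 11's strata via case-insensitive
    containment of the gold answer in each sub-answer sentence
    (an assumed matching rule, for illustration only)."""
    ans = gold_answer.lower()
    in1 = ans in suba1_sentence.lower()
    in2 = ans in suba2_sentence.lower()
    return {
        "ans_in_suba": in1 or in2,
        "ans_in_suba1": in1,
        "ans_in_suba2": in2,
        "ans_not_in_suba": not (in1 or in2),
        "ans_not_in_suba1": not in1,
        "ans_not_in_suba2": not in2,
    }
```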

Q: What is the name of the singer who's song was released as the lead single from the album "Confessions", and that had popular song stuck behind for eight consecutive weeks?

USeq2Seq + Random
Sub-Q1 what is the name of the singer who 's song was released as the lead single from the album " confessions?
Sub-A1 is a song by American singer Usher. He co-wrote the song with Sean Garrett, Patrick J. Que Smith, Robert McDowell, LRoc, Ludacris, Da'Ron, and Lil Jon.
Sub-Q2 why did popular song stuck behind for eight consecutive weeks?
Sub-A2 It reached number 1 in the United Kingdom and in Netherlands, and number 2 in the United States, stuck for eight consecutive weeks behind the Usher singles "Yeah" and "Burn", respectively.

USeq2Seq + FastText
Sub-Q1 what is the name of the singer?
Sub-A1 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).
Sub-Q2 what song was released as the lead single from the album " confessions , " and that had popular song stuck behind for eight consecutive weeks?
Sub-A2 "Yeah!" is a song by American singer Usher.

Seq2Seq + Random
Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).
Sub-Q2 what is the maximum number of students allowed per year?
Sub-A2 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).

Seq2Seq + FastText
Sub-Q1 what is the name of the lead singer of the group?
Sub-A1 Billy Ray Cyrus is an American country music singer, songwriter, actor and philanthropist, who helped make country music a worldwide phenomenon.
Sub-Q2 when was the last time you heard a song called " on and on " by a female vocalist?
Sub-A2 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007). It was written and produced by Keys, Kerry Brothers Jr., and George M. Harry, and released as the album's lead single on September 11, 2007.

No Learn + Random
Sub-Q1 what was sold to foreign ptts?
Sub-A1 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).
Sub-Q2 what tends to lead to less money?
Sub-A2 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).

No Learn + FastText
Sub-Q1 which single was released as the album 's lead single?
Sub-A1 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).
Sub-Q2 who did the song writing , what are some of the stories behind the songs?
Sub-A2 It was written and produced by Keys, Kerry Brothers Jr., and George M. Harry, and released as the album's lead single on September 11, 2007.

DecompRC
Sub-Q1 which singer?
Sub-A1 "No One" is a song by American singer and songwriter Alicia Keys from her third studio album, "As I Am" (2007).
Sub-Q2 what is the name of alicia keys 's song was released as the lead single from the album "confessions", and that had popular song stuck behind for eight consecutive weeks?
Sub-A2 It reached number 1 in the United Kingdom and in Netherlands, and number 2 in the United States, stuck for eight consecutive weeks behind the Usher singles "Yeah" and "Burn", respectively.

Variable USeq2Seq + FastText
Sub-Q1 what is the name of the singer?
Sub-A1 is a song by american singer usher. he co-wrote the song with sean garrett, patrick j. que smith, robert mcdowell, lroc, ludacris, da'ron, and lil jon.
Sub-Q2 who ' s the song was released as the lead single from the album?
Sub-A2 "no one" is a song by american singer and songwriter alicia keys from her third studio album, "as i am" (2007).
Sub-Q3 what popular song was stuck behind for eight consecutive weeks?
Sub-A3 it reached number 1 in the united kingdom and in netherlands, and number 2 in the united states, stuck for eight consecutive weeks behind the usher singles "yeah" and "burn", respectively.

Table 8. Various decomposition methods for the question "What is the name of the singer who's song was released as the lead single from the album "Confessions", and that had popular song stuck behind for eight consecutive weeks?" Here, the Variable USeq2Seq model has decomposed the question into three sub-questions rather than two.

Q: Are both Coldplay and Pierre Bouvier from the same country?

USeq2Seq + Random
Sub-Q1 why are both coldplay and pierre bouvier from the same country?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 what is the purpose of a speech?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.

USeq2Seq + FastText
Sub-Q1 where are coldplay and coldplay from?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 what country is pierre bouvier from?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.

Seq2Seq + Random
Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 what is the maximum number of students allowed to take part in the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of?
Sub-A2 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).

Seq2Seq + FastText
Sub-Q1 who are similar musical artists to pierre bouvier?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 in the same year , pierre bouvier , pierre bouvier , pierre bouvier and pierre bouvier?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.

No Learn + Random
Sub-Q1 in what year did fermat declare fermat 's little theorem?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 when did the united states withdraw from the bretton woods accord?
Sub-A2 It first aired on MTV on March 6, 2005. The last episode was broadcast on April 24, 2005.

No Learn + FastText
Sub-Q1 what do a handful of couples have in common with coldplay , coldplay , and pierre bouvier?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 where are the french alps?
Sub-A2 St Pierre is a former parish and hamlet in Monmouthshire, south east Wales, 3 mi south west of Chepstow and adjacent to the Severn estuary.

DecompRC
Sub-Q1 is coldplay from which country?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 is pierre bouvier from which country?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.

Variable USeq2Seq + FastText
Sub-Q1 who are similar musical artists to coldplay?
Sub-A1 pierre charles bouvier (born 9 may 1979) is a canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band simple plan.
Sub-Q2 where is pierre bouvier from?
Sub-A2 pierre charles bouvier (born 9 may 1979) is a canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band simple plan.

Table 9. Various decomposition methods for the question "Are both Coldplay and Pierre Bouvier from the same country?"

Q: Who is older, Annie Morton or Terry Richardson?

USeq2Seq + Random
Sub-Q1 who is older , annie morton?
Sub-A1 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
Sub-Q2 who is terry richardson?
Sub-A2 Terrence "Uncle Terry" Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.

USeq2Seq + FastText
Sub-Q1 who is annie morton?
Sub-A1 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
Sub-Q2 when was terry richardson born?
Sub-A2 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.

Seq2Seq + Random
Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.
Sub-Q2 what is the maximum number of students allowed to take part in the program?
Sub-A2 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.

Seq2Seq + FastText
Sub-Q1 who is terry morton?
Sub-A1 Terrence "Uncle Terry" Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
Sub-Q2 who is terry morton?
Sub-A2 Terrence "Uncle Terry" Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.

No Learn + Random
Sub-Q1 what did decnet phase i become?
Sub-A1 Snoecks is a Belgian magazine. The huge, 550-plus-page magazine appears once a year in October and focuses on the most interesting new international developments in the arts, photography and literature.
Sub-Q2 what group can amend the victorian constitution?
Sub-A2 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.

No Learn + FastText
Sub-Q1 who was terry richardson?
Sub-A1 Terrence "Uncle Terry" Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
Sub-Q2 who changes positions more , annie morton , cotton hill , or terry richardson?
Sub-A2 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.

DecompRC
Sub-Q1 annie morton is born when?
Sub-A1 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
Sub-Q2 terry richardson is born when?
Sub-A2 Terrence "Uncle Terry" Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.

Variable USeq2Seq + FastText
Sub-Q1 who is annie morton?
Sub-A1 annie morton (born october 8, 1970) is an american model born in pennsylvania.
Sub-Q2 who was terry morton?
Sub-A2 madonna is a biography by english author andrew morton, chronicling the life of american recording artist madonna.

Table 10. Various decomposition methods for the question "Who is older, Annie Morton or Terry Richardson?"

Q: In which year was the King who made the 1925 Birthday Honours born?

USeq2Seq + Random
Sub-Q1 in which year was the king made?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 who made the 1925 birthday honours?
Sub-A2 The 1925 Birthday Honours were appointments by King George V to various orders and honours to reward and highlight good works by citizens of the British Empire.

USeq2Seq + FastText
Sub-Q1 in which year was the king born?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 who made the 1925 birthday honours?
Sub-A2 The 1925 Birthday Honours were appointments by King George V to various orders and honours to reward and highlight good works by citizens of the British Empire.

Seq2Seq + Random
Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 what is the maximum number of students allowed to take part in the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.

Seq2Seq + FastText
Sub-Q1 who was born in 1925?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 in which year was the king born?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.

No Learn + Random
Sub-Q1 what did telecom australia start?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 what cells are not eliminated by the immune system?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.

No Learn + FastText
Sub-Q1 in the new year honours list , who was awarded the mbe for services to hockey?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 in 1925 when she was born?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.

DecompRC
Sub-Q1 which king who made the 1925 birthday honours?
Sub-A1 The 1925 Birthday Honours were appointments by King George V to various orders and honours to reward and highlight good works by citizens of the British Empire.
Sub-Q2 in which year was george v born?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.

Variable USeq2Seq + FastText
Sub-Q1 in which year was the king made?
Sub-A1 george v (george frederick ernest albert; 3 june 1865 – 20 january 1936) was king of the united kingdom and the british dominions, and emperor of india, from 6 may 1910 until his death in 1936.
Sub-Q2 who made the 1925 birthday honours?
Sub-A2 george v (george frederick ernest albert; 3 june 1865 – 20 january 1936) was king of the united kingdom and the british dominions, and emperor of india, from 6 may 1910 until his death in 1936.

Table 11. Various decomposition methods for the question "In which year was the King who made the 1925 Birthday Honours born?"

Q: Where are Teide National Park and Garajonay National Park located?

USeq2Seq + Random
Sub-Q1 where are teide national park?
Sub-A1 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 what is garajonay national park?
Sub-A2 Garajonay National Park (Spanish: "Parque nacional de Garajonay") is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.

USeq2Seq + FastText
Sub-Q1 where are teide national park?
Sub-A1 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 where is garajonay national park?
Sub-A2 Garajonay National Park (Spanish: "Parque nacional de Garajonay") is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.

Seq2Seq + Random
Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 what is the maximum number of students allowed to take part in the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course?
Sub-A2 It occupies 40 km² (15 sq mi) and it extends into each of the six municipalities on the island.

Seq2Seq + FastText
Sub-Q1 where is garajonay national park located?
Sub-A1 Garajonay National Park (Spanish: "Parque nacional de Garajonay") is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
Sub-Q2 the national park of galicia national park?
Sub-A2 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).

No Learn + Random
Sub-Q1 what was the australian public x.75 network operated by telstra?
Sub-A1 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 when were theories developed suggesting inequality may have some positive effect on economic development?
Sub-A2 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).

No Learn + FastText
Sub-Q1 where is garajonay national park?
Sub-A1 Garajonay National Park (Spanish: "Parque nacional de Garajonay") is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
Sub-Q2 how to visit the teide observatory in tenerife in what languages are guided tours of the teide observatory held?
Sub-A2 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).

DecompRC
Sub-Q1 where is teide national park located?
Sub-A1 Teide National Park (Spanish: "Parque nacional del Teide") is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 where is garajonay national park located?
Sub-A2 Garajonay National Park (Spanish: "Parque nacional de Garajonay") is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.

Variable USeq2Seq + FastText
Sub-Q1 where are teide national park?
Sub-A1 teide national park (spanish: "parque nacional del teide") is a national park located in tenerife (canary islands, spain).
Sub-Q2 where is garajonay national park?
Sub-A2 garajonay national park (spanish: "parque nacional de garajonay") is located in the center and north of the island of la gomera, one of the canary islands (spain).

Table 12. Various decomposition methods for the question "Where are Teide National Park and Garajonay National Park located?"

Q: Since 2 June 2017, The Leader of Fine Gael had been held by which Irish Fine Gael politician who has served as Taoiseach and Minister for Defence?

USeq2Seq + Random
Sub-Q1 how long since 2 june 2017 , the leader of fine gael had been in office?
Sub-A1 Enda Patrick Kenny (born 24 April 1951) is an Irish Fine Gael politician who served as Taoiseach from 2011 to 2017, Leader of Fine Gael from 2002 to 2017, Minister for Defence from May 2014 to July 2014 and 2016 to 2017, Leader of the Opposition from 2002 to 2011, Minister for Tourism and Trade from 1994 to 1997 and Minister of State for Youth Affairs from 1986 to 1987.
Sub-Q2 which irish fine gael politician has served as taoiseach and minister for defence?
Sub-A2 Leo Eric Varadkar (Irish: "Leo de Varad"; born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.

USeq2Seq + FastText
Sub-Q1 since 2 june 2017 , the leader of fine gael had been?
Sub-A1 Since 2 June 2017, the office had been held by Leo Varadkar following the resignation of Enda Kenny.
Sub-Q2 which irish fine gael politician has served as taoiseach and minister for defence?
Sub-A2 Leo Eric Varadkar (Irish: "Leo de Varad"; born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.

Seq2Seq + Random
Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Leo Eric Varadkar (Irish: "Leo de Varad"; born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
Sub-Q2 what is the maximum number of students allowed per year?
Sub-A2 The Leader of Fine Gael is the most senior politician within the Fine Gael political party in Ireland. Since 2 June 2017, the office had been held by Leo Varadkar following the resignation of Enda Kenny.

Seq2Seq + FastText
Sub-Q1 who has been appointed as the new deputy leader of fine gael since 2 june 2017?
Sub-A1 Simon Anthony Coveney (born 16 June 1972) is an Irish Fine Gael politician who has served as Minister for Foreign Affairs and Trade and Deputy Leader of Fine Gael since June 2017.
Sub-Q2 the fine gael fine gael , the fine gael of fine gael?
Sub-A2 Leo Eric Varadkar (Irish: "Leo de Varad"; born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.

No Learn + Random
Sub-Q1 what was considered to be a major milestone?
Sub-A1 The 2017 Fine Gael leadership election was triggered in May 2017, when Enda Kenny resigned as party leader.
Sub-Q2 what was the air force not interested in for their message system?
Sub-A2 The 2017 Fine Gael leadership election was triggered in May 2017, when Enda Kenny resigned as party leader.

No Learn + FastText
Sub-Q1 what if fine gael did support fine gael after the next election?
Sub-A1 With Fine Gael being the governing party at the time, this election effectively appointed a new Taoiseach for Ireland.
Sub-Q2 who has been appointed as defence minister of india?
Sub-A2 Leo Eric Varadkar (Irish: "Leo de Varad"; born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.

DecompRC
Sub-Q1 which leader of fine gael?
Sub-A1 Since 2 June 2017, the office had been held by Leo Varadkar following the resignation of Enda Kenny.
Sub-Q2 since 2 june 2017 enda patrick kenny had been held by which irish fine gael politician who has served as taoiseach and minister for defence?
Sub-A2 Leo Eric Varadkar (Irish: "Leo de Varad"; born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.

Variable USeq2Seq + FastText
Sub-Q1 since 2 june 2017 , the leader of fine gael had been held by?
Sub-A1 since 2 june 2017, the office had been held by leo varadkar following the resignation of enda kenny.
Sub-Q2 which irish fine gael politician has served as taoiseach and minister for defence?
Sub-A2 enda patrick kenny (born 24 april 1951) is an irish fine gael politician who served as taoiseach from 2011 to 2017, leader of fine gael from 2002 to 2017, minister for defence from may 2014 to july 2014 and 2016 to 2017, leader of the opposition from 2002 to 2011, minister for tourism and trade from 1994 to 1997 and minister of state for youth affairs from 1986 to 1987. he has been a teachta dála (td) since 1975, currently for the mayo constituency.

Table 13. Various decomposition methods for the question "Since 2 June 2017, The Leader of Fine Gael had been held by which Irish Fine Gael politician who has served as Taoiseach and Minister for Defence?"