AMBIGQA: Answering Ambiguous Open-domain Questions

Sewon Min,1,2 Julian Michael,1 Hannaneh Hajishirzi,1,3 Luke Zettlemoyer1,2
1University of Washington 2Facebook AI Research 3Allen Institute for Artificial Intelligence
{sewon,julianjm,hannaneh,[email protected]

Abstract

Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer. In this paper, we introduce AMBIGQA, a new open-domain question answering task which involves finding every plausible answer, and then rewriting the question for each one to resolve the ambiguity. To study this task, we construct AMBIGNQ, a dataset covering 14,042 questions from NQ-OPEN, an existing open-domain QA benchmark. We find that over half of the questions in NQ-OPEN are ambiguous, with diverse sources of ambiguity such as event and entity references. We also present strong baseline models for AMBIGQA which we show benefit from weakly supervised learning that incorporates NQ-OPEN, strongly suggesting our new task and data will support significant future research effort. Our data and baselines are available at https://nlp.cs.washington.edu/ambigqa.

Figure 1: An AMBIGNQ example where the prompt question (top) appears to have a single clear answer, but is actually ambiguous upon reading Wikipedia. AMBIGQA requires producing the full set of acceptable answers while differentiating them from each other using disambiguated rewrites of the question.

1 Introduction

In the open-domain setting, it can be difficult to formulate clear and unambiguous questions. As shown in Figure 1, ambiguity is a function of both the question and the evidence provided by a large