Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations
Peng Qi    Yuhao Zhang    Christopher D. Manning
Stanford University
Stanford, CA 94305
{pengqi, yuhaozhang, manning}@cs.stanford.edu

Abstract

We investigate the problem of generating informative questions in information-asymmetric conversations. Unlike previous work on question generation, which largely assumes knowledge of what the answer might be, we are interested in the scenario where the questioner is not given the context from which answers are drawn, but must reason pragmatically about how to acquire new information, given the shared conversation history. We identify two core challenges: (1) formally defining the informativeness of potential questions, and (2) exploring the prohibitively large space of potential questions to find the good candidates. To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric we propose, combined with a reward function designed to promote more specific questions. We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model, as evaluated by our metrics as well as by humans.

[Figure 1: Asking questions in a conversation to acquire information. In this communication setting, the question asker has access to the background (Spandau Ballet, an English new wave band formed in Islington, London, in 1979) and the topic ("1983–1989: International success and decline"), but no access to the private textual knowledge that contains the answer (e.g., "The follow-up album, Parade, was released in June 1984, and its singles were again big successes in the charts in Europe, Oceania and Canada. The album's opening song, 'Only When You Leave', became the band's last American hit."). In this example, the baseline non-pragmatic question generator (BL: "What was the name of the album?") generates an uninformative question, one that has already been answered, while our pragmatic system (Ours: "What was the most popular single from the album?") and humans (Ref: "What were the notable songs from the album Parade?") actively seek new information.]

1 Introduction

Conversations are a primary means to seek and communicate information between humans, where asking the right question is an important prerequisite for effective exchange of knowledge. Learning to ask questions in conversations can help computer systems not only acquire new knowledge, but also engage human interlocutors by making them feel heard (Huang et al., 2017).

Previous work on question generation often falls into three classes: generating questions according to a discrete schema or end goal (Bordes et al., 2017; Zhang et al., 2018b), transforming the answer statement into a question (Mitkov et al., 2003; Rus et al., 2010; Heilman and Smith, 2010), or generating questions with data-driven systems by conditioning on the context where the answer comes from (Du et al., 2017; Zhou et al., 2017). Despite their successful adaptation to conversations to predict the question that elicits the observed answer (Gao et al., 2019; Pan et al., 2019; Nakanishi et al., 2019), these approaches are not suitable for modeling communication of knowledge in open-domain conversations, because the crucial problem of what to communicate is assumed to be already addressed by conditioning on the schema of information need or on the context that contains the answer.
We instead study the problem of question generation in a more realistic setting, i.e., in open-domain information-seeking conversations where the question asker cannot access the answering context. This is an important step towards practical natural language processing (NLP) systems that can reason about the state of mind of agents they interact with purely through natural language interactions, so that they can generate more helpful responses. In this paper, we build a question generator that reasons pragmatically about what information the answerer can provide, and generates questions to gather new information in a conversation (see Figure 1 for an example).

We identify several key challenges in this task: (1) generating informative questions without access to potential answers; (2) evaluating generated questions beyond comparing them to the reference question, because multiple questions can reveal unseen information despite being very different from each other; and (3) navigating a large search space of potential questions to improve informativeness by reasoning about the other agent's knowledge, which is more complex than the limited reference games in previous work on computational pragmatics.

To address these issues, we first develop a baseline question generation model that generates questions in a conversation without conditioning on the unseen knowledge. We then propose automatic metrics to quantify how much new information questions reveal, as well as how specific they are to the conversation. Next, we use reinforcement learning to optimize our question generator on these metrics. In our experiments on the QuAC dataset, we show that the proposed method substantially improves the specificity and informativeness of the generated questions as evaluated by our automatic metrics. These results are corroborated by blinded human evaluation, in which questions generated by our system are also judged to be of higher overall quality than those from the baseline system.
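To make this training recipe concrete, the sketch below shows one standard way to optimize a question generator against such sequence-level rewards, using a self-critical policy gradient. It is only an illustration under stated assumptions, not the paper's implementation: the two reward functions are crude token-overlap proxies standing in for the proposed metrics, and the model's sample()/greedy() interface is hypothetical.

```python
# A minimal sketch of reward-driven fine-tuning for a conversational question
# generator. NOT the paper's implementation: the reward functions below are
# crude stand-ins for the proposed metrics, and the model interface is assumed.

def informativeness_proxy(question: str, history: list[str]) -> float:
    """Placeholder: fraction of question tokens not yet seen in the history."""
    seen = set(" ".join(history).lower().split())
    toks = question.lower().split()
    return sum(t not in seen for t in toks) / max(len(toks), 1)

def specificity_proxy(question: str, history: list[str]) -> float:
    """Placeholder: fraction of question tokens grounded in the history."""
    seen = set(" ".join(history).lower().split())
    toks = question.lower().split()
    return sum(t in seen for t in toks) / max(len(toks), 1)

def reward(question: str, history: list[str], alpha: float = 0.5) -> float:
    # Reward seeking new information while staying tied to the conversation.
    return (informativeness_proxy(question, history)
            + alpha * specificity_proxy(question, history))

def rl_step(model, history: list[str], optimizer) -> None:
    # Self-critical policy gradient: the greedy decode serves as a baseline,
    # so only sampled questions that beat it get a positive learning signal.
    sampled_q, log_prob = model.sample(history)  # stochastic decode; log_prob is
                                                 # a differentiable scalar tensor
    greedy_q = model.greedy(history)             # deterministic decode, no gradient
    advantage = reward(sampled_q, history) - reward(greedy_q, history)
    loss = -advantage * log_prob                 # REINFORCE-style surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper itself, these placeholder rewards are replaced by the proposed informativeness metric and a reward designed to promote more specific questions.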
To recap, our main contributions are:¹

• To the best of our knowledge, our work represents the first attempt at studying question generation to seek information in open-domain communication, which involves challenging NLP problems, e.g., the evaluation of open-ended language generation and pragmatic reasoning;

• To address these problems, we propose automatic metrics to quantify the informativeness and specificity of questions, which are essential for efficient iterative system development;

• We show that optimizing the proposed metrics via reinforcement learning leads to a system that behaves pragmatically and has improved communication efficiency, as also verified by human evaluation. This represents a practical method for pragmatic reasoning in an open-domain communication setting.

¹We release our code and models at https://github.com/qipeng/stay-hungry-stay-focused.

2 Related Work

Question Generation. Question generation has long been studied in the education and psychology communities as a means to assess and promote reading comprehension in humans (Davey and McBride, 1986). In natural language processing, question generation has been explored to improve systems for various tasks, e.g., the quality of question answering systems (Duan et al., 2017), as well as information retrieval in open-domain question answering (Nogueira et al., 2019).

Some of the first question generation systems were rule-based (Mitkov et al., 2003; Rus et al., 2010; Heilman and Smith, 2010), while large-scale question answering datasets, e.g., SQuAD (Rajpurkar et al., 2016, 2018), have recently kindled research interest in data-driven approaches. Du et al. (2017) and Zhou et al. (2017) apply sequence-to-sequence (seq2seq) models to generate SQuAD questions from Wikipedia sentences containing the answers. The release of large conversational question answering datasets such as QuAC (Choi et al., 2018) and CoQA (Reddy et al., 2019) enabled Gao et al. (2019), Pan et al. (2019), and Nakanishi et al. (2019) to extend previous neural seq2seq question generators by conditioning them on the conversation history and the context that contains the answer, while Scialom and Staiano (2019) remove answers to the reference question to generate curiosity-driven questions from the rest of the context.

Despite their success, most existing approaches to question generation are limited to either reading comprehension settings where potential answers are known a priori, or goal-oriented settings where the schema of knowledge is limited (Bordes et al., 2017; Zhang et al., 2018b). This prevents them from being applied to an open-domain communication setting, where the purpose of questions is to acquire information that is unknown ahead of time.

Evaluating System-generated Questions. Automatic evaluation of system-generated text has long been an important topic in NLP. Traditional n-gram overlap-based approaches (Papineni et al., 2002; Lin, 2004) are computationally efficient, but have been shown to correlate poorly with human judgement of quality (Novikova et al., 2017). More recently, Zhang et al. (2020) leverage large pretrained language models (BERT; Devlin et al., 2019) to relax the limitation of exact n-gram overlap. Hashimoto et al. (2019) combine human judgement with system-reported likelihood of generated text to make population-level estimates of quality and diversity. However, most existing metrics either evaluate generated text against very few references, or provide only a relative ranking of multiple systems at the population level rather than reliable feedback on each example. This renders them inapplicable to generating informative questions in our setting.

In our task, two agents, a student and a teacher, share a topic of discussion T (Background and Topic in the figure), as well as a common goal for the student to acquire some knowledge K on this topic that only the teacher has direct access to (Private Knowledge in the figure). We consider the scenario where the agents can only communicate with each other by engaging in a conversation, in which the conversation history H is shared between the agents. We further constrain the conversation to one where the student asks questions about the shared topic, and the teacher provides answers based on K.
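The information asymmetry described here can be summarized in a small data structure: both agents observe the topic T and the growing history H, but only the teacher can read the private knowledge K. The sketch below is purely illustrative; its class and method names are hypothetical and do not come from the paper's released code.

```python
# Illustrative encoding of the communication setting (hypothetical names):
# both agents share a topic T and conversation history H; only the teacher
# can read the private knowledge K that grounds the answers.
from dataclasses import dataclass, field

@dataclass
class InformationSeekingConversation:
    topic: str                                   # T: shared topic of discussion
    private_knowledge: str                       # K: accessible to the teacher only
    history: list = field(default_factory=list)  # H: shared (question, answer) turns

    def student_view(self) -> dict:
        # The student (question asker) reasons from T and H alone; it never sees K.
        return {"topic": self.topic, "history": self.history}

    def teacher_view(self) -> dict:
        # The teacher grounds its answers in the private knowledge K.
        return {"topic": self.topic, "history": self.history,
                "knowledge": self.private_knowledge}

    def exchange(self, question: str, answer: str) -> None:
        # One turn: the student asks about the shared topic, the teacher
        # answers based on K, and both observe the updated history.
        self.history.append((question, answer))
```

In this framing, a question generator must operate on student_view() alone, which is what turns asking informative questions into a pragmatic inference problem rather than a reading comprehension one.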