Information Seeking in the Spirit of Learning: A Dataset for Conversational Curiosity

Pedro Rodriguez∗ (Computer Science, University of Maryland), Paul Crook (Facebook), Seungwhan Moon (Facebook), Zhiguang Wang (Facebook)

∗ Work done while interning at Facebook.

arXiv:2005.00172v2 [cs.CL] 10 Nov 2020

Abstract

Open-ended human learning and information-seeking are increasingly mediated by digital assistants. However, such systems often ignore the user's pre-existing knowledge. Assuming a correlation between engagement and user responses such as "liking" messages or asking followup questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know. Through crowd-sourcing of this experiment, we collect and release 14K dialogs (181K utterances) where users and assistants converse about geographic topics like geopolitical entities and locations. This dataset is annotated with pre-existing user knowledge, message-level dialog acts, grounding to Wikipedia, and user reactions to messages. Responses using a user's prior knowledge increase engagement. We incorporate this knowledge into a multi-task model that reproduces human assistant policies and improves over a BERT content model by 13 mean reciprocal rank points.

U: <assistant wake-word>, tell me about Tahiti.
A: It's the largest island in French Polynesia, near the center of the Pacific.
U: What is its history with France?

Figure 1: An example of information-seeking dialog that the Curiosity dataset aims to support. Assistants should answer user questions and convey information that inspires meaningful followup questions.

1 Introduction

Conversational agents such as Alexa, Siri, and Google Assistant should help users discover, learn, and retain novel factual information. More generally, systems for conversational information-seeking should help users develop their information need, be mixed-initiative, incorporate user memory, and reason about the utility of retrieved information as a combined set (Radlinski and Craswell, 2017).

We focus on a curiosity-driven, information-seeking scenario where a user starts a conversation with an assistant by asking an open-ended question and then drills down into interest areas (Figure 1). In this setting, what policies should assistants pursue to maintain the user's interest in the topic?

Theories of human learning, such as Vygotsky's zone of proximal development, propose that learning novel information should be rooted in pre-existing knowledge and skills of the learner (Chaiklin, 2003). Considering this, a good policy may give general information about Tahiti; a better policy would select information related to the user's knowledge (e.g., familiarity with France). We hypothesize that engagement is correlated with policies that integrate a user's pre-existing knowledge, and test this through a large-scale, Wizard-of-Oz (WoZ) style collection (Kelley, 1984; Wen et al., 2017) that captures assistant policies, user reactions, and topically relevant entities that the user knows about. The Curiosity dataset has 14,048 English dialogs annotated with sentence-level knowledge grounding, the user's prior knowledge, dialog acts per message, and binary ratings per message.[1]

[1] Dataset and code at curiosity.pedro.ai.
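These annotation layers can be pictured as one nested record per dialog. The sketch below is only an illustration of that structure, with hypothetical field names that are not guaranteed to match the released data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative layout of one annotated Curiosity dialog; field names are
# hypothetical and may differ from the released data files.

@dataclass
class Message:
    sender: str                      # "user" or "assistant"
    text: str
    dialog_acts: List[str]           # e.g., ["request_aspect"] or ["inform_response"]
    liked: Optional[bool] = None     # binary rating given by the user to assistant messages
    grounded_fact_ids: List[int] = field(default_factory=list)  # sentence-level grounding to Wikipedia facts

@dataclass
class Dialog:
    topic: str                       # e.g., "Puerto Rico"
    aspects: List[str]               # e.g., ["History", "Demographics"]
    known_entities: List[str]        # sampled pre-existing knowledge of the user
    messages: List[Message] = field(default_factory=list)
```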
In our dialog task (Figure 2), one worker takes the role of a curious user learning about a geographic entity and the other of a digital assistant with access to Wikipedia facts (Section 2). At the start of each dialog, the user is assigned an entity as their topic (e.g., Puerto Rico) along with two aspects (e.g., history and demographics) to investigate. Beforehand, we show the user a list of entities related to the topic, and they mark which they know; these entities are a sample of their pre-existing knowledge. The user engages in open-ended discovery while the assistant simultaneously answers the user's questions and proactively introduces facts likely to prompt followup questions.

Section 3 uses dialog act annotations combined with explicit and implicit user feedback to compare assistants' content selection and presentation policies. For example, in interactions where the user asks a question and the assistant paraphrases a fact, how often does the user ask a followup question versus trail off in disinterest? Most datasets (Section 6) do not have enough annotations to answer these questions: doing so requires message-level dialog act annotations and feedback signals. We compare three assistant policies: using a fact with a rooted entity, a fact from the user's aspect, or a generic fact about the topic. The policies are compared through user "likes" of assistant messages and by the dialog act of their subsequent message (e.g., did they ask a specific followup or change topic).

In Section 4, we design models that predict the policies used by the assistant: what type of message to send and which fact to use (if any). All models are trained jointly with a multi-task objective function. We compare an end-to-end BERT (Devlin et al., 2018) model to our task-specific Hierarchical Recurrent Encoder model (Serban et al., 2015) and show that our model improves over the baseline. In summary, we make three main contributions: (1) we design an experiment to test the efficacy of personalizing conversational information systems through a user's prior knowledge, (2) we introduce the Curiosity dataset, the first dialog dataset combining sentence-level knowledge groundings, per-message ratings, and per-message dialog act annotations, allowing for robust and fine-grained structural learning of dialog policies for similar applications, and (3) we design a multi-task model that incorporates the user's prior knowledge and improves over a natural BERT baseline.

[Figure 2: overview of data collection. (1) Quiz completion: for the topic Puerto Rico with aspects History and Demographics, the user marks which related entities they already know (e.g., Spaniards, San Juan, United States). (2) Relevant facts from Wikipedia: the assistant sees general facts, aspect facts, and rooted facts. (3) Human-human role-playing dialog creation, e.g., User: "Could you tell me about Puerto Rico's history?" (request_aspect); Assistant: "It was a Spanish colony until 1898 when the U.S. acquired it as part of the Treaty of Paris." (inform_response, liked). (4) Dialog act and like label annotation.]

Figure 2: We sample pre-existing knowledge by asking users to indicate which topically related entities they already know. The assistant paraphrases facts related to either known entities (rooted facts), an aspect (aspect facts), or the topic generally (general facts). The user expresses engagement through a like button. Dialog acts are annotated in a separate crowd-sourcing task.

2 Building the Curiosity Dataset

This section describes the construction of the Curiosity dataset. Dialog topics consist of prominent world geographic entities. The worldwide spread of entities makes each novel to most users, the consistent topic type makes starting dialogs easier, and their rich histories, demographics, and economics add topical diversity. For example, most people are only vaguely familiar with the history of Puerto Rico, but most know about related concepts such as the United States or Hurricane Maria. Section 2.1 describes how we select geographic topics and aspects and derive a set of facts to ground against. We collected the dataset in two steps: (1) collecting dialogs with a custom interface (Section 2.2) and (2) after-the-fact dialog act annotation (Section 2.3). Sample dialogs from Curiosity are in Appendix C.

2.1 Geographic Topics, Aspects, and Facts

We select 361 geographic pages from Wikipedia that have separate geography and history pages (e.g., Puerto Rico, Geography of Puerto Rico, and History of Puerto Rico).[2] We use sentences from each page to build a set of 93,845 facts. We run an entity linker over the content (Gupta et al., 2017) and index each fact by its source page (topic), source section (aspect), and mentioned entities. Finally, we fit a TF-IDF text matcher (Rajaraman and Ullman, 2011) with Scikit-Learn (Pedregosa et al., 2011). While conversing, assistants are shown facts that are filtered by topic, aspect, or mentioned entities and ranked by textual similarity to the dialog.

[2] The existence of these pages implies that the topic has ample historical and geographical knowledge to draw from.
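To make the retrieval step concrete, here is a minimal sketch of ranking candidate facts with a Scikit-Learn TF-IDF matcher and cosine similarity against the dialog so far. The Fact fields, the rank_facts helper, and the toy data are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

@dataclass
class Fact:
    text: str            # one sentence from a Wikipedia page
    topic: str           # source page, e.g., "Puerto Rico"
    aspect: str          # source section, e.g., "History"
    entities: List[str]  # entities found by the entity linker

def rank_facts(facts: List[Fact], dialog_context: str, vectorizer: TfidfVectorizer) -> List[Fact]:
    """Order candidate facts by TF-IDF cosine similarity to the dialog so far."""
    fact_vectors = vectorizer.transform([f.text for f in facts])
    query_vector = vectorizer.transform([dialog_context])
    scores = cosine_similarity(query_vector, fact_vectors)[0]
    order = scores.argsort()[::-1]
    return [facts[i] for i in order]

# Usage: fit the vectorizer on the full fact collection, filter candidates by
# topic, aspect, or mentioned entities, then rank against the current dialog.
facts = [
    Fact("It was a Spanish colony until 1898.", "Puerto Rico", "History", ["Spain", "United States"]),
    Fact("San Juan is the capital and most populous municipality.", "Puerto Rico", "Geography", ["San Juan"]),
]
vectorizer = TfidfVectorizer().fit([f.text for f in facts])
history_facts = [f for f in facts if f.aspect == "History"]
ranked = rank_facts(history_facts, "Could you tell me about Puerto Rico's history?", vectorizer)
```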
2.2 User and Assistant Dialog Interfaces

To collect dialogs, we build user and assistant interfaces for annotators. The user's interface samples their prior knowledge of a topic, captures which assistant messages interest them, and manages the dialog context. The assistant's interface provides contextually relevant facts. Appendix A has screenshots and details of each interface component.

Sampling User's Prior Knowledge: When deployed, digital assistants can draw from prior interactions (Ram et al., 2018) to estimate what a user knows. However, since we do not have these prior interactions, we collect information about what users know. Instead of exhaustively asking about every entity related to the topic, we sample this knowledge. Before the dialog begins, we show the user fifteen related entities that range from com- [...]

The assistant's interface shows facts from three categories: facts that mention an entity the user knows (rooted facts), facts from the user's assigned aspects (aspect facts), and facts from anywhere on the page (general facts). By construction, rooted facts overlap with the exclusive categories of aspect and general facts. For each category, we find the nine highest TF-IDF scoring facts and then randomize their order. To avoid biasing the assistant, we do not inform them about the user's known entities or distinguish between types of facts.

2.3 Dialog Act Annotation

Inducing structure on conversations through dialog acts is helpful for analysis and for downstream models (Tanaka et al., 2019). We introduce structure into Curiosity, beyond knowledge groundings, by annotating dialog acts for each message. In a separate collection, we annotate all utterances with dialog acts using a custom interface (Appendix B).
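As a rough sketch of the fact presentation described in Section 2.2, the snippet below keeps the nine highest-ranked facts per category and shuffles them before showing them to the assistant, so that rooted, aspect, and general facts are indistinguishable. Shuffling across categories (rather than within each one) and the names FACTS_PER_CATEGORY and build_assistant_fact_panel are assumptions for illustration.

```python
import random
from typing import Dict, List

FACTS_PER_CATEGORY = 9  # nine highest TF-IDF scoring facts per category

def build_assistant_fact_panel(ranked_by_category: Dict[str, List[str]]) -> List[str]:
    """Keep the top facts from each category (rooted, aspect, general) and
    shuffle them together so the assistant cannot tell the categories apart."""
    panel: List[str] = []
    for category in ("rooted", "aspect", "general"):
        panel.extend(ranked_by_category.get(category, [])[:FACTS_PER_CATEGORY])
    random.shuffle(panel)
    return panel
```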