Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems

Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, and Kaheer Suleman
Microsoft Maluuba

Proceedings of the SIGDIAL 2017 Conference, pages 207–219, Saarbrücken, Germany, 15-17 August 2017. © 2017 Association for Computational Linguistics

Abstract

This paper proposes a new dataset, Frames, composed of 1369 human-human dialogues with an average of 15 turns per dialogue. This corpus contains goal-oriented dialogues between users who are given some constraints to book a trip and assistants who search a database to find appropriate trips. The users exhibit complex decision-making behaviour which involves comparing trips, exploring different options, and selecting among the trips that were discussed during the dialogue. To drive research on dialogue systems towards handling such behaviour, we have annotated and released the dataset and we propose in this paper a task called frame tracking. This task consists of keeping track of different semantic frames throughout each dialogue. We propose a rule-based baseline and analyse the frame tracking task through this baseline.

1 Introduction

Goal-oriented, information-retrieving dialogue systems have been designed traditionally to help users find items in a database given a set of constraints (Singh et al., 2002; Raux et al., 2003; El Asri et al., 2014; Laroche et al., 2011). For instance, the LET’S GO dialogue system finds a bus schedule given a bus number and a location (Raux et al., 2003).

Available resources for data-driven learning of such goal-oriented systems are often collected with an existing system (Henderson et al., 2014b; Bennett and Rudnicky, 2002) and have been proposed to study one component of dialogue. Examples are the first three Dialogue State Tracking Challenges (DSTC, Williams et al., 2016) during which a series of datasets and tasks of increasing complexity were released. These shared tasks were essential to advance the state of the art on state tracking. Other resources have made it possible to study and develop different approaches to spoken language understanding and entity extraction (Mesnil et al., 2013). As for dialogue management, simulators have been proposed (Schatzmann et al., 2006) but datasets are scarce.

In most datasets collected with an existing system, the dialogues consist of sequential slot-filling: the system requests constraints until it can query the database and return several results to the user. Then, the user can ask for more information about a given result or request other possibilities. As a consequence, the tasks and methods that were based on these datasets were defined according to this sequential slot-filling process.

We propose the Frames dataset to study more complex dialogue flows and decision-making behaviour. Our motivation comes from user studies in e-commerce which show that several information-seeking behaviours are exhibited by users who may come with a very well defined item in mind, but may also visit an e-commerce website with the intent to compare items and explore different possibilities (Moe and Fader, 2001; Saha et al., 2017). Supporting this kind of decision-making process in conversational systems implies adding memory. Memory is necessary to track different items or preferences set by the user during the dialogue. For instance, consider product comparisons. If a user wants to compare different items using a dialogue system, then this system should be able to separately recall properties pertaining to each item.

We collected 1369 human-human dialogues in a Wizard-of-Oz (WOz) setting: users were paired up with humans, whom we refer to as wizards, who assumed the role of the dialogue system. Wizards were given access to a database of vacation packages containing round-trip flights and a hotel. Users were tasked with finding packages based on a few constraints such as a destination and a budget. The dataset has been fully annotated by human experts and is publicly available.¹

Along with this dataset, we formalize a new task called frame tracking. Frame tracking is an extension of state tracking (Henderson, 2015; Williams et al., 2016). In state tracking, the information summarizing the full dialogue history is compressed into a single semantic frame which contains properties and values corresponding to the user’s preferences (e.g., destination city). In frame tracking, the dialogue agent must simultaneously track multiple semantic frames (e.g., different destination cities; frames are defined formally in Section 4.2) throughout the conversation.

¹ datasets.maluuba.com/Frames

2 Data Collection

We collected the Frames data over a period of 20 days with 12 participants, who worked either for one day, one week, or 20 days. The participants alternated between the user and wizard roles on a daily basis. Due to this rotation, we can assume that we deal with returning users who know how to use the system, and focus on the decision-making process, skipping the phase where the user learns about the system capabilities. The domain for all dialogues is travel: specifically, finding a vacation package that fulfils certain a priori requirements through a conversational search-and-compare process.

2.1 Wizard-Of-Oz Setting

Wizard-of-Oz (WOz) dialogues (Kelley, 1984; Rieser et al., 2005; Wen et al., 2016) have the considerable advantage of exhibiting realistic behaviours often beyond the capabilities of existing dialogue systems. Our setting is slightly different from the usual WOz setting because, in our case, users did not believe they were interacting with a dialogue system; they knew they were conversing with fellow humans. We chose not to give templated answers to wizards because, apart from studying decision-making, we also wanted to study information presentation and dialogue management. We work with text-based dialogues because this engenders a more controlled wizard behaviour and obviates the need to handle time-sensitive turn taking and speech recognition noise.

2.2 Task Templates and Instructions

User-wizard dialogues took place on Slack.² We deployed a Slack bot to pair up participants and record conversations. At the beginning of each dialogue, a user was paired with a wizard and given a new task. Tasks were built from templates like the following:

“Find a vacation between [START DATE] and [END DATE] for [NUM ADULTS] adults and [NUM CHILDREN] kids. You leave from [ORIGIN CITY]. You are travelling on a budget and you would like to spend at most $[BUDGET].”

Tasks were generated by drawing values (e.g., for BUDGET) from a database. We constructed our database of flight and hotel properties by hand to simulate what one would find on a standard travel booking site. Each template was assigned a probability of success, and then constraint values were drawn in order to comply with this probability. For example, if 20 tasks were generated at probability 0.5, about 10 tasks would be generated with successful database queries and the other 10 would be generated such that the database returned no results for the constraints. This success mechanism allowed us to emulate cases when a user would find nothing meeting her constraints. If a task was unsuccessful, the user either ended the dialogue or got an alternative task such as: “If nothing matches your constraints, try increasing your budget by $200.”

We wrote 38 templates. 14 were generic like the one presented above and the other 24 included a background story to encourage role-playing from users and to keep them engaged. These templates were meant to add variety to the dialogues. The generic templates were also important for the users to create their own character and personality. We found that the combination of the two types of templates prevented the task from becoming too repetitive. Notably, we distributed the role-playing templates throughout the data collection process to bring some novelty and surprise. We also asked the participants to write templates (13 of them) to keep them engaged in the task.

To control data collection, we gave a set of instructions to the participants. The user instructions encouraged a variety of behaviours. As for the wizards, they were asked only to talk about the database results and the task at hand. We also asked the wizards to perform untimely actions occasionally, for instance, to ask for information that the user had already provided. It is interesting from a dialogue management point of view to have examples of bad behaviour and of how it impacts user satisfaction. At the end of each dialogue, the user provided a wizard cooperativity rating on a scale of 1 to 5. The wizard, on the other hand, was shown the user’s task and was asked whether she thought the user had accomplished it.

² www.slack.com

2.3 Search Interface And Suggestions

Wizards received a link to a search interface every time a user was connected to them. The search interface was a simple GUI with all the searchable fields in the database (see Appendix A).

[...]

Figure 1c shows the distribution of user ratings. More than 70% of the dialogues have the maximum rating of 5. Figure 2 shows the occurrences of dialogue acts in the corpus. The dialogue acts are described in Table 9. We present the annotation scheme in the following section.

4 Annotation

We manually annotated the Frames dataset with dialogue acts, slot types and values, references to other frames, and the ID of the currently active frame for each utterance. We also computed frame descriptions based on the labels of earlier turns.

4.1 Dialogue Acts, Slot Types, Slot Values

Most of the dialogue acts used for annotation
