ViGGO: A Video Game Corpus for Data-To-Text Generation in Open-Domain Conversation
Juraj Juraska, Kevin K. Bowden and Marilyn Walker
Natural Language and Dialogue Systems Lab
University of California, Santa Cruz
fjjuraska,kkbowden,[email protected]

Proceedings of The 12th International Conference on Natural Language Generation, pages 164–172, Tokyo, Japan, 28 Oct - 1 Nov, 2019. © 2019 Association for Computational Linguistics

Abstract

The uptake of deep learning in natural language generation (NLG) led to the release of both small and relatively large parallel corpora for training neural models. The existing data-to-text datasets are, however, aimed at task-oriented dialogue systems, and often thus limited in diversity and versatility. They are typically crowdsourced, with much of the noise left in them. Moreover, current neural NLG models do not take full advantage of large training data, and due to their strong generalizing properties produce sentences that look template-like regardless. We therefore present a new corpus of 7K samples, which (1) is clean despite being crowdsourced, (2) has utterances of 9 generalizable and conversational dialogue act types, making it more suitable for open-domain dialogue systems, and (3) explores the domain of video games, which is new to dialogue systems despite having excellent potential for supporting rich conversations.

1 Introduction

The recent adoption of deep learning methods in natural language generation (NLG) for dialogue systems resulted in an explosion of neural data-to-text generation models, which depend on large training data. These are typically trained on one of the few parallel corpora publicly available, in particular the E2E (Novikova et al., 2017) and the WebNLG (Gardent et al., 2017) datasets. Crowdsourcing large NLG datasets tends to be a costly and time-consuming process, making it impractical outside of task-oriented dialogue systems. At the same time, current neural NLG models struggle to replicate the high language diversity of the training sentences present in these large datasets, and instead they learn to produce the same generic type of sentences as with considerably less training data (Deriu and Cieliebak, 2018; Juraska and Walker, 2018; Dušek et al., 2019).

Motivated by the rising interest in open-domain dialogue systems and conversational agents, we present ViGGO, a smaller but more comprehensive dataset in the video game domain, introducing several generalizable dialogue acts (DAs), making it more suitable for training versatile and more conversational NLG models.¹ The dataset provides almost 7K pairs of structured meaning representations (MRs) and crowdsourced reference utterances about more than 100 video games. Table 1 lists three examples.

¹ The ViGGO corpus is available for download at: https://nlds.soe.ucsc.edu/viggo

MR: give_opinion(NAME [SpellForce 3], RATING [poor], GENRES [real-time strategy, role-playing], PLAYER_PERSPECTIVE [bird view])
Utterance: I think that SpellForce 3 is one of the worst games I’ve ever played. Trying to combine the real-time strategy and role-playing genres just doesn’t work, and the bird’s eye view makes it near impossible to play.

MR: verify_attribute(NAME [Little Big Adventure], RATING [average], HAS_MULTIPLAYER [no], PLATFORMS [PlayStation])
Utterance: I recall that you were not that fond of Little Big Adventure. Does single-player gaming on the PlayStation quickly get boring for you?

Table 1: Examples of MRs and corresponding reference utterances in the ViGGO dataset. The DA of the MRs is indicated in italics, and the slots in small caps. The slot mentions in the utterances are bolded.

Video games are a vast entertainment topic that can naturally be discussed in a casual conversation, similar to movies and music, yet in the dialogue systems community it does not enjoy popularity anywhere close to that of the latter two topics (Fazel-Zarandi et al., 2017; Li et al., 2017; Moghe et al., 2018; Shah et al., 2018; Khatri et al., 2018). Restaurants have served as the go-to topic in data-to-text NLG for decades, as they offer a sufficiently large set of various attributes and corresponding values to talk about. While they certainly can be a topic of a casual conversation, the existing restaurant datasets (Stent et al., 2004; Gašić et al., 2008; Mairesse et al., 2010; Howcroft et al., 2013; Wen et al., 2015a; Nayak et al., 2017) are geared more toward a task-oriented dialogue where a system tries to narrow down a restaurant based on the user’s preferences and ultimately give a recommendation. Our new video game dataset is designed to be more conversational, and to thus enable neural models to produce utterances more suitable for an open-domain dialogue system.

Even the most recent addition to the publicly available restaurant datasets for data-to-text NLG, the E2E dataset (Novikova et al., 2017), suffers from the lack of a conversational aspect. It has become popular, thanks to its unprecedented size and multiple reference utterances per MR, for training end-to-end neural models, yet it only provides a single DA type. In contrast with the E2E dataset, ViGGO presents utterances of 9 different DAs.

Other domains have been represented by task-oriented datasets with multiple DA types, for example the Hotel, Laptop, and TV datasets (Wen et al., 2015b, 2016). Nevertheless, the DAs in these datasets vary greatly in complexity, and their distribution is thus heavily skewed, typically with two or three similar DAs comprising almost the entire dataset. In our video game dataset, we omitted simple DAs, in particular those that do not require any slots, such as greetings or short prompts, and focused on a set of substantial DAs only.

The main contribution of our work is thus a new parallel data-to-text NLG corpus that (1) is more conversational, rather than information seeking or question answering, and thus more suitable for an open-domain dialogue system, (2) represents a new, unexplored domain which, however, has excellent potential for application in conversational agents, and (3) has high-quality, manually cleaned human-produced utterances.

2 The ViGGO Dataset

ViGGO features more than 100 different video game titles, whose attributes were harvested using free API access to two of the largest online video game databases: IGDB² and GiantBomb³. Using these attributes, we generated a set of 2,300 structured MRs. The human reference utterances for the generated MRs were then crowdsourced using vetted workers on the Amazon Mechanical Turk (MTurk) platform (Buhrmester et al., 2011), resulting in 6,900 MR-utterance pairs altogether. With the goal of creating a clean, high-quality dataset, we strived to obtain reference utterances with correct mentions of all slots in the corresponding MR through post-processing.

² https://www.igdb.com/
³ https://www.giantbomb.com/

2.1 Meaning Representations

The MRs in the ViGGO dataset range from 1 to 8 slot-value pairs, and the slots come from a set of 14 different video game attributes. Table 2 details how these slots may be distributed across the 9 different DAs. The inform DA, represented by 3,000 samples, is the most prevalent one, as the average number of slots it contains is significantly higher than that of all the other DAs. Figure 1 visualizes the MR length distribution across the entire dataset.

| DA                  | Slot range | Mandatory slots |
| inform              | 3-8        | NAME, GENRES    |
| confirm             | 2-3        | NAME            |
| give_opinion        | 3-4        | NAME, RATING    |
| recommend           | 2-3        | NAME            |
| request             | 1-2        | SPECIFIER       |
| request_attribute   | 1          | -               |
| request_explanation | 2-3        | RATING          |
| suggest             | 2-3        | NAME            |
| verify_attribute    | 3-4        | NAME, RATING    |

Additional common slots: RELEASE_YEAR, DEVELOPER, ESRB, GENRES, PLAYER_PERSPECTIVE, HAS_MULTIPLAYER, PLATFORMS, AVAILABLE_ON_STEAM, HAS_LINUX_RELEASE, HAS_MAC_RELEASE.

Table 2: Overview of mandatory and common possible slots for each DA in the ViGGO dataset. There is an additional slot, EXP_RELEASE_DATE, only possible in the inform and confirm DAs. Moreover, RATING is also possible in the inform DA, though not mandatory.

Figure 1: Distribution of the number of slots across all types of MRs, as well as the inform slot separately, and non-inform slots only.

The slots can be classified into 5 general categories covering most types of information MRs typically convey in data-to-text generation scenarios: Boolean, Numeric, Scalar, Categorical, and List. The first 4 categories are common in other NLG datasets, such as E2E, Laptop, TV, and Hotel, while the List slots are unique to ViGGO. List slots have values which may comprise multiple items from a discrete list of possible items.

Figure 2: Distribution of the DAs across the train/validation/test split. For each partition the total count of DAs/MRs is indicated.

slots, there were no common MRs between the train set and either of the validation or test set. We maintained a similar MR length and slot distribution across the three partitions. The distribution of DA types, on the other hand, is skewed slightly toward fewer inform DA instances and a higher pro-

2.2 Utterances

With neural language generation in mind, we crowdsourced 3 reference utterances for each MR so as to provide the models with the information about how the same content can be realized in multiple different ways. At the same time, this allows for a more reliable automatic evaluation by comparing the generated utterances with a set of different references each, covering a broader spectrum of correct ways of expressing the content given by the MR. The raw data, however, contains a significant amount of noise, as is inevitable when crowdsourcing.
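To make the MR structure described in Section 2.1 concrete, the following is a minimal sketch of how an MR written as in Table 1 could be parsed and checked against the slot ranges and mandatory slots of Table 2. It is an illustration under stated assumptions, not the tooling used to build the corpus: the serialization `da(SLOT [value], ...)` with underscores in names, the helper names `parse_mr` and `check_mr`, and the choice of which slots are treated as List slots are all hypothetical.

```python
import re

# Slot-count range and mandatory slots per DA, transcribed from Table 2.
DA_SPEC = {
    "inform": ((3, 8), {"name", "genres"}),
    "confirm": ((2, 3), {"name"}),
    "give_opinion": ((3, 4), {"name", "rating"}),
    "recommend": ((2, 3), {"name"}),
    "request": ((1, 2), {"specifier"}),
    "request_attribute": ((1, 1), set()),
    "request_explanation": ((2, 3), {"rating"}),
    "suggest": ((2, 3), {"name"}),
    "verify_attribute": ((3, 4), {"name", "rating"}),
}

# ASSUMPTION: the text does not say which slots are List slots; this set is
# a guess for illustration (Table 1 shows GENRES holding two items).
LIST_SLOTS = {"genres", "player_perspective", "platforms"}

# ASSUMPTION: MRs are serialized as `da(SLOT [value], SLOT [value], ...)`.
MR_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$", re.DOTALL)
SLOT_RE = re.compile(r"(\w+)\s*\[([^\]]*)\]")


def parse_mr(mr):
    """Split an MR string into its DA name and a dict of slot values."""
    match = MR_RE.match(mr)
    if match is None:
        raise ValueError(f"not a well-formed MR: {mr!r}")
    da, body = match.groups()
    slots = {}
    for name, value in SLOT_RE.findall(body):
        name = name.lower()
        value = value.strip()
        if name in LIST_SLOTS:
            # List slots may hold several comma-separated items.
            slots[name] = [item.strip() for item in value.split(",")]
        else:
            slots[name] = value
    return da, slots


def check_mr(da, slots):
    """Return a list of violations of the Table 2 constraints (empty if none)."""
    if da not in DA_SPEC:
        return [f"unknown DA: {da}"]
    (low, high), mandatory = DA_SPEC[da]
    problems = []
    if not low <= len(slots) <= high:
        problems.append(f"{da}: {len(slots)} slots, expected {low}-{high}")
    missing = mandatory - set(slots)
    if missing:
        problems.append(f"{da}: missing mandatory slots {sorted(missing)}")
    return problems


if __name__ == "__main__":
    example = (
        "give_opinion(NAME [SpellForce 3], RATING [poor], "
        "GENRES [real-time strategy, role-playing], "
        "PLAYER_PERSPECTIVE [bird view])"
    )
    da, slots = parse_mr(example)
    print(da, slots, check_mr(da, slots))
```

Keeping List slots as Python lists and all other slots as plain strings mirrors the distinction drawn above between List slots and the other four categories; a fuller version would also type Boolean and Numeric values.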