ViGGO: A Video Game Corpus for Data-To-Text Generation in Open-Domain Conversation

Juraj Juraska, Kevin K. Bowden and Marilyn Walker
Natural Language and Dialogue Systems Lab
University of California, Santa Cruz
jjuraska, kkbowden, [email protected]

Table 1: Examples of MRs and corresponding reference utterances in the ViGGO dataset. The DA of the MRs is indicated in italics, and the slots in small caps.

give_opinion(NAME [SpellForce 3], RATING [poor], GENRES [real-time strategy, role-playing], PLAYER_PERSPECTIVE [bird view])
I think that SpellForce 3 is one of the worst games I’ve ever played. Trying to combine the real-time strategy and role-playing genres just doesn’t work, and the bird’s eye view makes it near impossible to play.

verify_attribute(NAME [Little Big Adventure], RATING [average], HAS_MULTIPLAYER [no], PLATFORMS [PlayStation])
I recall that you were not that fond of Little Big Adventure. Does single-player gaming on the PlayStation quickly get boring for you?

Abstract

The uptake of deep learning in natural language generation (NLG) led to the release of both small and relatively large parallel corpora for training neural models. The existing data-to-text datasets are, however, aimed at task-oriented dialogue systems, and often thus limited in diversity and versatility. They are typically crowdsourced, with much of the noise left in them. Moreover, current neural NLG models do not take full advantage of large training data, and due to their strong generalizing properties produce sentences that look template-like regardless. We therefore present a new corpus of 7K samples, which (1) is clean despite being crowdsourced, (2) has utterances