What We Talk About When We Talk About Games: Bottom-Up Game Studies Using Natural Language Processing

What We Talk About When We Talk About Games: Bottom-Up Game Studies Using Natural Language Processing James Owen Ryan1;2, Eric Kaltman1, Michael Mateas1, and Noah Wardrip-Fruin1 1 Expressive Intelligence Studio 2 Natural Language and Dialogue Systems Lab University of California, Santa Cruz {jor, ekaltman, nwf, michaelm}@soe.ucsc.edu ABSTRACT for analysis. What insights could we glean, from these tril- In this paper, we endorse and advance an emerging bottom- lions of words, about videogames as a medium? If we were to up approach to game studies that utilizes techniques from harness the sheer volume of all this language, what could we natural language processing. Our contribution is threefold: build? We believe that the application of natural language we present the first complete review of the growing body of processing (NLP) techniques to text about games represents work through which this methodology has been innovated; a hugely promising but relatively unexplored area of game- we present a latent semantic analysis model that constitutes studies research. This approach is bottom-up in the sense the first application of this fundamental bottom-up tech- that it yields findings that emerge (often unexpectedly) from nique to the domain of digital games; and finally, unlike extensive language use, whereas in the more conventional earlier projects that have only written about their models, top-down approach a scholar starts from a preconceived no- we introduce and evaluate a tool that serves as an interface tion that she then attempts to substantiate. Bottom-up to ours. This tool is GameNet, in which nearly 12,000 games methods are particularly useful in exploratory research, and are linked to the games to which they are most related. From for many topics they also represent a more grounded ap- an expert evaluation, we demonstrate that, beyond being an proach. In game studies, there are numerous subjects of interface to our model, GameNet may be used more gener- interest, including games themselves, that cannot be logi- ally as a research tool for game scholars. Specifically, we cally organized into neatly discrete categories. As a result, find that it is especially useful for the scholar who wishes top-down approaches to these subjects tend to organize their to explore a relatively unfamiliar area of games, but that research and discussions around concepts like game genre, it may also be used to discover unforeseen cases related to which is a problematic borrowing from game-industry mar- topics that have already been thoroughly researched. keting practice. These contrived notions restrict not only innovation of the form, but also our discussion of it. We might also consider how other disciplines have had Categories and Subject Descriptors great insights using similar bottom-up methods. In the dig- I.5.3 [Clustering]: Similarity Measures; K.8.0 [General]: ital humanities, there has been in the last decade an out- Games pouring of work applying NLP techniques to various text collections, to great results. As an example, the authors of General Terms [52] processed over 3,000 library-science dissertations, dating from 1930 onward, to generate an overview of the field that Algorithms, Design, Measurement, Human Factors shed new light on its origins, its history, and its outlook. We wonder what we could learn about our (albeit much Keywords younger) field from a study applying this very method to a game studies, literature review, natural language processing, collection of game-studies dissertations. In [35], 80,000 arti- research tools, machine learning, latent semantic analysis cles and advertisements from a colonial U.S. newspaper were processed in an effort that afforded better understanding of 1. INTRODUCTION early American society. This method applied to a corpus of early videogame magazines could be equally illuminating. There is a nearly boundless accumulation of text about The toolset for this methodology is well-developed and there digital games that exists online in structured collections ripe is certainly no shortage of text about games|the outstand- ing matter is doing the work. While this area of game-studies research is vastly unexplored, there are a handful of scholars that have begun to Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are venture into it. As part of our contribution here, we pro- not made or distributed for profit or commercial advantage and that vide in this paper the first complete survey of this growing copies bear this notice and the full citation on the first page. literature. But while this foundational work has been suc- Proceedings of the 10th International Conference on the Foundations of cessful in producing findings that top-down methods could Digital Games (FDG 2015), June 22-25, 2015, Pacific Grove, CA, USA. not have, there is a recurrent issue that has inhibited the ISBN 978-0-9913982-4-9. Copyright held by author(s). emerging methodology. None of the models that have been cluster the adjectives. Clustering is a procedure whereby developed so far can be engaged beyond the prose of their objects are grouped together such that ones in the same respective publications, and this is highly problematic. Due cluster are more similar to one another (with regard to their to the inherent complexity of such models, it is difficult to feature representations) than to objects in other clusters. adequately describe them, or even to give a sense of their For this task, the authors use k-means [27], one of the stan- general implications, with prose alone. It is our belief that dard clustering algorithms. Here, k is a hyperparameter|a bottom-up models produced by NLP techniques must be vi- parameter whose value is set by the user prior to runtime sualized or made interactive in some way. (as opposed to a parameter whose value is `learned' by the From this impetus, we present (and evaluate) GameNet, algorithm itself during runtime)|that specifies how many the first such interactive visualization of a game-studies NLP clusters the algorithm will partition the input set of objects model. As a tool in which nearly 12,000 games are linked to into. After some initial exploration, the authors set k to one another according to how related they each are, GameNet 30, and then use a subset of these 30 adjective clusters to could not have been built using top-down methods. Under- propose a typology of what they call \primary elements of pinning it is a model that was developed by processing a gameplay aesthetics" (12). Using this, they attest the ex- collection of Wikipedia articles about games, totaling some istence of a rich language for appraising aesthetic aspects 14.5M words, using an NLP technique called latent seman- of gameplay, but note that the specific vocabulary used by tic analysis (LSA). In addition to our literature review and players appears to be different from that employed by schol- GameNet, the third facet to our contribution here is the ars and designers. first application of LSA, one of the foundational bottom-up We note that this particular finding would not have been NLP techniques, to the domain of digital games. Above all, reached using a top-down approach. The primary elements we hope that this project will spur new and interesting re- that they present are rooted in specific language attested search using the bottom-up approach to game studies that across thousands of game reviews, not in preconceived no- we describe and practice herein. tions that the authors set out to test. In fact, they empha- In the following section, we provide a review of prior game- size their surprise that the concept of èmergence' did not studies work that has used NLP, as well as a detailed de- appear as a primary element. This highlights a fundamen- scription of latent semantic analysis (both of which assume tal appeal of bottom-up scholarship, which is that it often an audience that is new to NLP). In Section 3, we recount yields unexpected insights. the extraction and preparation of our collection of thousands [58] is a journal article in which Zagal, Tomuro, and Shep- of Wikipedia articles describing individual games, as well as itsen argue for the use of NLP in game-studies research us- our derivation of a latent semantic analysis model trained ing three example studies. Here, we outline only the first of on that collection. GameNet and its expert evaluation are these, as we present the others elsewhere. In this brief study, discussed in Section 4. Finally, we conclude in Section 5. the authors apply readability metrics to 1500 professionally written game reviews extracted from GameSpot. A read- 2. BACKGROUND ability metric is a formula used to determine, ostensibly, the level of education needed to understand a text. Typically, There is a growing body of work in which NLP techniques these formulae operate on the number and length of the syl- are employed in game-studies research, centered in large part lables, words, and sentences of a text. Using three common around the efforts of JoséZagal, Noriko Tomuro, and their metrics|SMOG [30], the Coleman-Liau index [15], and the (former) colleagues at DePaul University. More precisely, Gunning fog index [6]|the authors find that the reviews are this work is characterized by its application of techniques written at a secondary-education reading level. From these from statistical natural language processing, a subfield of results, they argue against criticism that game reviews are NLP in which bottom-up statistical methods are applied written poorly and for a young demographic. to large collections of natural-language text.

What We Talk About When We Talk About Games: Bottom-Up Game Studies Using Natural Language Processing

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support