Modeling Player Retention in Madden NFL 11
Ben G. Weber, UC Santa Cruz, Santa Cruz, CA
Michael John, Electronic Arts, Inc., Redwood City, CA
Michael Mateas, UC Santa Cruz, Santa Cruz, CA
Arnav Jhala, UC Santa Cruz, Santa Cruz, CA

Abstract

Video games are increasingly producing huge datasets available for analysis resulting from players engaging in interactive environments. These datasets enable investigation of individual player behavior at a massive scale, which can lead to reduced production costs and improved player retention. We present an approach for modeling player retention in Madden NFL 11, a commercial football game. Our approach encodes gameplay patterns of specific players as feature vectors and models player retention as a regression problem. By building an accurate model of player retention, we are able to identify which gameplay elements are most influential in maintaining active players. The outcome of our tool is recommendations which will be used to influence the design of future titles in the Madden NFL series.

Introduction

One of the major recent trends in video games is the use of telemetry to record data and statistics about players. Due to the widespread popularity of games, there are now huge repositories of player data available for analysis. Telemetry has been shown to be useful in game development as well as post release, enabling developers to track bugs, detect cheaters, and balance an in-game economy (Zoeller 2010). By mining game telemetry, developers have been able to reduce production costs, by identifying unused features, and improve player retention, by identifying features correlated with active players. For these reasons, collecting and analyzing data is quickly becoming a critical aspect in the process of producing large-scale video games.

We explore the application of mining game telemetry to aid Electronic Arts in the game design process. The purpose of our tool is to assist designers in answering questions about the gameplay patterns of players. Specifically, the goal of our tool is to address the following questions: which gameplay elements are most strongly correlated with player retention, and what is the optimal win ratio for maximizing player retention? The intended impact of answering these questions is to improve player retention, which will increase potential for year-to-year purchases and secondary revenue sources, such as in-game purchases.

Our domain of analysis is gameplay data collected from Xbox 360 players in Madden NFL 11, a popular football game [1]. The task of the tool is to identify correlations between gameplay features and the number of games played. Our tool performs this task by building regression models of player retention and by analyzing how modifying the inputs to the model impacts the expected number of games played. It utilizes several machine learning algorithms to identify which features have the most significant effect on retention. The output of the tool is evaluated by an analyst who provides the developer with recommendations for future game designs.

The tool we developed was part of a pilot study at Electronic Arts investigating the application of AI to analytics-driven design. Our tool is deployed as a server-side system that performs off-line analysis of players, and the end users are analysts at Electronic Arts. The impact of our work is design recommendations that will be incorporated in future iterations of Madden NFL.

[1] Madden NFL 11 was developed by EA Tiburon and published by EA Sports. Trademarks belong to their respective owners.

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Game Telemetry

Game telemetry is the transmission of data from a game executable for recording and analysis (Zoeller 2010). Externally, developers are using data collected from game telemetry to provide additional services to players. For example, the World of Warcraft Armory provides a web interface with statistics about players and guilds. Internally, developers are using telemetry datasets for tasks including bug identification, cheat detection, asset selection, and game balancing. Hullett et al. (2011) show how telemetry data can be used to identify unused game assets and modes in Project Gotham Racing 4, leading to reduced production costs for future iterations of the series. Zoeller (2010) presents the application of telemetry to the development process in order to improve game development workflow.

There are two ways in which telemetry is used to transmit data to a server. In a streamed data approach, data is sent to the server when an in-game trigger occurs. For example, Madden NFL 11 sends a message to the server when a player displays the on-screen help dialog. On the server side, streamed data is usually logged directly to text files, which serve as telemetry logs. In a session-based data approach, summaries are sent to a server when a gameplay session has completed. In Madden NFL 11, a summary of play-by-play results is sent to the server at the completion of a game. Game telemetry can be generated by a variety of sources: the quality assurance group during development, beta testers prior to release, and players once the game has been released.

One of the main challenges in using game telemetry is dealing with the scale of the data. The datasets collected from players can be massive, reaching terabytes in size. Game telemetry datasets are large because they contain logs of individual player behavior. To analyze data of this scale, distributed approaches such as Map-Reduce-Merge (Yang et al. 2007) have been used to distribute and analyze datasets on a large cluster. In our approach, we analyze a subset of telemetry using a representative sampling of players.

Related Work

Our tool is at the intersection of two research areas: computer-assisted game design and game telemetry. There is an extensive literature on game design, but the use of tools for supporting the design process has only recently received attention (Nelson and Mateas 2009). Nelson and Mateas claim that tools for supporting game design need to reason about game mechanics, which are fundamentally different from current domains for which there are design support tools. Smith et al. (2010) address this problem by representing game mechanics as a formal logic. Our work differs from this research direction, because we are focusing on mining information from a large volume of human-generated playtesting as opposed to automated analysis of games using machine playtesting.

One of the main uses of game telemetry is player modeling, which includes recognizing player goals, identifying distinct types of players, and modeling player engagement. Research on goal recognition in games varies widely, because different genres of games have unique player objectives. In the domain of real-time strategy games, a player may have the goal of executing a specific strategy. Weber and Mateas (2009) present an approach to goal recognition in this domain by modeling strategy recognition as a classification problem. They grouped strategies into distinct classes and trained several classifiers on thousands of replays from professional players. In a football game, the player has a goal of executing a specific play. Laviers et al. (2009) explore an approach to play recognition using support vector machines to identify plays based on football player positions and trajectories. Their system was applied to the task of calling play switches in response to identifying specific defensive plays.

[...] investigated the evolution of social groups in World of Warcraft and identified archetypical player types. They applied convex-hull matrix factorization to reduce high-dimensionality data to eight intuitively interpretable classes of players.

Yannakakis (2008) identifies approaches for modeling player satisfaction in games, and discusses three types of telemetry: play-game interaction data, physiological data, and qualitative data. Physiological data includes data such as heart rate and EMG, while qualitative data refers to less quantifiable metrics such as challenge and flow. Models of player satisfaction have been applied to optimizing player engagement by adapting the difficulty level of the game (Yannakakis and Hallam 2009). Our tool differs from this work, because currently only play-game interaction data can be collected from a shipped game.

Game telemetry has also been utilized in systems that learn by demonstration. Darmok is an online case-based planner that learns to play real-time strategy games by analyzing replays of expert demonstrations (Ontañón et al. 2010). The system builds a case library by extracting sets of actions from expert demonstrations, in the form of replays, into cases with annotated goals. During runtime, Darmok employs a delayed case adaptation mechanism to reuse the case library in new game situations.

Feature Selection

We represent the goal of identifying the most influential gameplay features as a feature selection problem. Regression can be used to perform feature selection by building a statistical model of the data, evaluating the characteristics of the model, and identifying the most pertinent features (Suard, Goutier, and Mercier 2010).

Consider a dataset containing telemetry recorded from n players. Each player's actions are encoded as a feature vector with dimensionality j of the form:

$$x = \langle x_1, x_2, x_3, \ldots, x_j \rangle$$

where the features are normalized and each vector has a corresponding label, y. Our algorithm starts by building a regression model, f, which maps a set of input features to a predicted label:

$$y = f(x)$$

Next, the regression model is used to find the unique effect, g, of each feature, k:

$$g(k, v) = f(x)$$

given the restriction:

$$x_j = \begin{cases} v & : j = k \\ x_j & : \text{otherwise} \end{cases}$$
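The unique-effect computation defined above can be illustrated with a minimal sketch. It is not the tool's implementation: ordinary least squares stands in for the regression model f, the data are synthetic, and the names `fit_linear_model` and `unique_effect` are hypothetical. Holding the remaining features at their mean value is also an assumption made here for illustration.

```python
import numpy as np

def fit_linear_model(X, y):
    """Fit least squares with an intercept and return a predictor f(x).

    Assumption: a linear model stands in for the regression model f.
    """
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def f(x):
        x = np.atleast_2d(x)
        return np.hstack([x, np.ones((x.shape[0], 1))]) @ coef

    return f

def unique_effect(f, baseline, k, v):
    """g(k, v): output of f when feature k is set to v.

    Assumption: all other features are held at the given baseline vector.
    """
    x = np.array(baseline, dtype=float)  # copy so the baseline is not modified
    x[k] = v
    return float(f(x)[0])

# Synthetic, normalized features for 200 hypothetical players.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
# Feature 0 matters most, feature 1 less, feature 2 not at all.
y = 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.01, 200)

f = fit_linear_model(X, y)
baseline = X.mean(axis=0)  # assumed baseline: per-feature mean
values = np.linspace(0.0, 1.0, 5)

# Sweep each feature over its range and record the predicted label.
effects = {k: [unique_effect(f, baseline, k, v) for v in values]
           for k in range(X.shape[1])}
# The spread of predictions serves as a rough influence score per feature.
spread = {k: max(e) - min(e) for k, e in effects.items()}
ranking = sorted(spread, key=spread.get, reverse=True)
```

Sweeping v over a feature's range and comparing the spread of g(k, v) gives a rough ranking of feature influence. With a nonlinear model the same sweep would also show where the predicted label peaks, which is the kind of question posed about the optimal win ratio.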