Winning Isn’t Everything: Training Agents to Playtest Modern Games

Igor Borovikov∗†, Yunqi Zhao∗†, Ahmad Beirami∗†

Jesse Harder†, John Kolen†, James Pestrak†, Jervis Pinto†, Reza Pourabolghasem†

Harold Chaput†, Mohsen Sardari†, Long Lin†, Navid Aghdaie†, Kazi Zaman†

Abstract

Recently, there have been several high-profile achievements of agents learning to play games against humans and beat them. We propose an approach that instead addresses how the player experiences the game, which we consider to be a more challenging problem. In this paper, we present an alternative context for developing AI that plays games. Specifically, we study the problem of creating intelligent game agents in service of the development processes of the game developers that design, build, and operate modern games. We highlight some of the ways in which we think intelligent agents can assist game developers to understand their games, and even to build them. Our main contribution is to propose a learning and planning framework that is uniquely tuned to the environment and needs of modern game engines, developers, and players. Our game agent framework takes a few steps towards addressing the unique challenges that game developers face. We discuss some early results from an initial implementation of our framework.

Keywords: artificial intelligence; artificial game agent; reinforcement learning; imitation learning; model-based learning; game design; game playtesting; non-player character (NPC).

∗ These authors contributed equally to this work. Contact: {iborovikov, yuzhao, abeirami}@ea.com.
† EA Digital Platform – Data & AI, Electronic Arts, 209 Redwood Shores Pkwy, Redwood City, CA 94065, USA.
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

The history of artificial intelligence (AI) can be mapped by its achievements playing and winning various games. From the early days of chess-playing machines to the most recent accomplishments of Deep Blue and AlphaGo, AI has advanced from competent, to competitive, to champion in even the most complex games. Games have been instrumental in advancing AI, most notably in recent times through reinforcement learning (RL).

IBM Deep Blue was the first AI agent to beat the chess world champion, Garry Kasparov (Deep Blue 1997). A decade later, Monte Carlo Tree Search (MCTS) (Coulom 2006; Kocsis and Szepesvári 2006) was a big leap in solving games. MCTS agents for playing Settlers of Catan were reported in (Szita, Chaslot, and Spronck 2009; Chaslot et al. 2008) and shown to beat previous heuristics. Other work compares multiple agent approaches on the two-player variant of Carcassonne and discusses variations of MCTS and Minimax search for playing the game (Heyden 2009). MCTS has also been applied to the game of 7 Wonders (Robilliard, Fonlupt, and Teytaud 2014) and Ticket to Ride (Huchler 2015).

Recently, DeepMind researchers demonstrated that deep neural networks (DNNs) combined with MCTS could lead to AI agents that play Go at a super-human level (Silver et al. 2016), and solely via self-play (Silver et al. 2017; Silver et al. 2018). Subsequently, OpenAI researchers showed that AI agents could learn to cooperate at a human level in Dota 2 (OpenAI Five 2018).

The impressive recent progress of RL on solving games is partly due to advancements in processing power and AI computing technology.1 Further, deep Q-networks (DQNs) have emerged as a general representation learning framework from the pixels in a frame buffer, combined with Q-function approximation, without the need for task-specific feature engineering (Mnih et al. 2015).

Footnote 1: The amount of AI compute has been doubling every 3-4 months in the past few years (AI & Compute 2018).

The design of a DQN and the setting of its hyperparameters is still a daunting task. In addition, it takes hundreds of thousands of state-action pairs for the agent to reach human-level performance. Applying the same techniques to modern games would require obtaining and processing even more state-action pairs, which is infeasible in most cases because speeding up the game engine may not be possible and the game state may be difficult to infer from the frame buffer. The costs associated with such an approach may be too high for many applications, not justifying the benefits.

On modern strategy games, DeepMind and Blizzard showed that existing techniques fall short even on learning the rules of StarCraft II (Vinyals et al. 2017). While breaking the state space and action space into a hierarchy of simpler learning problems has been shown to be promising (Vinyals et al. 2017; Pang et al. 2018), additional complications arise when building agents for a modern game that is still under development. Some of these challenges are:

1. The game state space is huge, with continuous attributes, and is only partially observable to the agent.

2. The set of available actions is huge, parametric, partly continuous, and potentially unknown to the agent, rendering a direct application of MCTS infeasible.

3. The game itself is dynamic in the design and development stage, and multiple parameters and attributes (particularly those related to graphics) may change between different builds.

4. The games are designed to last tens of millions of timesteps,2 leading to potentially long episodes, and the manner in which the player engages with the game environment impacts the gameplay strategy.3

5. Multiple players may interact in conflict or cooperation, leading to an exploding state space and non-convergence issues due to invalidation of the Markov assumption in a multi-agent learning environment.

6. Winning isn't everything, and the goal of the agent could be to exhibit human-like behavior/style to better engage human players, making it non-trivial to design a proper rewarding mechanism.

Footnote 2: A timestep is the unit time where the agent takes an action, its observation is updated, and a reward signal is sensed.

Footnote 3: In a game that is designed to last many months or years, the player is not expected to be taking actions at all timesteps. Rather, the player's best strategy depends on the level of their engagement with the game.
The idea of using AI techniques to augment game development and playtesting is not new. Algorithmic approaches have been proposed to address the issue of game balance in board games (De Mesentier Silva et al. 2017; Hom and Marks 2007) and card games (Krucher 2015; Mahlmann, Togelius, and Yannakakis 2012). More recently, (Holmgard et al. 2018) builds a variant of MCTS to create a player model for AI agent based playtesting. These techniques are relevant to creating rewarding mechanisms for mimicking player behavior.

Another line of research investigates approaches where AI can play the role of a co-designer, making suggestions during development (Yannakakis, Liapis, and Alexopoulos 2014). Tools for creating game maps (Liapis, Yannakakis, and Togelius 2013) and level design (Smith, Whitehead, and Mateas 2010; Shaker, Shaker, and Togelius 2013) have also been proposed. Other approaches have explored AI for designing new games: (Browne and Maire 2010) generates entirely new abstract games by means of evolutionary algorithms, and (Salge and Mahlmann 2010) relates game design to the concept of relevant information. (Smith, Nelson, and Mateas 2010) addresses the behavior emanating from the design by having an engine capable of recording play traces. (Treanor et al. 2015) proposes an ideation technique to embed design patterns in AI-based game design. (Zhu, Wang, and Zyda 2018) uses a measure of similarity between game events to transfer levels across games. See (Togelius et al. 2011; Summerville et al. 2018) for a survey of these techniques in game design.

In the next section, we make the case for using AI as more of a tool to help designers tune their game, rather than to build an agent with super-human performance or to create a game (or a part thereof).

Playtesting Game Agents

While achieving optimal super-human gameplay using modern RL techniques is impressive, our goal is to train agents that can help game designers ensure their game provides players with optimal experiences. As it is not obvious how to define a reward function that abstracts an optimal experience, this problem does not necessarily lend itself to a traditional RL formulation. For instance, we considered the early development of The Sims Mobile, whose gameplay is about "emulating life": players create avatars, called Sims, and conduct them through a variety of everyday activities. In this game, there is no single predetermined goal to achieve. Instead, players craft their own experiences, and the designer's objective is to evaluate different aspects of that experience.

To validate their design, game designers conduct playtesting sessions. Playtesting consists of having a group of players interact with the game during the development cycle, not only to gauge the engagement of players but also to discover elements and states that result in undesirable outcomes. As a game goes through the various stages of development, it is essential to continuously iterate and improve the relevant aspects of the gameplay and its balance. Relying exclusively on playtesting conducted by humans can be costly and inefficient. Artificial agents could perform much faster play sessions, allowing the exploration of much more of the game space in much shorter time. This becomes even more valuable as game worlds grow large enough to hold tens of thousands of simultaneously interacting players. Games at this scale render traditional human playtesting infeasible.

Recent advances in the field of RL, when applied to playing computer games (e.g., (OpenAI Five 2018; Mnih et al. 2015; Vinyals et al. 2017; Harmer et al. 2018)), assume that the purpose of a trained artificial agent ("agent" for short) is to achieve the best possible performance with respect to clearly defined rewards, while the game itself remains fixed for the foreseeable future. In contrast, during game development the objectives and the settings are quite different. The agents can play a variety of roles, with rewards that are not obvious to define formally; e.g., an objective of an agent exploring a game level is different from foraging, defeating all adversaries, or solving a puzzle. Also, the game environment changes frequently between game builds. In such settings, it is desirable to quickly train agents that help with automated testing, data generation for game balance evaluation, and wider coverage of the gameplay features. It is also desirable that the agent be mostly re-usable as the game build is updated with new appearance and gameplay features. The direct path of throwing computational resources and substantial engineering effort at training agents in such conditions is far from practical and calls for a different approach.

In this paper, we propose a framework that supports game designers with automated playtesting. We begin by describing the training pipeline needed to apply this framework universally to a variety of games. We then provide some of the solution techniques that we have found useful in solving the problem. Finally, we lay down our future roadmap.

Training Pipeline

The training pipeline consists of two key components:

• Gameplay environment refers to the simulated game world that executes the game logic with actions submitted by the agent every timestep and produces the next state.

• Agent environment refers to the medium where the agent interacts with the game world. The agent observes the game state and produces an action. This is where training occurs.

In practice, the game architecture can be complex, and it might be too costly for the game to directly communicate the complete state space information to the agent at every timestep. To train artificial agents, we create a universal interface between the gameplay environment and the learning environment.4 The interface extends OpenAI Gym (OpenAI Gym 2016) and supports actions that take arguments, which is necessary to encode action functions and is consistent with PySC2 (Vinyals et al. 2017; PySC2 2017). In addition, our training pipeline enables creating new players on the game server, logging in/out an existing player, and gathering data from expert demonstrations. We also adapt Dopamine (Bellemare et al. 2018) to this pipeline to make DQN (Mnih et al. 2015) and Rainbow (Hessel et al. 2017) agents available for training in the game. Additionally, we add support for more complex preprocessing beyond the usual frame buffer stacking, which we explicitly exclude following the motivation presented in the next section.

Footnote 4: These environments are usually physically separated, and hence we prefer a thin (i.e., headless) client that supports fast cloud execution and is not tied to frame rendering.
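To make the interface concrete, the sketch below shows a minimal Gym-style environment with parameterized actions in the spirit of the pipeline described above. It is only a schematic: the class name, the thin client object, and the space dimensions are illustrative assumptions rather than the actual implementation.

```python
# Minimal sketch of a Gym-style interface with parametric actions (illustrative).
# The `client` object stands in for a thin, headless connection to the game server.
import gym
from gym import spaces


class RemoteGameEnv(gym.Env):
    """Exposes a remote game build through the standard reset/step API."""

    def __init__(self, client):
        self.client = client
        # Compact engineered features instead of frame buffers (see next section).
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(150,))
        # Parametric actions: a discrete action class plus an optional continuous
        # argument, similar in spirit to PySC2 action functions.
        self.action_space = spaces.Tuple((
            spaces.Discrete(25),                        # action class
            spaces.Box(low=0.0, high=1.0, shape=(1,)),  # action argument
        ))

    def reset(self):
        return self.client.reset_episode()              # initial feature vector

    def step(self, action):
        action_class, argument = action
        # The game server executes (and validates) the action and returns the
        # next engineered observation, the reward, and episode-termination info.
        obs, reward, done, info = self.client.submit(action_class, argument)
        return obs, reward, done, info
```

In this arrangement, the heavy game logic stays in the (headless) gameplay environment, while the agent environment exchanges only compact observations, rewards, and parametric actions over the network.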
Solution Techniques

State Abstraction

The use of the frame buffer as an observation of the game state has proved advantageous in eliminating the need for manual feature engineering in Atari games (Mnih et al. 2015). However, to achieve the objectives of RL in a fast-paced game development process, the drawbacks of using the frame buffer outweigh its advantages. The main considerations that lead us to favor a lower-dimensional engineered representation of the game state are:

(a) During almost all stages of game development, the game parameters evolve on a daily basis. In particular, the art may change at any moment, and the look of already learned environments can change overnight. Hence, it is desirable to train agents using features that are more stable, to minimize the need for retraining agents.

(b) The gameplay environment and the learning environment usually reside on physically separate nodes. Naturally, closing the RL state-action-reward loop in such environments requires a lot of network communication. Using frame buffers as the representation of game state would significantly increase this communication cost, whereas derived game state features enable more compact encodings.

(c) Obtaining an artificial agent in a reasonable time (a few hours at most) usually requires that the game be clocked at a rate much higher than the usual gameplay speed. As rendering takes a significant portion of every frame's time, overclocking with rendering enabled is not practical. Additionally, moving large amounts of data from the GPU to main memory drastically slows down the game execution and can potentially introduce simulation artifacts by interfering with the target timestep rate.

(d) Last but not least, we can leverage our privileged access to the game code to let the game engine distill a compact state representation that could be inferred by a human player from the game, and pass it to the agent environment. By doing so we also have a better hope of learning in environments where the pixel frames only contain partial information about the state space.

The compact state representation could include the inventory, resources, buildings, the state of neighboring players, and the distance to a target. In an open-world first-person shooter game, the features may include the distance to the adversary, the angle at which the agent approaches the adversary, the presence of a line of sight to the adversary, and the direction to the nearest waypoint generated by the game navigation system. The feature selection may require some engineering effort, but it is logically straightforward after the initial familiarization with the gameplay mechanics, and it is often similar to that of CPU AI sensing, which will be informed by the game designer. We remind the reader that our goal is not to train agents that win but to simulate human-like behavior, so we train on information that would be accessible to a human player.
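As an illustration of such a compact representation, the short sketch below assembles a small feature vector of the kind described above for a shooter-like scenario. The specific fields and normalization constants are assumptions made for the example, not the features of any particular title.

```python
# Illustrative engineered state representation (field names are assumptions).
import numpy as np


def encode_state(game_state):
    """Distill a low-dimensional feature vector from privileged game-engine data."""
    return np.array([
        game_state["distance_to_adversary"] / 100.0,        # normalized distance
        np.cos(game_state["angle_to_adversary"]),            # approach angle, encoded
        np.sin(game_state["angle_to_adversary"]),            #   without a discontinuity
        1.0 if game_state["line_of_sight"] else 0.0,          # visibility flag
        game_state["waypoint_direction_x"],                   # from the navigation system
        game_state["waypoint_direction_y"],
        game_state["health"] / max(game_state["max_health"], 1.0),
    ], dtype=np.float32)
```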

Open-Loop Control

Before tackling the challenging problem of closed-loop RL, we consider a more simplified case that applies when the game is fully observable and the game dynamics are fully known. This simplified case allows for the extraction of a lightweight model of the game. While this requires some additional development effort, we can achieve a dramatic speedup in training agents by avoiding RL and using open-loop planning techniques instead. We then translate the learning back into the actual game.

As the first case study, we consider a mobile game with known dynamics, where each player can pursue different careers and, as a result, will have a different experience. In the use-case that we consider in this section, the designer's goal is to measure the impact of high-level decisions on the progression path of the player. We refer the interested reader to (Silva et al. 2018) for a more complete study of this problem.

The abstract model of the deterministic state transitions of the game provides full knowledge of the game state. The lightweight model also allows storage in memory (or on disk) and loading it back within the same game session, all with negligible overhead. Such simplicity of model manipulation enables search-based algorithms with the ability to look ahead and operate on game states using graph-based algorithms. In particular, we use the A* algorithm, as it enables computation of an optimal strategy by exploring the state transition graph instead of more expensive iterative processes, such as dynamic programming,5 and even more expensive Monte Carlo search based algorithms. The customizable heuristics and the target states corresponding to different gameplay objectives, offered by A*, provide sufficient control to conduct various experiments and explore multiple aspects of the game.
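A minimal sketch of this kind of open-loop planning is given below: A* search over an abstract, deterministic transition model with a designer-supplied heuristic and a node-expansion cutoff (the experiments below use a 2,000-node cutoff). The GameModel-style interface (actions, next_state, cost, is_goal, heuristic) is assumed for illustration and is not the actual model used for the game.

```python
# Sketch of A* over an abstract, deterministic game model (illustrative only).
# `model` is assumed to expose: actions(s), next_state(s, a), cost(s, a),
# is_goal(s), and heuristic(s); states must be hashable.
import heapq
import itertools


def a_star_plan(model, start, max_expansions=2000):
    tie = itertools.count()                  # tie-breaker so states are never compared
    frontier = [(model.heuristic(start), 0.0, next(tie), start, [])]
    best_g = {start: 0.0}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, g, _, state, plan = heapq.heappop(frontier)
        if model.is_goal(state):
            return plan                      # action sequence reaching the target state
        expansions += 1
        for action in model.actions(state):
            nxt = model.next_state(state, action)
            g_next = g + model.cost(state, action)
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                f_next = g_next + model.heuristic(nxt)
                heapq.heappush(frontier,
                               (f_next, g_next, next(tie), nxt, plan + [action]))
    return None                              # cutoff hit: plan may be suboptimal or missing
```

With an admissible heuristic, the returned plan is optimal; the expansion cutoff trades optimality for bounded planning time, which is consistent with the suboptimal medical-career result discussed below.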
We validate our approach against (approximately) solving a full optimization over the entire game strategy space using evolution strategies, where the agent optimizes a utility function that selects between available actions.6

We compare the number of actions that each approach needs to achieve the goal for each career in Figure 1. We emphasize that our goal is to show that a simple planning method, such as A*, can sufficiently satisfy the designer's goal in this case. We can see that the more expensive optimization-based evolution strategy reaches a style of gameplay that is similar to that of the much simpler A* search.

Figure 1: Comparison of the average number of career actions (appointments) needed to reach the goal using A* search and an evolution strategy, adapted from (Silva et al. 2018).

The largest discrepancy arises for the Barista career, which might be explained by the fact that this career has an action that does not reward experience by itself but rather enables another action that does. This action can be repeated often, which can explain the high numbers despite the career having half the number of levels. Also, we observe that in the case of the medical career, the 2,000-node A* cutoff has led to a suboptimal solution.

When running the two approaches, another point of comparison can be made: how many sample runs are required to obtain statistically significant results? We ran 2,000 runs for the evolution strategy, while it is notable that the A* agent learns a deterministic playstyle, which has no variance. On the other hand, the agent trained using an evolution strategy has a high variance and requires a sufficiently high number of simulation runs to approach a final reasonable strategy (Silva et al. 2018).

In this experiment, we were able to create a simulation model for the game mechanics, and we found that its benefits outweigh the time needed to run the actual simulations to answer different questions raised by the game designer. However, it is worth reiterating that the availability of a game model as a separate application is not universally expected, due to the huge state space, complex game dynamics, and a weakly structured heterogeneous action space of high dimensionality. The next section discusses solution techniques that address these problems using state-of-the-art approaches.

Footnote 5: Unfortunately, in dynamic programming every node participates in the computation, while it is often true that most of the nodes are not relevant to the shortest path problem, in the sense that they are unlikely candidates for inclusion in a shortest path (Bertsekas 2005).

Footnote 6: Coincidentally, OpenAI has recently advocated for evolution strategies as an alternative to reinforcement learning in training agents to play games (Salimans et al. 2017).

Model-Free Control

When the game dynamics are unknown, the go-to method has been RL (and particularly DQN). In this section, we show how such model-free control techniques fit into the paradigm of training agents to play modern games.

In our second case study, we consider a mobile game designed to engage many players for months, exhibiting all of the challenges discussed in the introduction. The designer's primary concern is to test how quickly an expert player can possibly progress in the game.

The state consists of ∼50 continuous and ∼100 discrete state variables. The set of possible actions α is a subset of a space A, which consists of ∼25 action classes, some of which take values from a continuous range and some from a discrete set of action choices. The agent has the ability to generate actions a ∈ A, but not all of them are valid at every game state, since α = α(s, t), i.e., α depends on the timestep and the game state. Moreover, the subset α(s, t) of valid actions is only partially known to the agent. We rely on the game server to validate whether a submitted action is available, because it is impractical to encode and pass the set α of available actions to the agent at every timestep. While the problems of a huge state space (Hoey and Poupart 2005; Spaan and Vlassis 2005; Porta et al. 2006), a continuous action space (Lillicrap et al. 2016), and a parametric action space (Hausknecht and Stone 2015) could be dealt with, these techniques are not directly applicable to our problem. Finally, the game is designed to last tens of millions of timesteps, taking the problem of training a functional agent in such an environment outside the domain of previously explored problems.

We study game progression while taking only valid actions. As we already mentioned, the set of valid actions α is not fully determined by the current observation, and hence we deal with a partially observable Markov decision process (POMDP). Given the practical constraints outlined above, it is infeasible to apply deep reinforcement learning to train agents on the game in its entirety. In this game, we show progress toward training an artificial agent that takes valid actions and progresses fast in the game, like expert human players. We connect this game to our training pipeline with DQN and Rainbow agents, where we use a network with two fully connected hidden layers and ReLU activations.

We create an episode by setting an early goal state in the game that takes an expert human player ∼5 minutes to reach. We let the agent submit actions to the game server every second. We reward the agent with '+1' when they reach the goal state, '-1' when they submit an invalid action, '0' when they take a valid action, and '-0.1' when they choose the "do nothing" action. The game is such that at times the agent has no other valid action to choose, and hence they should choose "do nothing", but such periods do not last more than a few seconds.
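The reward scheme above can be summarized in a few lines. The sketch below paraphrases the scheme described in this section; the function signature and the boolean flags are assumptions about how the game-server response is surfaced to the agent.

```python
# Reward scheme for the progression agent, as described in the text.
# The flags are assumed to come from the game server's validation of the action.
def progression_reward(reached_goal, action_was_invalid, chose_do_nothing):
    if reached_goal:
        return 1.0    # early goal state (~5 minutes for an expert human player)
    if action_was_invalid:
        return -1.0   # the game server rejected the submitted action
    if chose_do_nothing:
        return -0.1   # mild penalty; occasionally "do nothing" is the only valid choice
    return 0.0        # any other valid action
```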
We consider two different versions of the observation space, both extracted from the game engine. The first is what we call the "complete" state space. The complete state space contains information that is not straightforward to infer from the real observations in the game and is only used as a baseline for the agent. The polar opposite of this state space could be called the "naive" state space, which contains only straightforward information. The second state space we consider is what we call the "augmented" observation space, which contains information from the "naive" state space together with information the agent would reasonably infer and retain from current and previous game observations. Note that current RL techniques would have had difficulty inferring this information from the frame buffer pixels, which only constitute a partially observable state space.

Figure 2: Average cumulative reward (return) in training and evaluation for the agents as a function of the number of iterations. Each iteration is worth ∼60 minutes of gameplay. The trained agents are: (1) a DQN agent with the complete state space, (2) a Rainbow agent with the complete state space, (3) a DQN agent with the augmented observation space, and (4) a Rainbow agent with the augmented observation space.

We trained four types of agents, as shown in Figure 2. As expected, the Rainbow agent converges to a better performance level than the DQN agent, though it requires more training episodes. We believe that this is because we did not optimize the extra hyperparameters in Rainbow for the best performance, which we intend to do in the future. We also see that the augmented observation space makes the training slower and results in a worse final strategy. We intend to experiment with shaping the reward function to achieve different play styles. We also intend to investigate augmenting the replay buffer with expert demonstrations for faster training.

While we achieved a certain level of success using the outlined approach (streamlined access to the game state and direct communication of actions to the game, followed by training using a deep neural network), we observe that training within the current paradigm of RL remains costly. Specifically, even using the complete state space, it takes several hours to train a model that achieves a level of performance expected of an expert human player on this relatively short episode. This calls for the exploration of complementary approaches to augment the training process.

Model-Based Control & Imitation Learning

We have shown the value of simulated agents in a fully modeled game, and the potential of training agents in a complex game to model player progression. We can take these techniques a step further and make use of agent learning to help build the game itself. Instead of applying RL to capture player behaviors, we consider an approach to gameplay design where the player agents learn behavior policies from the game designers.

To bridge the gap between the agent and the designer, we introduce imitation learning (IL) to our system. In the present application, IL allows us to translate the intentions of the game designer into a primer and a target for our agent learning system. Learning from expert demonstrations has traditionally proved very helpful in training agents. In particular, the original AlphaGo (Silver et al. 2016) used expert demonstrations in training a deep Q-network. There are also other cases where the preferred solution for training agents would utilize a few relatively short demonstration episodes played by the software developers or designers at the end of the current development cycle.
As our last case study, we consider training artificial agents in an open-world game, where the game designer is interested in training non-player characters that follow certain behavioral styles. We need to efficiently train an agent using demonstrations capturing only a few key features. The training process has to be computationally inexpensive, and the agent has to imitate the behavior of the teacher(s) by mimicking their relevant style (in a statistical sense) as an implicit representation of the teacher's objectives.

Casting this problem directly into the RL framework is complicated by two issues. First, it is not straightforward to design a rewarding mechanism for imitating the style of the expert.7 Second, the RL training loop often requires thousands of episodes to learn useful policies, directly translating into a long training time.

Footnote 7: While inverse RL aims at solving this problem, its applicability is not obvious given the reduced representation of the huge state-action space that we deal with and the ill-posed nature of the inverse RL problem.

We propose a three-component solution to the stated problem:

• The first component is an ensemble of multi-resolution Markov models capturing the style of the teacher(s) with respect to key game features.

• The second one is a DNN trained as a supervised model on samples bootstrapped from an agent playing the game following the Markov ensemble.

• Lastly, we enable an interactive medium where the game designer can take the controller back at any time to provide more demonstration samples.

Multi-resolution Markov agent. We start by capturing the demonstration data, which consists of a few engineered features reported by the game at every timestep. The total dimensionality of an individual frame of data is ∼20 variables, with some of them reported only once every few timesteps. We also record actions, which are controller inputs from a human player. We intend to provide a more complete description of this setup by publishing a preprint of our internal report (Borovikov and Harder 2018). The key point is that the data from the demonstrations is low-dimensional and sparse, but sufficient to capture the main characteristics of the core gameplay loop of a first-person shooter in an open world.

To build our Markov agent, we use a direct approach to style reproduction inspired by the natural language processing literature (see the review (Zhai 2008)). The demonstrations are converted to symbolic sequences using a hierarchy of multi-resolution quantization schemes with different levels of detail for both the continuous data and the discrete channels. The most detailed quantization and higher-order Markov models are able to reproduce sequences of human actions in similar situations with high accuracy, thus capturing the gameplay style. The coarsest level corresponds to a Markov agent blindly sampling actions from the demonstrations. The multi-resolution ensemble of Markov models provides an initial way of generalizing the demonstration data. The ensemble is straightforward to build, and the inference is essentially a look-up process. We observe that even such a basic approach provided considerable mileage towards solving the stated problem.
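The sketch below illustrates the general idea under simplifying assumptions: each resolution level quantizes the observed features with its own bin size, keeps frequency counts of actions conditioned on a short context, and inference falls back from the finest level that has seen the current context to the coarsest one. The actual quantization schemes and back-off rules of our system are not reproduced here; this is only a schematic.

```python
# Schematic multi-resolution Markov ensemble (illustrative assumptions only).
import random
from collections import Counter, defaultdict


class MarkovLevel:
    """One resolution level: quantized context -> action frequency counts."""

    def __init__(self, bin_size, order):
        self.bin_size, self.order = bin_size, order
        self.counts = defaultdict(Counter)

    def _context(self, history):
        # Quantize the last `order` observations into a hashable context key.
        return tuple(tuple(round(x / self.bin_size) for x in obs)
                     for obs in history[-self.order:])

    def update(self, history, action):
        self.counts[self._context(history)][action] += 1

    def sample(self, history):
        dist = self.counts.get(self._context(history))
        if not dist:
            return None                      # context unseen at this resolution
        actions, weights = zip(*dist.items())
        return random.choices(actions, weights=weights)[0]


class MarkovEnsemble:
    """Falls back from the most detailed level to the coarsest one."""

    def __init__(self, levels):
        self.levels = levels                 # ordered from finest to coarsest

    def update(self, history, action):
        for level in self.levels:
            level.update(history, action)

    def act(self, history):
        for level in self.levels:
            action = level.sample(history)
            if action is not None:
                return action
        return None                          # no demonstration data recorded yet
```

A single pass over the demonstrations through update() builds the ensemble, and inference is a dictionary look-up, which matches the "look-up process" characterization above.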
Bootstrapped DNN agent. We treat the existing demonstrations as a training set for a supervised learning problem where we need to predict the next action from a sequence of observed state-action pairs. This approach has proved useful in pre-training self-driving cars (Montemerlo et al. 2006). Since our database of demonstrations is relatively small, it is desirable to generate more data by bootstrapping the dataset, for which we use our base Markov agent interacting with the game.

Such a bootstrap process is easy to parallelize, since we can have multiple simulations running without the need to cross-interact as in some learning algorithms like A3C (Mnih et al. 2016). The generated augmented dataset is used to train a DNN that predicts the next action from the already observed state-action pairs. Due to partial observability, the low dimensionality of the feature space results in fast training across a wide range of model architectures, allowing a quick experimentation loop. We converged on a simple model with a single "wide" hidden layer for the motion control channels and a DNN model for the discrete channels responsible for turning on/off actions like sprinting, firing, and climbing. The approach shows promise, with many yet unexplored opportunities to improve its efficiency.
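A compressed sketch of the bootstrap-then-imitate loop is given below, assuming Gym-style environment and Markov-agent interfaces like those in the earlier sketches. The dataset layout, network size, and training settings are placeholders, not the architecture reported above.

```python
# Sketch of the bootstrap-then-imitate loop (illustrative assumptions only):
# roll out the Markov ensemble in the game to augment the demonstration data,
# then fit a small supervised network that predicts the next (discrete) action.
# Observations are assumed to be 1-D feature vectors and the ensemble is
# assumed to always return an action (i.e., demonstrations already exist).
import numpy as np
from sklearn.neural_network import MLPClassifier


def bootstrap_dataset(env, markov_agent, episodes, history_len=4):
    states, actions = [], []
    for _ in range(episodes):
        obs, done = env.reset(), False
        history = [obs] * history_len        # pad the context with the first observation
        while not done:
            action = markov_agent.act(history)
            states.append(np.concatenate(history[-history_len:]))  # flat 1-D context
            actions.append(action)
            obs, _, done, _ = env.step(action)
            history.append(obs)
    return np.array(states), np.array(actions)


# Hypothetical usage, reusing the earlier sketches:
#   X, y = bootstrap_dataset(env, markov_agent, episodes=200)
#   policy = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200).fit(X, y)
#   next_action = policy.predict(X[:1])[0]
```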

Interactive learning. While bootstrapping can help train a better model, the quality of the final trained model is still limited by the amount of information contained in the relatively small set of demonstrations. Hence, it is highly desirable to obtain further information from the game designer, particularly in unexplored and under-explored parts of the state space where the trained model has little hope of generalizing.

We find that such a direct approach provides an opportunity to make the entire process of providing demonstrations interactive. The interactivity comes entirely from the compound hierarchical nature of the initial ensemble of models, which makes it easy to add new models to the set of already existing sub-models. In practical terms, it enables adding new demonstrations that directly override or augment already recorded ones. This allows a high level of interactivity for supporting newly added features or otherwise updating the existing model. The designer can directly interact with the game, select a particular moment where a new demonstration is required, adjust the initial location of the character object, and run a short demonstration without reloading the game. The interactivity eliminates most of the complexity of the agent design process and brings down the cost of gathering data from under-explored parts of the state space.

A designer can start with the most basic gameplay, e.g., approach the target, then add more elements, e.g., attack the target, followed by more sophisticated gameplay. To provide additional feedback to the designer, we provide a (near) real-time chart showing how well the current model generalizes to the current conditions in the game environment. An example of such an interactive training chart is reported in Figure 3.

A prolonged period of training may increase the size of the model, with many older demonstrations becoming irrelevant, no longer used for inference, but still contributing to the model size. Instead of using rule-based compression of the resulting model ensemble, we construct a DNN from the ensemble of Markov models via a novel bootstrap, using the game itself as the way to compress the model representation and strip off obsolete demonstration data. Using the proposed approach, we train an agent that satisfies the design needs in only a few hours of interactive training.

Table 1 illustrates the computational resources required to solve this problem with the outlined three-step process, compared to training 1v1 agents in Dota 2 (OpenAI Five 2018). While we acknowledge that the goal of our agent is not to play optimally against the opponent and win the game, we observe that using model-based training augmented with expert demonstrations to solve the Markov decision process, in a game even more complex than Dota 2, results in huge computational savings compared to an optimal reinforcement learning approach.

Table 1: Comparison between OpenAI 1v1 Dota 2 Bot (OpenAI Five 2018) training metrics and training an agent via bootstrap from human demonstrations in a proprietary open-world game. While the objectives of the training are different, the environments are comparable. The metrics below highlight the details of practical training of agents during the game development cycle.

                                        OpenAI 1v1 Bot              Bootstrapped Agent
Experience (per day)                    ∼300 years                  ∼5 min of human demonstrations
Bootstrap using game client             N/A                         ×5-20
CPU                                     60,000 CPU cores on Azure   1 local CPU
GPU                                     256 K80 GPUs on Azure       N/A
Size of observation                     ∼3.3 kB                     ∼0.5 kB
Observations per second of gameplay     10                          33
Concluding Remarks

In this paper we have described our approach to game-playing agents that considers the player experience over the agent's ability to win. We feel this is a more challenging but more beneficial route in understanding how players interact with games, and how to modify the games to change and improve player interaction. In each of our three case studies, we give a simple example of how this approach to game-playing agents can yield valuable insight, and even drive the construction of the game itself. We believe that this is just the beginning of a long and fruitful exploration into experience-oriented game-playing agents that will not only deliver radical improvements to the games we play, but will be another major milestone on the roadmap of AI.

Acknowledgement

The authors are thankful to the anonymous reviewer who brought to their attention the related line of work of (Holmgard et al. 2018).

References

AI & Compute. 2018. [Online, May 2018] https://blog.openai.com/ai-and-compute.

Bellemare, M. G.; Castro, P. S.; Gelada, C.; and Kumar, S. [Online, 2018] https://github.com/google/dopamine.

Bertsekas, D. P. 2005. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA.

Borovikov, I., and Harder, J. 2018. Learning models to imitate personal behavior style with applications in video gaming. Technical report, Electronic Arts Digital Platforms, Data & AI. In preparation.

Browne, C., and Maire, F. 2010. Evolutionary game design. IEEE Transactions on Computational Intelligence and AI in Games 2(1):1–16.

Chaslot, G.; Bakkes, S.; Szita, I.; and Spronck, P. 2008. Monte-Carlo tree search: A new framework for game AI. In AIIDE.

Coulom, R. 2006. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games, 72–83. Springer.

De Mesentier Silva, F.; Lee, S.; Togelius, J.; and Nealen, A. 2017. AI-based playtesting of contemporary board games. In Foundations of Digital Games 2017. ACM.

Deep Blue. 1997. [Online] http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/deepblue.

Harmer, J.; Gisslen, L.; del Val, J.; Holst, H.; Bergdahl, J.; Olsson, T.; Sjoo, K.; and Nordin, M. 2018. Imitation learning with concurrent actions in 3D games.

Hausknecht, M., and Stone, P. 2015. Deep reinforcement learning in parameterized action space. arXiv preprint arXiv:1511.04143.

Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; and Silver, D. 2017. Rainbow: Combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298.

Heyden, C. 2009. Implementing a Computer Player for Carcassonne. Ph.D. Dissertation, Maastricht University.

Hoey, J., and Poupart, P. 2005. Solving POMDPs with continuous or large discrete observation spaces. In IJCAI, 1332–1338.

Holmgard, C.; Green, M. C.; Liapis, A.; and Togelius, J. 2018. Automated playtesting with procedural personas with evolved heuristics. IEEE Transactions on Games.

Hom, V., and Marks, J. 2007. Automatic design of balanced board games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 25–30.

Huchler, C. 2015. An MCTS agent for Ticket to Ride. Master's thesis, Maastricht University.

Kocsis, L., and Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In European Conference on Machine Learning, 282–293. Springer.

Krucher, J. 2015. Algorithmically balancing a collectible card game. Bachelor's thesis, ETH Zurich.

Liapis, A.; Yannakakis, G. N.; and Togelius, J. 2013. Sentient Sketchbook: Computer-aided game level authoring. In FDG, 213–220.

Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2016. Continuous control with deep reinforcement learning.

Mahlmann, T.; Togelius, J.; and Yannakakis, G. N. 2012. Evolving card sets towards balancing Dominion. In Evolutionary Computation (CEC), 2012 IEEE Congress on, 1–8. IEEE.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529.

Mnih, V.; et al. 2016. Asynchronous methods for deep reinforcement learning. arXiv:1602.01783v2.

Montemerlo, M.; Thrun, S.; Dahlkamp, H.; and Stavens, D. 2006. Winning the DARPA Grand Challenge with an AI robot. In Proceedings of the AAAI National Conference on Artificial Intelligence, 17–20.

OpenAI Five. 2018. [Online, June 2018] https://openai.com/five.

OpenAI Gym. 2016. [Online] https://gym.openai.com.

Pang, Z.-J.; Liu, R.-Z.; Meng, Z.-Y.; Zhang, Y.; Yu, Y.; and Lu, T. 2018. On reinforcement learning for full-length game of StarCraft. arXiv preprint arXiv:1809.09095.

Porta, J. M.; Vlassis, N.; Spaan, M. T.; and Poupart, P. 2006. Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research 7(Nov):2329–2367.

PySC2. 2017. [Online] https://github.com/deepmind/pysc2.

Robilliard, D.; Fonlupt, C.; and Teytaud, F. 2014. Monte-Carlo tree search for the game of "7 Wonders". In Computer Games. Springer. 64–77.

Salge, C., and Mahlmann, T. 2010. Relevant information as a formalised approach to evaluate game mechanics. In Computational Intelligence and Games (CIG), 2010 IEEE Symposium on, 281–288. IEEE.

Salimans, T.; Ho, J.; Chen, X.; Sidor, S.; and Sutskever, I. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864.

Shaker, N.; Shaker, M.; and Togelius, J. 2013. Ropossum: An authoring tool for designing, optimizing and solving Cut the Rope levels. In AIIDE.

Silva, F. D. M.; Borovikov, I.; Kolen, J.; Aghdaie, N.; and Zaman, K. 2018. Exploring gameplay with AI agents. In AIIDE.

Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.

Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. 2017. Mastering the game of Go without human knowledge. Nature 550(7676):354.

Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144.

Smith, A. M.; Nelson, M. J.; and Mateas, M. 2010. Ludocore: A logical game engine for modeling videogames. In Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games, 91–98. IEEE.

Smith, G.; Whitehead, J.; and Mateas, M. 2010. Tanagra: A mixed-initiative level design tool. In Proceedings of the Fifth International Conference on the Foundations of Digital Games, 209–216. ACM.

Spaan, M. T., and Vlassis, N. 2005. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24:195–220.

Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. 2018. Procedural content generation via machine learning (PCGML). IEEE Transactions on Games 10(3):257–270.

Szita, I.; Chaslot, G.; and Spronck, P. 2009. Monte-Carlo tree search in Settlers of Catan. In Advances in Computer Games, 21–32. Springer.

Togelius, J.; Yannakakis, G. N.; Stanley, K. O.; and Browne, C. 2011. Search-based procedural content generation: A taxonomy and survey. IEEE Transactions on Computational Intelligence and AI in Games 3(3):172–186.

Treanor, M.; Zook, A.; Eladhari, M. P.; Togelius, J.; Smith, G.; Cook, M.; Thompson, T.; Magerko, B.; Levine, J.; and Smith, A. 2015. AI-based game design patterns.

Vinyals, O.; Ewalds, T.; Bartunov, S.; Georgiev, P.; Vezhnevets, A. S.; Yeo, M.; Makhzani, A.; Küttler, H.; Agapiou, J.; Schrittwieser, J.; et al. 2017. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782.

Yannakakis, G. N.; Liapis, A.; and Alexopoulos, C. 2014. Mixed-initiative co-creativity. In Proceedings of the 9th Conference on the Foundations of Digital Games.

Zhai, C. 2008. Statistical language models for information retrieval: a critical review. Foundations and Trends in Information Retrieval 2(3):137–213.

Zhu, T.; Wang, B.; and Zyda, M. 2018. Exploring the similarity between game events for game level analysis and generation. In Proceedings of the 13th International Conference on the Foundations of Digital Games, 8. ACM.