Defoggan: Predicting Hidden Information in the Starcraft Fog of War with Generative Adversarial Nets
Total Page:16
File Type:pdf, Size:1020Kb
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) DefogGAN: Predicting Hidden Information in the StarCraft Fog of War with Generative Adversarial Nets Yonghyun Jeong, Hyunjin Choi, Byoungjip Kim, Youngjune Gwon Samsung SDS Abstract Observed Ours Ground Truth We propose DefogGAN, a generative approach to the prob- lem of inferring state information hidden in the fog of war for real-time strategy (RTS) games. Given a partially observed state, DefogGAN generates defogged images of a game as 1 Replay predictive information. Such information can lead to create a strategic agent for the game. DefogGAN is a conditional GAN variant featuring pyramidal reconstruction loss to opti- mize on multiple feature resolution scales. We have validated DefogGAN empirically using a large dataset of professional StarCraft replays. Our results indicate that DefogGAN can predict the enemy buildings and combat units as accurately as 2 Replay professional players do and achieves a superior performance among state-of-the-art defoggers. Figure 1: Comparison of DefogGAN prediction to ground Introduction truth. Friendly and enemy units are represented as green and red in the map (black). The unobserved enemy units are pre- The success of AlphaGo (Silver et al. 2016) has brought dicted by DefogGAN. a significant attention for artificial intelligence in games (game AI). Agents trained by deep reinforcement learn- ing have demonstrated hands-down victories over expert notoriously difficult for StarCraft whose long withstanding human players in classic games such as Chess (Silver et popularity has compounded a broad range of adept game tac- al. 2018), Go (Silver et al. 2016), and Atari (Mnih et al. tics in addition to micro-control techniques (Ontanon´ et al. 2013). With more complex setting, real-time strategy (RTS) 2013) widespread in the E-sport scenes and Battle.net. games serve a means to evaluate state-of-the-art learning al- The fog of war refers to the lack of vision and information gorithms. Game AI today opens up new opportunities and on an area without a friendly unit around it, including all re- challenges for machine learning. The benefits of develop- gions that have been previously explored but left unattended ing game AI are widespread beyond gaming applications. currently. Partially Observable Markov Decision Process The exploration to adopt an intelligent agent in science (e.g., (POMDP) (Monahan 1982) best describes the fog of war predicting protein folding in organic chemistry (Evans et al. problem. In general, POMDP gives a practical formulation 2018)) and enterprise business service (e.g., chatbots (Satu, for most real-world problems characterized by having many Parvez, and Shamim-Al-Mamun 2015)) is making to enter a unobserved variables. For game AI, solving a partial obser- new era for game AI. vation problem is essential to improving its performance. In In this paper, we describe DefogGAN that takes a gen- fact, many existing approaches to design intelligent game AI erative approach to compensate imperfect information pre- often suffer from the partial observation problem (Xu et al. sented to a gamer due to the fog of war. We use StarCraft, an 2018). Recently, generative models are used to alleviate the RTS game featuring three well-balanced races for a gamer to uncertainty of partial observations. The agent’s performance choose and build substantially different playing styles and is enhanced from taking advantage of the (predictive) results strategies. StarCraft remains a popular E-sport after more obtained through a generative model (Synnaeve et al. 2018; than two decades of the original release. In a daunting aim Kahng et al. 2018). The generative approach, however, can- for our game AI to conquer highly-skilled human players, not fully match highly skillful scouting techniques of a top- we train our DefogGAN with more than 30,000 episodes of notch professional human player. expert and professional human replays. Such aim has been StarCraft provides a great platform to study complex Copyright c 2020, Association for the Advancement of Artificial POMDP problems related to game AI. We set up Defog- Intelligence (www.aaai.org). All rights reserved. GAN to accurately predict the state of an opponent hidden 4296 in the fog using the realistic information generated, thanks Mateas, and Jhala 2011; Xu et al. 2018). Scouting is the to generative adversarial nets (GANs) (Goodfellow et al. most straightforward defogging technique (Park et al. 2012; 2014). We find empirically that GANs generate more real- Si, Pisan, and Tan 2014). Interestingly, Justesen & Risi istic images than variational autoencoders (VAEs) (Kingma (2017) propose a deep learning-based approach to learn and Welling 2013). To generate a defogged game state, we the opponent status from units and upgrades information. have modified the original GAN generator into an encoder- Generative models give a new class of prediction tech- decoder network. niques in StarCraft AI. The convolutional encoder-decoder In principle, DefogGAN is a variant of conditional (CED) model (Synnaeve et al. 2018; Kahng et al. 2018) GAN (Mirza and Osindero 2014). Utilizing skip connec- can be used to recover information hidden in the fog. Syn- tions, the DefogGAN generator is trained on residual learn- naeve et al. (2018) find beneficial to use a convolutional en- ing from the encoder-decoder structure. In addition to the coder and a convolutional-LSTM encoder. Our approach of GAN adversarial loss, we set up a reconstruction loss be- using GAN to generate hidden information as a predictive tween fogged and defogged game states to emphasize the measure is new to the literature. regression of unit positions and quantities. This paper makes the following contributions. Generative Adversarial Nets (GAN) • We develop DefogGAN to resolve a fogged game state Goodfellow et al. (Goodfellow et al. 2014) introduce GAN to generate data from probabilistic sampling. GAN consti- into useful information for winning. DefogGAN makes G D one of the earliest GAN-based approaches to cope with tutes two neural nets, a generator and a discriminator , the StarCraft fog of war; trained in the competition described by a minimax game: • Using skip connections for residual learning, we have set E D x E − D G z min max x∼preal [log( ( ))] + z∼p(z)[log(1 ( ( )))] up DefogGAN to contain past information (sequence) in G D a feedforward manner without introducing any recurrent Radford, Metz, and Chintala (2015) have proposed DC- structure, making it suitable for real-time uses; GAN that uses a deep convolutional neural net as G. Vanilla • We empirically validate DefogGAN in ablation study and GAN is trained on the Jensen-Shannon divergence (JSD), other settings such as testing against extracted game inter- which can cause the vanishing gradient and mode collapse vals and the current state-of-the-art defog strategy. problems. WGAN (Arjovsky, Chintala, and Bottou 2017) Our dataset, source code, and pretrained networks are avail- proposes the use of the Wasserstein-1 metric to improve able online for public access.1 the vanilla GAN problems. Gulrajani et al. (2017) pro- pose WGAN-GP having a gradient penalty that has a sim- ilar effect as the weight clipping. Zhao et al. (2016) intro- Related Work duce Energy-based GAN (EBGAN) using an autoencoder. StarCraft AI Berthelot, Schumm, and Metz (2017) propose BEGAN that StarCraft is an immensely successful RTS game developed combines the WGAN and EBGAN ideas. We will experi- by Blizzard Entertainment. Since its original release in 1998, mentally compare the GAN variants for defogging perfor- StarCraft has attracted professional E-sport leagues and mil- mances. lions of amateur enthusiasts worldwide. Consisting of three Generative Approaches for Defogging fictional races, namely Terran, Protoss, and Zerg, StarCraft is considered as one of the most well-balanced online games The fog of war problem is similar to inpainting (Nazeri et ever created. The combinatorial complexity of player actions al. 2019) and denoising (Kingma and Welling 2013). How- is extremely high, although at a high level, winning condi- ever, there are three key differences. First, the enemy units tions for StarCraft can be built upon the military power and may be hidden even in the presence of the friendly units, so defogging must predict the location and the number of each an economy accumulated by the player. × StarCraft AI has a long history, reflecting a number of enemy unit type in a 2D grid space up to 4096 4096. Sec- different playing styles. Ontanon et al. (2013) point out ondly, defogging is a regression problem, which must infer that StarCraft playing essentially comprises two tasks. First, the number of units in the entire area based on a partial ob- micro-management refers to the ability to control units in- servation. Lastly, the problem is not just to generate an im- dividually. Good micro-management can keep a player’s age based on the masked (fogged) image. Defogging must worker and combat units alive for a long time. Secondly, indicate the grid where a unit of interest is likely to exist. macro-management is the ability to produce units and ex- pand the production facilities into regions other than the start DefogGAN location. This section presents DefogGAN, explaining its architecture Defogging can be crucial to both micro- and macro- and objective functions. We also describe our implementa- aspects of the game. Better estimation of hidden areas in tion details. the map will help win combats while