
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

DefogGAN: Predicting Hidden Information in the StarCraft Fog of War with Generative Adversarial Nets

Yonghyun Jeong, Hyunjin Choi, Byoungjip Kim, Youngjune Gwon
Samsung SDS

Abstract

We propose DefogGAN, a generative approach to the problem of inferring state information hidden in the fog of war for real-time strategy (RTS) games. Given a partially observed state, DefogGAN generates defogged images of a game as predictive information. Such information can be used to create a strategic agent for the game. DefogGAN is a conditional GAN variant featuring a pyramidal reconstruction loss to optimize on multiple feature resolution scales. We have validated DefogGAN empirically using a large dataset of professional StarCraft replays. Our results indicate that DefogGAN can predict enemy buildings and combat units as accurately as professional players do and achieves superior performance among state-of-the-art defoggers.

Figure 1: Comparison of DefogGAN prediction to ground truth over two replays (columns: Observed, Ours, Ground Truth). Friendly and enemy units are represented as green and red on the map (black). The unobserved enemy units are predicted by DefogGAN.

Introduction

The success of AlphaGo (Silver et al. 2016) has brought significant attention to artificial intelligence in games (game AI). Agents trained by deep reinforcement learning have demonstrated hands-down victories over expert human players in classic games such as Chess (Silver et al. 2018), Go (Silver et al. 2016), and Atari (Mnih et al. 2013). With a more complex setting, real-time strategy (RTS) games serve as a means to evaluate state-of-the-art learning algorithms. Game AI today opens up new opportunities and challenges for machine learning. The benefits of developing game AI are widespread beyond gaming applications. The exploration to adopt intelligent agents in science (e.g., predicting protein folding in organic chemistry (Evans et al. 2018)) and enterprise business services (e.g., chatbots (Satu, Parvez, and Shamim-Al-Mamun 2015)) is opening a new era for game AI.

In this paper, we describe DefogGAN, which takes a generative approach to compensate for the imperfect information presented to a gamer due to the fog of war. We use StarCraft, an RTS game featuring three well-balanced races with which a gamer can build substantially different playing styles and strategies. StarCraft remains a popular E-sport more than two decades after its original release. In a daunting aim for our game AI to conquer highly skilled human players, we train DefogGAN with more than 30,000 episodes of expert and professional human replays. Such an aim has been notoriously difficult for StarCraft, whose long-standing popularity has compounded a broad range of adept game tactics in addition to micro-control techniques (Ontanón et al. 2013) widespread in E-sport scenes and on Battle.net.

The fog of war refers to the lack of vision and information on an area without a friendly unit around it, including all regions that have been previously explored but are currently left unattended. A Partially Observable Markov Decision Process (POMDP) (Monahan 1982) best describes the fog of war problem. In general, POMDP gives a practical formulation for most real-world problems characterized by having many unobserved variables. For game AI, solving the partial observation problem is essential to improving performance. In fact, many existing approaches to designing intelligent game AI suffer from the partial observation problem (Xu et al. 2018). Recently, generative models have been used to alleviate the uncertainty of partial observations. An agent's performance is enhanced by taking advantage of the (predictive) results obtained through a generative model (Synnaeve et al. 2018; Kahng et al. 2018). The generative approach, however, cannot fully match the highly skillful scouting techniques of a top-notch professional human player.

StarCraft provides a great platform to study complex POMDP problems related to game AI. We set up DefogGAN to accurately predict the state of an opponent hidden

in the fog, using realistic information generated thanks to generative adversarial nets (GANs) (Goodfellow et al. 2014). We find empirically that GANs generate more realistic images than variational autoencoders (VAEs) (Kingma and Welling 2013). To generate a defogged game state, we have modified the original GAN generator into an encoder-decoder network.

In principle, DefogGAN is a variant of conditional GAN (Mirza and Osindero 2014). Utilizing skip connections, the DefogGAN generator is trained on residual learning over the encoder-decoder structure. In addition to the GAN adversarial loss, we set up a reconstruction loss between fogged and defogged game states to emphasize the regression of unit positions and quantities.

This paper makes the following contributions:
• We develop DefogGAN to resolve a fogged game state into useful information for winning. DefogGAN is one of the earliest GAN-based approaches to cope with the StarCraft fog of war;
• Using skip connections for residual learning, we have set up DefogGAN to contain past information (sequence) in a feedforward manner without introducing any recurrent structure, making it suitable for real-time uses;
• We empirically validate DefogGAN in an ablation study and other settings, such as testing against extracted game intervals and the current state-of-the-art defog strategy.
Our dataset, source code, and pretrained networks are available online for public access at https://github.com/TeamSAIDA/DefogGAN.

Related Work

StarCraft AI

StarCraft is an immensely successful RTS game developed by Blizzard Entertainment. Since its original release in 1998, StarCraft has attracted professional E-sport leagues and millions of amateur enthusiasts worldwide. Consisting of three fictional races, namely Terran, Protoss, and Zerg, StarCraft is considered one of the most well-balanced online games ever created. The combinatorial complexity of player actions is extremely high, although at a high level, winning conditions for StarCraft can be built upon the military power and economy accumulated by the player.

StarCraft AI has a long history, reflecting a number of different playing styles. Ontanón et al. (2013) point out that StarCraft playing essentially comprises two tasks. First, micro-management refers to the ability to control units individually. Good micro-management can keep a player's worker and combat units alive for a long time. Secondly, macro-management is the ability to produce units and expand the production facilities into regions other than the start location.

Defogging can be crucial to both micro- and macro-aspects of the game. Better estimation of hidden areas in the map will help win combats while giving the player a higher chance of making the right decision for the future. A poor observation in general can hurt macro-management (Weber, Mateas, and Jhala 2011; Xu et al. 2018). Scouting is the most straightforward defogging technique (Park et al. 2012; Si, Pisan, and Tan 2014). Interestingly, Justesen and Risi (2017) propose a deep learning-based approach to learn the opponent status from units and upgrades information.

Generative models give a new class of prediction techniques in StarCraft AI. The convolutional encoder-decoder (CED) model (Synnaeve et al. 2018; Kahng et al. 2018) can be used to recover information hidden in the fog. Synnaeve et al. (2018) find it beneficial to use a convolutional encoder and a convolutional-LSTM encoder. Our approach of using a GAN to generate hidden information as a predictive measure is new to the literature.

Generative Adversarial Nets (GAN)

Goodfellow et al. (2014) introduce GAN to generate data from probabilistic sampling. GAN constitutes two neural nets, a generator G and a discriminator D, trained in the competition described by a minimax game:

min_G max_D E_{x∼p_real}[log(D(x))] + E_{z∼p(z)}[log(1 − D(G(z)))]

Radford, Metz, and Chintala (2015) have proposed DCGAN, which uses a deep convolutional neural net as G. Vanilla GAN is trained on the Jensen-Shannon divergence (JSD), which can cause the vanishing gradient and mode collapse problems. WGAN (Arjovsky, Chintala, and Bottou 2017) proposes the use of the Wasserstein-1 metric to improve on the vanilla GAN problems. Gulrajani et al. (2017) propose WGAN-GP, whose gradient penalty has a similar effect as weight clipping. Zhao, Mathieu, and LeCun (2016) introduce Energy-based GAN (EBGAN) using an autoencoder. Berthelot, Schumm, and Metz (2017) propose BEGAN, which combines the WGAN and EBGAN ideas. We experimentally compare these GAN variants for defogging performance.
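For concreteness, the following minimal sketch shows one alternating update of the minimax game above (Python/PyTorch). This is an illustrative sketch, not the DefogGAN release: it assumes D outputs probabilities in (0, 1), and the names gan_step, x_real, and z are ours.

    import torch

    def gan_step(D, G, x_real, z, opt_d, opt_g):
        # Discriminator ascends log D(x) + log(1 - D(G(z))).
        d_loss = -(torch.log(D(x_real) + 1e-8).mean()
                   + torch.log(1 - D(G(z).detach()) + 1e-8).mean())
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator descends log(1 - D(G(z))), the minimax form above.
        g_loss = torch.log(1 - D(G(z)) + 1e-8).mean()
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()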

Generative Approaches for Defogging

The fog of war problem is similar to inpainting (Nazeri et al. 2019) and denoising (Kingma and Welling 2013). However, there are three key differences. First, the enemy units may be hidden even in the presence of friendly units, so defogging must predict the location and the number of each enemy unit type in a 2D grid space of up to 4096 × 4096. Secondly, defogging is a regression problem, which must infer the number of units in the entire area based on a partial observation. Lastly, the problem is not just to generate an image based on the masked (fogged) image: defogging must indicate the grid cell where a unit of interest is likely to exist.

DefogGAN

This section presents DefogGAN, explaining its architecture and objective functions. We also describe our implementation details.

Overview

DefogGAN generates a fully observed (defogged) state from a partially observed (fogged) state at a time t. For StarCraft, a fully observed state includes the exact locations of all friendly and enemy units. Figure 2 presents DefogGAN: feature maps of all friendly and enemy units are computed on the partially observed state at the current time, feature maps on the past observations are sum-pooled, and the accumulated past observations are concatenated to the current observation before entering the generator. The reconstruction loss between the predicted and the actual fully observed states and the adversarial loss are used to train the generator and the discriminator.

Figure 2: Architectural overview of DefogGAN. The accumulated partial observation (x̃_t) and the current partial observation (x̄_t) are concatenated and passed through the encoder-decoder generator with skip connections; the discriminator scores the predicted fully observed state (ŷ_t) as real or fake against the ground truth (y_t), providing the adversarial loss, while the reconstruction loss compares ŷ_t with y_t.

Notation

We denote by x̄_t a partially observed state at time t and by y_t the ground-truth fully observed state at time t. A state is represented as a three-dimensional array of width, height, and channels, consisting of the exact locations of all units. Each unit type makes up a channel of a game image, and the size of a raw game image in StarCraft is 4096 × 4096 pixels. With 66 unit types, a 1-vs-1 StarCraft game state is 4096 × 4096 × 66. Since the size of a raw state is too large, we reduce it to 32 × 32, as shown in Figure 2.

In StarCraft, friendly units are always visible, making half of the channels in the input fully observed. Ignoring the enemy static units (buildings), a partially observed state x̄_t is an array of size 32 (H) × 32 (W) × 50. Here, the 50 channels include 34 channels for friendly units and 16 channels for enemy combat units. Partial observations accumulated until time t, denoted x̃_t = Accumulated(x̄_0, x̄_1, ..., x̄_t), do include enemy buildings but exclude friendly units, which are already part of the current partial observation; this leaves 32 enemy channels for x̃_t.

Unlike vanilla GAN, which generates an image from a latent variable, DefogGAN must generate a defogged observation given the partial observations. Using the concatenated total input x̄_t ⊕ x̃_t, the generator produces a predicted fully observed state

ŷ_t = G(x̄_t ⊕ x̃_t).

Accumulating Partial Observations

A partially observed state x̄_t alone lacks temporal information about moving units and would be insufficient to learn how to generate a fully observed state while preserving the semantic information of a state (Synnaeve et al. 2018; Kahng et al. 2018). For using temporal information, we could use a recurrent neural net. A recurrent net, however, comes with disadvantages such as information dilution and gradient vanishing, and since StarCraft playing time in general has a relatively long duration (e.g., 14,400 frames for 10 minutes of play), too many frames would be needed to infer game states. Our DefogGAN approach has instead opted for stacking past partial observations onto the current observation (Mnih et al. 2013). By incorporating accumulated partial observations into the current input, DefogGAN facilitates such temporal information in a feedforward manner. Later in the paper, we show that using accumulated partial observation as an input increases precision and recall.
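As a concrete illustration, here is a minimal sketch of assembling the 82-channel generator input from the shapes above (Python/NumPy). The channel split and the use of an elementwise sum over frames are our assumptions from the description; the function name generator_input is hypothetical.

    import numpy as np

    H, W = 32, 32            # downsampled map resolution
    CUR_CH, ACC_CH = 50, 32  # x̄_t: 34 friendly + 16 enemy combat; x̃_t: enemy-only

    def generator_input(current_obs, past_enemy_obs):
        # current_obs: (H, W, 50) fogged state x̄_t at the current frame.
        # past_enemy_obs: list of (H, W, 32) enemy-only maps from earlier frames.
        assert current_obs.shape == (H, W, CUR_CH)
        if past_enemy_obs:
            acc = np.sum(np.stack(past_enemy_obs), axis=0)  # sum-pool over time
        else:
            acc = np.zeros((H, W, ACC_CH))
        # Concatenated total input x̄_t ⊕ x̃_t: (H, W, 82).
        return np.concatenate([current_obs, acc], axis=-1)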

ڰ ڰ ௧ − ௬ ෞ ௬ ) ೟ . ೟ t ) ) D × and D ( Discriminator 66 G ( ݕ ෝ ௧ G y ˆ ) (¯ y .Tedownsam- The ). t x (¯ t x n h ground the and t ic h size the Since . t ⊕ ⊕ Adversarial loss x ˜ x ˜ t )))] t H )))] Real/Fake , W (1) (2) ). Sum pooling Observation Preserving Connection (2×2 filter, stride 2) ࣦ ݔ , ݕ ,32 1 ௧ ௧ The DefogGAN encoder and decoder are connected in a 66 , ,16 -ࣦ ݔ௧ ݕ௧ symmetrical structure. We add residuals between the en 1 ࣦ ݔ , ݕ ,8 2 66 2 ௧ ௧ coder and decoder at each layer to maintain the parts , ,4 ࣦ ݔ௧ ݕ௧ that have been seen already. By doing so, the generator 4 66 4 , ,2 learns the parts that are hidden in the fog (He et al. 2016; ࣦ ݔ௧ ݕ௧ 8 66 8 Isola et al. 2017). Through the encoder network, the com- 16 , ,1 -ࣦ ݔ௧ ݕ௧ pressed feature is well-communicated to the decoder for effi 66 16 cient learning. In particular, the observation preserving con- 32 32 66 nections that tie the beginning and the end convey the infor- 32 ( ) mation that has been observed already. This allows Defog- Defog (ࡳ ࢚࢞ )Real (࢚࢟) GAN to focus on the information of the units that needs to be inferred. That is, the generator F (xt) learns by observ- Figure 3: Pyramidal reconstruction loss. The pyramid on the ing informational connection with G(xt) less the observed left shows the generation of yˆt and the reduction of resolu- informational connection xt: tion through sum pooing. The pyramid on the right repre- F (xt)=G(xt) − xt. sents sum pooing of the ground truth yt. The mean squared error (MSE) can be measured between the same resolutions. Total Objective Note that pyramidal L(xt,yt,s) measures the MSE with a stride s. The total objective of DefogGAN is

Total Objective

The total objective of DefogGAN is

L_D = −E_{y_t∼Y}[log(D(y_t))] − E_{x̄_t∼X_par, x̃_t∼X_acc}[log(1 − D(G(x̄_t ⊕ x̃_t)))],
L_G = λ_adv L_adv + λ_rec L_rec.

Note the hyperparameters λ_adv and λ_rec. In this paper, we use λ_rec = 0.999 and λ_adv = 0.001.

Multiple predictions at different scales are generated by sum pooling. By adjusting filter and stride sizes, sum pooling can generate multiple predictions in a pyramidal shape. More specifically, for a given feature map m_t, the sum pooling function sumpool(m_t, s) creates m′_t with a stride s and a filter size s. This can be formulated as follows:

sumpool(m, s) = m′_t(i, j) = Σ_{h=s·j}^{s·(j+1)−1} Σ_{w=s·i}^{s·(i+1)−1} m_t(w, h),

where (w, h) are the coordinates of m_t and (i, j) are the coordinates of m′_t.

At each generation, L_dist is evaluated as follows:

L_dist(x_t, y_t, s) = E[ ‖sumpool(y_t, s) − sumpool(G(x_t), s)‖²₂ ].

For the resolution r (we use r = 32) to be reduced level by level, the pyramidal reconstruction loss is evaluated as

L_rec = Σ_{i=0}^{log₂ r} w_i · L_dist(x_t, y_t, 2^i).   (3)

A scaling factor w_i adjusts the loss values at the different scales. It is defined as

w_i = 4^{−i} / Σ_{k=0}^{log₂ r} 4^{−k}.

The proposed pyramidal reconstruction loss allows DefogGAN to learn the total number of units at the lowest resolution of 1 × 1. Finally, by incorporating the reconstruction loss, the generator loss of DefogGAN is extended as

L_G = λ_adv L_adv + λ_rec L_rec.

Training

The overall training procedure of DefogGAN is presented in Algorithm 1. We use Adam (Kingma and Ba 2014) for training both the discriminator and the generator.

Algorithm 1 Training the DefogGAN model
  θ_G, θ_D ← initialize network parameters
  λ_rec = 0.999, λ_adv = 0.001, r = 32, epoch = 0
  repeat
    X ← batch from dataset
    Ŷ ← G(X)
    Y′ ← Ŷ ⊕ Y
    p_data ← D(Y)
    p_g ← D(Ŷ)
    L_rec ← Eq. (3)
    L_D ← Eq. (2)
    L_adv ← Eq. (1)
    L_G ← λ_rec L_rec + λ_adv L_adv
    // Update parameters according to gradients
    θ_G ← −∇_{θ_G} L_G
    θ_D ← −∇_{θ_D} L_D
    epoch ← epoch + 1
  until epoch = 1000
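A sketch of the sum pooling and the pyramidal loss of Eq. (3), followed by a generator step combining it with the adversarial term (Python/PyTorch). This is a minimal illustration under our assumptions, not the released code: sum pooling is realized as average pooling times the window area, and the loss uses a mean over cells in place of the expectation.

    import math
    import torch
    import torch.nn.functional as F

    def sumpool(m, s):
        # Sum pooling with filter size s and stride s.
        return F.avg_pool2d(m, kernel_size=s, stride=s) * (s * s)

    def pyramidal_rec_loss(y_hat, y, r=32):
        # Eq. (3): L_rec = sum_i w_i * L_dist(., ., 2^i), w_i = 4^-i / sum_k 4^-k.
        levels = int(math.log2(r)) + 1            # strides 1, 2, ..., r (down to 1x1)
        raw = [4.0 ** (-i) for i in range(levels)]
        weights = [v / sum(raw) for v in raw]
        loss = 0.0
        for i, w in enumerate(weights):
            s = 2 ** i
            loss = loss + w * (sumpool(y, s) - sumpool(y_hat, s)).pow(2).mean()
        return loss

    def generator_loss(G, D, x, y, lam_rec=0.999, lam_adv=0.001):
        # L_G = lam_adv * L_adv + lam_rec * L_rec, as in the total objective.
        y_hat = G(x)
        l_adv = torch.log(1 - D(y_hat) + 1e-8).mean()   # Eq. (1), minimax form
        l_rec = pyramidal_rec_loss(y_hat, y)
        return lam_adv * l_adv + lam_rec * l_rec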

Implementation

Generator
The DefogGAN generator follows the style of the VGG network (Simonyan and Zisserman 2014). The filter size is fixed at 3 × 3, and the number of filters doubles when the feature map size is reduced by half. DefogGAN does not use any spatial pooling or fully-connected layers but uses convolutional layers to preserve spatial information from input to output.

The generator consists of an encoder, a decoder, and a channel combination layer. The encoder takes the 32 × 32 × 82 input and extracts semantic features hidden in the fog with convolutional neural networks (CNNs). Each convolutional layer uses batch normalization and the rectified linear unit (ReLU) to make the nonlinear conversion possible (Ioffe and Szegedy 2015; Nair and Hinton 2010).

The decoder generates predictive data from the semantically extracted encoder features. The decoding process reconstructs data into a high dimension, and the inference is done using the transposed convolution operation. The decoder produces the same output shape as the input shape. We do not use as many convolutional layers as ResNet (He et al. 2016), considering the speed of learning due to the large initial channel size.

The final channel combination layer consists of a single convolutional layer, which combines the 82 channels of the accumulated partial observations x̃_t and x̄_t into the 66 channels of information to predict. This infers ŷ_t.

Discriminator
The DefogGAN discriminator is similar to that of DCGAN (Radford, Metz, and Chintala 2015). Three convolutional layers are used with a leaky ReLU activation function. Dropout is used in all layers instead of batch normalization. A fully connected final layer learns to predict real or fake.
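A compact sketch of the generator structure described above (Python/PyTorch). The depth and widths here are illustrative assumptions, not the released configuration; it shows the VGG-style encoder, the transposed-convolution decoder with observation preserving (skip) connections, and the 1 × 1 channel combination layer mapping 82 input channels to 66 output channels. A DCGAN-style discriminator would follow analogously.

    import torch
    import torch.nn as nn

    class DefogGenerator(nn.Module):
        def __init__(self, in_ch=82, out_ch=66, base=128):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, 2, 1),
                                      nn.BatchNorm2d(base), nn.ReLU())      # 32 -> 16
            self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1),
                                      nn.BatchNorm2d(base * 2), nn.ReLU())  # 16 -> 8
            self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, 2, 1),
                                      nn.BatchNorm2d(base), nn.ReLU())      # 8 -> 16
            self.dec1 = nn.Sequential(nn.ConvTranspose2d(base, in_ch, 4, 2, 1),
                                      nn.BatchNorm2d(in_ch), nn.ReLU())     # 16 -> 32
            self.combine = nn.Conv2d(in_ch, out_ch, 1)  # channel combination layer

        def forward(self, x):
            e1 = self.enc1(x)
            e2 = self.enc2(e1)
            d2 = self.dec2(e2) + e1   # skip connection between mirrored layers
            d1 = self.dec1(d2) + x    # observation preserving connection to input
            return self.combine(d1)   # (B, 66, 32, 32) predicted state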
Experiments

Dataset

We have collected a large dataset of more than 33,000 replays of professional StarCraft players. Our experiments utilize replay log files, which contain detailed unit information. For each frame, a concatenated partial observation x_t of the fogged map (H × W × (C_x̄t ⊕ C_x̃t)) exists along with the corresponding ground truth y_t in H × W × C_yt. From each episode, we use only the portion from the 7th to the 17th minute. This is because high-level units in a StarCraft game start to appear at about 7 minutes; also, a game typically finishes in 10 to 20 minutes (Lin et al. 2017), and not many replays last more than 17 minutes. Our dataset comprises 496,830 frames. We use 80% of the data for training, 10% for validation, and 10% for testing.

Table 1 summarizes the DefogGAN input-output statistics, including partially observed states x̄_t, accumulated partially observed states x̃_t, and the ground truth y_t. On average, 54% of the total number of units are seen in partial observation, and 83% are seen in accumulated partial observation. Note that accumulated partial observation causes type 1 errors (i.e., false positives) because accumulated states contain the previous locations of moving units that are obsolete at the current time. Given this output space, the defog problem is to select an average of 141 cells out of the 67,584 (32 × 32 × 66) cells possible.

Table 1: Confusion matrices of x̄_t and x̃_t against the ground truth y_t. Averages over test data (more than 10,000 frames) are shown.

x̄_t (partial)     y_t Exist   y_t Not exist   Total
  Exist                81.58           0          81.58
  Not exist            59.35    67443.07       67502.42
  Total               140.93    67443.07       67584.00

x̃_t (accum.)      y_t Exist   y_t Not exist   Total
  Exist               109.49        7.30         116.79
  Not exist            31.45    67435.76       67467.21
  Total               140.94    67443.06       67584.00

Evaluation Metrics

For performance evaluation, we compute five metrics.

Mean Squared Error (MSE)
The MSE between ŷ_t and y_t is

MSE = E[ ‖y_t − ŷ_t‖²₂ ].

Our MSE criterion measures 1) correct prediction of the unit types present at each location and 2) correct prediction of how many units are present.

Accuracy, Precision, Recall, and F1 score
Accuracy indicates how well the existence of units is predicted. Recall reflects how much the false negative rate (type 2 error) is improved. From the DefogGAN perspective, the type 2 error is the more practical indicator, because the damage caused by an unexpected enemy (false negative) is greater than that of a nonexistent enemy (false positive). Precision represents the type 1 error as a percentage of what is expected to exist. The F1 score is the harmonic mean of recall and precision.

Determining Generator Training Interval

We have experimentally determined a reasonable amount of data needed for training the DefogGAN generator. Table 2 summarizes the generator performance measured in MSE, accuracy, and F1 score, computed by varying the number of frames used in training. Due to the nature of the DefogGAN prediction, the MSE criterion is the most valuable. The empirical results suggest that training with 10 seconds' worth of frames is the best among our tested intervals.

Table 2: The DefogGAN generator performance compared over varying numbers of frames used in training.

Interval (frames)   MSE       Acc.      F1      Recall   Preci.
5s (9,165)          0.00211   0.99942   0.854   0.808    0.906
10s (13,692)        0.00208   0.99944   0.856   0.807    0.913
30s (31,442)        0.00215   0.99945   0.860   0.808    0.918
60s (56,963)        0.00213   0.99946   0.862   0.814    0.915
600s (496,830)      0.00230   0.99944   0.859   0.820    0.901
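The five metrics defined above, as reported in Tables 2 through 5, can be computed over the 32 × 32 × 66 output grid as in the following sketch (Python/NumPy). Treating a cell as "exist" when the predicted count rounds to at least one unit is our assumption about thresholding.

    import numpy as np

    def defog_metrics(y_hat, y):
        # y_hat, y: arrays of shape (32, 32, 66) holding unit counts per cell.
        mse = np.mean((y - y_hat) ** 2)
        pred = np.round(y_hat) >= 1      # predicted existence (assumed threshold)
        true = y >= 1                    # actual existence
        tp = np.sum(pred & true)
        fp = np.sum(pred & ~true)
        fn = np.sum(~pred & true)
        tn = np.sum(~pred & ~true)
        acc = (tp + tn) / (tp + tn + fp + fn)
        prec = tp / max(tp + fp, 1)      # type 1 error view
        rec = tp / max(tp + fn, 1)       # type 2 error view
        f1 = 2 * prec * rec / max(prec + rec, 1e-8)
        return mse, acc, f1, rec, prec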

Figure 4: Visualization of prediction results. On the far left, we show the accumulated partially observed states (x̃_t). The second column depicts a partially observed state, x̄_t. The prediction result by CED, a state-of-the-art defogger, is shown in the third column. Columns 4–7 are the results of DCGAN, BEGAN, WGAN-GP, and cWGAN. Our DefogGAN result is presented in the eighth column, and the ground truth in the last. Rows represent the replays used for evaluation.

Baseline

As presented in Table 3, accumulated partial observation x̃_t results in better performance than partial observation x̄_t. In fact, many rule-based agents leverage x̃_t (Ontanón et al. 2013; Synnaeve et al. 2018). We take accumulated partial observation as our baseline.

Table 3: The overall accuracy performance of partially observed states x̄_t and accumulated partially observed states x̃_t. The MSE indicates how well the units are positioned and counted in the unit map, and the Accuracy, F1, Recall, and Precision scores indicate how well unit existence is predicted in the unit map.

                 MSE       Acc.      F1      Recall   Preci.
x̄_t (partial)    0.00548   0.99912   0.733   0.579    1
x̃_t (accum.)     0.00370   0.99943   0.850   0.777    0.937

Accuracy Comparison

In this section, we present a comparative performance analysis for DefogGAN. A rule-based StarCraft agent using accumulated partial observation is a reasonable baseline: a prediction model needs to make at least a better prediction than just memorizing the partial observation history. For a comprehensive comparison, we select a diverse range of models, including an autoencoder-based model, CED (Synnaeve et al. 2018; Kahng et al. 2018); simple GAN-based models, DCGAN (Radford, Metz, and Chintala 2015) and BEGAN (Berthelot, Schumm, and Metz 2017); and WGAN-based models, WGAN-GP (Gulrajani et al. 2017) and cWGAN (Ebenezer, Das, and Mukhopadhyay 2019).

Comparison with baseline
As shown in Table 4, DefogGAN achieves a 44% decrease in MSE compared to the baseline. DefogGAN predicts the number of units in a given cell more accurately than the baseline, because DefogGAN is able to predict enemy units hidden in the fog. On the other hand, DefogGAN seems to provide similar prediction performance in terms of accuracy and F1 score. Note that accuracy and F1 score do not measure how accurately the number of units is predicted, but only how accurately their existence is predicted. The result can thus be understood as DefogGAN predicting the number of units much more precisely while correctly predicting the overall distribution of units on the map.

Comparison with autoencoder model
Compared to CED, an autoencoder-based model, DefogGAN provides about 33% lower MSE and about a 17-percentage-point higher F1 score. Note that the recall of DefogGAN is very high compared to that of CED. This high recall means that DefogGAN successfully discovers enemy units hidden in the fog. The high recall property is very important in StarCraft, since undetected enemy units (i.e., low recall) can increase possible threats such as sudden attacks.

Comparison with GAN-based models
DefogGAN makes better predictions than the other GAN-based models. As shown in Table 4, unconditional base GAN models such as DCGAN and BEGAN perform very poorly, mainly because these models are trained without a reconstruction loss. WGAN-GP makes relatively good predictions without a reconstruction loss, but does not exceed DefogGAN. We carefully surmise that the Wasserstein distance of WGAN-GP has an effect similar to a reconstruction loss in training. Therefore, we additionally compare with cWGAN, a WGAN variant that has a reconstruction loss. However, cWGAN does not provide better performance than WGAN-GP.

Table 4: Accuracy comparison results. DefogGAN is compared with various other models.

            MSE       Acc.      F1      Recall   Preci.
Baseline    0.00370   0.99943   0.850   0.777    0.937
CED         0.00311   0.99896   0.682   0.538    0.933
DCGAN       2.16007   0.94844   0.019   0.239    0.010
BEGAN       0.01578   0.99353   0.024   0.039    0.018
WGAN-GP     0.00348   0.99885   0.701   0.648    0.763
cWGAN       0.00372   0.99878   0.688   0.644    0.737
DefogGAN    0.00208   0.99944   0.856   0.807    0.913

Visualization of prediction results
The prediction performance of DefogGAN can be effectively explained with the visualization in Figure 4. We randomly select four replays and present the defogged fully observed states predicted by each model. For example, in replay 4, we cannot see the red enemy units in the lower right corner of the partially observed state x̄_t, and we can see only a subset of enemy units in the accumulated partially observed state x̃_t. By using both the observation and the accumulated observation, DefogGAN generates a fully observed state ŷ_t that looks most similar to the ground truth. Since DCGAN and BEGAN do not use a reconstruction loss, they fail to generate a fully observed state with a pattern similar to the ground truth. CED generates fairly plausible full states, but DefogGAN generates more accurate results. WGAN-GP generates plausible full states without a reconstruction loss; however, it seems to have a tendency to generate false positive results (i.e., low precision). cWGAN (a WGAN-GP variant that additionally uses a reconstruction loss) seems to reduce such false positives, but still does not make predictions better than DefogGAN.

Ablation Study

We evaluate the performance of the proposed method that combines accumulated partial observation x̃_t and partial observation x̄_t, the joint loss, and the reduced resolution (pyramidal) loss. Finally, we compare the performance of the observation preserving connection. Table 5 shows that our proposed techniques produce good performance in the ablation study.

Effect of concatenated partial observation
Using the concatenated partial observation method, the MSE is 29% better than using only the accumulated partially observed information and 51% better than using only the partially observed information. This indicates that it is important to utilize past information. In addition, when partially observed and accumulated partially observed information are used in combination, the total number of units observed from the past is identified, and certain information without type 1 errors is used for learning. In other words, the combination contributes to the performance improvement by showing as many units as possible along with the units that can be confirmed as correct.

Effect of adversarial learning
The third row of Table 5 shows the overall accuracy performance of DefogGAN when trained without the adversarial loss. Without the adversarial loss, the overall accuracy performance significantly decreases: MSE increases about 49% (i.e., from 0.00208 to 0.00310), and the F1 score decreases by 0.194 (i.e., from 0.856 to 0.662). In the area of image generation, learning with an adversarial loss generates clearer images than learning with an MSE loss alone (Pathak et al. 2016; Isola et al. 2017). In DefogGAN, we see a similar effect. We conjecture that the adversarial loss also helps accurately predict the fully observed states of a game.

Effect of reconstruction loss
The pyramidal reconstruction loss helps to learn fully observed states. Since it measures the difference between a predicted fully observed state and the ground truth at multiple scales, it helps DefogGAN accurately predict the total number of units hidden in the fog.

Effect of observation preserving connection
As shown in the sixth row of Table 5, when trained without the observation preserving connection, the overall accuracy performance of DefogGAN significantly decreases. More specifically, MSE roughly doubles (i.e., from 0.00208 to 0.00410), and the F1 score decreases by 0.340 (i.e., from 0.856 to 0.516). This can be considered an effect similar to how the skip connections of U-Net (Ronneberger, Fischer, and Brox 2015) provide better results by allowing information to flow from input to output.

Table 5: Ablation study results. Each component is excluded from DefogGAN in turn: current partial observation, accumulated past partial observation, adversarial loss, reconstruction loss, pyramidal loss (an L2 loss was used instead), and observation preserving connection.

                 MSE       Acc.      F1      Recall   Preci.
w/o X_par        0.00293   0.99930   0.831   0.826    0.836
w/o X_acc        0.00426   0.99897   0.732   0.674    0.802
w/o L_adv        0.00310   0.99887   0.662   0.529    0.882
w/o L_rec        1.73986   0.97942   0.026   0.133    0.015
w/o Pyramidal    0.00210   0.99943   0.855   0.809    0.907
w/o Ob-conn      0.00401   0.99829   0.516   0.437    0.631
DefogGAN         0.00208   0.99944   0.856   0.807    0.913

Conclusion

We have presented DefogGAN, a generative approach for game AI to predict crucial state information unavailable due to the fog of war. DefogGAN accurately generates defogged images of a game that can be used to improve win rates against expert human players. In our experiments with StarCraft, we have validated that DefogGAN achieves superior performance against state-of-the-art defoggers. Improving on imperfect information during an RTS game play could bring substantially better macro-management overall, although this remains an ongoing research area for game AI. DefogGAN is one of the earliest applications of adversarial learning to the fog of war problem, and it can be applied to other real-world POMDP problems.

Acknowledgment

The authors would like to thank Dr. Wonpyo Hong, CEO and President of Samsung SDS, whose vision in AI has led

to the creation of SAIDA Lab for advanced AI research.

References

Arjovsky, M.; Chintala, S.; and Bottou, L. 2017. Wasserstein GAN. arXiv preprint arXiv:1701.07875.
Berthelot, D.; Schumm, T.; and Metz, L. 2017. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717.
Ebenezer, J. P.; Das, B.; and Mukhopadhyay, S. 2019. Single image haze removal using conditional Wasserstein generative adversarial networks. arXiv preprint arXiv:1903.00395.
Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Zidek, A.; Nelson, A.; Bridgland, A.; Penedones, H.; et al. 2018. De novo structure prediction with deep-learning based scoring. Annu Rev Biochem.
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. NeurIPS.
Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; and Courville, A. C. 2017. Improved training of Wasserstein GANs. NeurIPS.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proc. of CVPR, 770–778.
Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
Isola, P.; Zhu, J.-Y.; Zhou, T.; and Efros, A. A. 2017. Image-to-image translation with conditional adversarial networks. In Proc. of CVPR, 1125–1134.
Justesen, N., and Risi, S. 2017. Learning macromanagement in StarCraft from replays using deep learning. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), 162–169. IEEE.
Kahng, H.; Jeong, Y.; Cho, Y. S.; Ahn, G.; Park, Y. J.; Jo, U.; Lee, H.; Do, H.; Lee, J.; Choi, H.; et al. 2018. Clear the fog: Combat value assessment in incomplete information games with convolutional encoder-decoders. arXiv preprint arXiv:1811.12627.
Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P., and Welling, M. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Lin, Z.; Gehring, J.; Khalidov, V.; and Synnaeve, G. 2017. STARDATA: A StarCraft AI research dataset. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference.
Mirza, M., and Osindero, S. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. 2013. Playing Atari with deep reinforcement learning. NeurIPS Deep Learning Workshop.
Monahan, G. E. 1982. State of the art—a survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science 28(1):1–16.
Nair, V., and Hinton, G. E. 2010. Rectified linear units improve restricted Boltzmann machines. In Proc. of ICML.
Nazeri, K.; Ng, E.; Joseph, T.; Qureshi, F.; and Ebrahimi, M. 2019. EdgeConnect: Generative image inpainting with adversarial edge learning. arXiv preprint arXiv:1901.00212.
Ontanón, S.; Synnaeve, G.; Uriarte, A.; Richoux, F.; Churchill, D.; and Preuss, M. 2013. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Trans. on Computational Intelligence and AI in Games.
Park, H.; Cho, H.-C.; Lee, K.; and Kim, K.-J. 2012. Prediction of early stage opponents strategy for StarCraft AI using scouting and machine learning. Proc. of the Workshop at SIGGRAPH Asia.
Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; and Efros, A. A. 2016. Context encoders: Feature learning by inpainting. In Proc. of CVPR, 2536–2544.
Radford, A.; Metz, L.; and Chintala, S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. Springer.
Satu, M. S.; Parvez, M. H.; and Shamim-Al-Mamun. 2015. Review of integrated applications with AIML based chatbot. In 2015 International Conference on Computer and Information Engineering (ICCIE).
Si, C.; Pisan, Y.; and Tan, C. T. 2014. A scouting strategy for real-time strategy games. In Proc. of the 2014 Conference on Interactive Entertainment, 1–8. ACM.
Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature.
Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science.
Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Synnaeve, G.; Lin, Z.; Gehring, J.; Gant, D.; Mella, V.; Khalidov, V.; Carion, N.; and Usunier, N. 2018. Forward modeling for partial observation strategy games - a StarCraft defogger. NeurIPS.
Weber, B. G.; Mateas, M.; and Jhala, A. 2011. A particle model for state estimation in real-time strategy games. In Seventh Artificial Intelligence and Interactive Digital Entertainment Conference.
Xu, S.; Kuang, H.; Zhuang, Z.; Hu, R.; Liu, Y.; and Sun, H. 2018. Macro action selection with deep reinforcement learning in StarCraft. arXiv preprint arXiv:1812.00336.
Zhao, J.; Mathieu, M.; and LeCun, Y. 2016. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126.
