Hierarchical Macro Strategy Model for MOBA Game AI
Total Page:16
File Type:pdf, Size:1020Kb
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) Hierarchical Macro Strategy Model for MOBA Game AI Bin Wu Tencent AI Lab [email protected] Abstract There are four major aspects that make RTS games much The next challenge of game AI lies in Real Time Strategy more difficult compared to GO: 1) Computational com- (RTS) games. RTS games provide partially observable gam- plexity. The computational complexity in terms of action 20;000 ing environments, where agents interact with one another in space or state space of RTS games can be up to 10 , an action space much larger than that of GO. Mastering RTS while the complexity of GO is about 10250 (OpenAI 2018b). games requires both strong macro strategies and delicate mi- 2) Multi-agent. Playing RTS games usually involves mul- cro level execution. Recently, great progress has been made tiple agents. It is crucial for multiple agents to coordinate in micro level execution, while complete solutions for macro and cooporate. 3) Imperfect information. Different to GO, strategies are still lacking. In this paper, we propose a novel many RTS games make use of fog of war (Vinyals et al. learning-based Hierarchical Macro Strategy model for mas- 2017) to increase game uncertainty. When the game map tering MOBA games, a sub-genre of RTS games. Trained is not fully observable, it is essential to consider gaming by the Hierarchical Macro Strategy model, agents explicitly make macro strategy decisions and further guide their micro among one another. 4) Sparse and delayed rewards. Learn- level execution. Moreover, each of the agents makes indepen- ing upon game rewards in GO is challenging because the dent strategy decisions, while simultaneously communicat- rewards are usually sparse and delayed. RTS game length ing with the allies through leveraging a novel imitated cross- could often be larger than 20,000 frames, while each GO agent communication mechanism. We perform comprehen- game is usually no more than 361 steps. sive evaluations on a popular 5v5 Multiplayer Online Battle To master RTS games, players need to have strong skills Arena (MOBA) game. Our 5-AI team achieves a 48% win- in both macro strategy operation and micro level execu- ning rate against human player teams which are ranked top 1% in the player ranking system. tion. In recent study, much attention and attempts have been put to micro level execution (Vinyals et al. 2017; Tian et al. 2017; Synnaeve and Bessiere 2011; Wender and Introduction Watson 2012). So far, Dota2 AI developed by OpenAI us- Light has been shed on artificial general intelligence after ing reinforcement learning, i.e., OpenAI Five, has made the AlphaGo defeated world GO champion Lee Seedol (Silver most advanced progress (OpenAI 2018a). OpenAI Five was et al. 2016). Since then, game AI has drawn unprecedented trained directly on micro level action space using proxi- attention from not only researchers but also the public. Game mal policy optimization algorithms along with team rewards AI aims much more than robots playing games. Rather, (Schulman et al. 2017). OpenAI Five has shown strong games provide ideal environments that simulate the real teamfights skills and coordination comparable to top pro- world. AI researchers can conduct experiments in games, fessional Dota2 teams during a demonstration match held and transfer successful AI ability to the real world. in The International 2018 (DOTA2 2018). OpenAI’s ap- Although AlphaGo is a milestone to the goal of general proach did not explicitly model macro strategy and tried AI, the class of problems it represents is still simple com- to learn the entire game using micro level play. However, pared to the real world. Therefore, recently researchers have OpenAI Five was not able to defeat professional teams due put much attention to real time strategy (RTS) games such to weakness in macro strategy management (Vincent 2018; as Defense of the Ancients (Dota) (OpenAI 2018a) and Star- Simonite 2018). Craft (Vinyals et al. 2017; Tian et al. 2017), which represents Related work has also been done in explicit macro strat- a class of problems with next level complexity. Dota is a fa- egy operation, mostly focused on navigation. Navigation mous set of science fiction 5v5 Multiplayer Online Battle aims to provide reasonable destination spots and efficient Arena (MOBA) games. Each player controls one unit and routes for agents. Most related work in navigation used in- cooperate with four allies to defend allies’ turrets, attack en- fluence maps or potential fields (DeLoura 2001; Hagelback¨ emies’ turrets, collect resources by killing creeps, etc. The and Johansson 2008; do Nascimento Silva and Chaimow- goal is to destroy enemies’ base. icz 2015). Influence maps quantify units using handcrafted Copyright c 2019, Association for the Advancement of Artificial equations. Then, multiple influence maps are fused using Intelligence (www.aaai.org). All rights reserved. rules to provide a single-value output to navigate agents. 1206 Providing destination is the most important purpose of nav- of game phase modeling. Thereby, HMS reduces computa- igation in terms of macro strategy operation. The ability to tional complexity by incorporating game knowledge. More- get to the right spots at right time makes essential differ- over, each HMS agent conducts learning with a novel mech- ence between high level players and the others. Planning has anism of communication with teammates agents to cope also been used in macro strategy operation. Ontanon et al. with multi-agent challenge. Finally, we have conducted ex- proposed Adversarial Hierarchical-Task Network (AHTN) tensive experiments in a popular MOBA game to evaluate Planning (Ontanon´ and Buro 2015) to search hierarchical our AI ability. We matched with hundreds of human player tasks in RTS game playing. Although AHTN shows promis- teams that ranked above 99% of players in the ranked system ing results in a mini-RTS game, it suffers from efficiency and achieved 48% winning rate. issue which makes it difficult to apply to full MOBA games The rest of this paper is organized as follows: First, we directly. briefly introduce Multiplayer Online Battle Arena (MOBA) Despite of the rich and promising literature, previous games and compare the computational complexity with work in macro strategy failed to provide complete solution: GO. Second, we illustrate our proposed Hierarchical Macro First, reasoning macro strategy implicitly by learning Strategy model. Then, we present experimental results in the upon micro level action space may be too difficult. OpenAI fourth section. Finally, we conclude and discuss future work. Five’s ability gap between micro level execution and macro strategy operation was obvious. It might be over-optimistic Multiplayer Online Battle Arena (MOBA) to leave models to figure out high level strategies by sim- ply looking at micro level actions and rewards. We consider Games explicit macro strategy level modeling to be necessary. Game Description Second, previous work on explicit macro strategy heavily relied on handcrafted equations for influence maps/potential MOBA is currently the most popular sub-genre of the RTS fields computation and fusion. In practice, there are usu- games. MOBA games are responsible for more than 30% ally thousands of numerical parameters to manually decide, of the online gameplay all over the world, with titles such which makes it nearly impossible to achieve good perfor- as Dota, League of Legends, and Honour of Kings (Mur- mance. Planning methods on the other hand cannot meet ef- phy 2015). According to a worldwide digital games market ficiency requirement of full MOBA games. report in February 2018, MOBA games ranked first in gross- Third, one of the most challenging problems in RTS game ing in both PC and mobile games (SuperData 2018). macro strategy operation is coordination among multiple In MOBA, the standard game mode requires two 5-player agents. Nevertheless, to the best of our knowledge, previ- teams play against each other. Each player controls one unit, ous work did not consider it in an explicit way. OpenAI Five i.e., hero. There are numerous of heroes in MOBA, e.g., considers multi-agent coordination using team rewards on more than 80 in Honour of Kings. Each hero is uniquely micro level modeling. However, each agent of OpenAI Five designed with special characteristics and skills. Players con- makes decision without being aware of allies’ macro strat- trol movement and skill releasing of heroes via the game egy decisions, making it difficult to develop top coordination interface. ability in macro strategy level. As shown in Figure. 1a, Honour of Kings players use left Finally, we have found that modeling strategic phase is bottom steer button to control movements, while right bot- crucial for MOBA game AI performance. However, to the tom set of buttons to control skills. Surroundings are ob- best of our knowledge, previous work did not consider this. servable via the main screen. Players can also learn full map Teaching agents to learn macro strategy operation, how- situation via the left top corner mini-map, where observable ever, is challenging. Mathematically defining macro strat- turrets, creeps, and heroes are displayed as thumbnails. Units egy, e.g., besiege and split push, is difficult in the first place. are only observable either if they are allies’ units or if they Also, incorporating macro strategy on top of OpenAI Five’s are within a certain distance to allies’ units. reinforcement learning framework (OpenAI 2018a) requires There are three lanes of turrets for each team to defend, corresponding execution to gain rewards, while macro strat- three turrets in each lane. There are also four jungle areas on egy execution is a complex ability to learn by itself. There- the map, where creep resources can be collected to increase fore, we consider supervised learning to be a better scheme gold and experience.