UCT for Tactical Assault Planning in Real-Time Strategy Games
Radha-Krishna Balla and Alan Fern
School of EECS, Oregon State University
Corvallis, OR, 97331, USA
{balla, afern}@eecs.oregonstate.edu

Abstract

We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics of this problem are quite complex, it is often possible to provide or learn a coarse simulation-based model of a tactical domain, which makes Monte-Carlo planning an attractive approach. In this paper, we investigate the use of UCT, a recent Monte-Carlo planning algorithm, for this problem. UCT has recently shown impressive successes in the area of games, particularly Go, but has not yet been considered in the context of multi-agent tactical planning. We discuss the challenges of adapting UCT to our domain and an implementation which allows for the optimization of user specified objective functions. We present an evaluation of our approach on a range of tactical assault problems with different objectives in the RTS game Wargus. The results indicate that our planner is able to generate superior plans compared to several baselines and a human player.

1 Introduction

Real-time strategy (RTS) games involve multiple teams acting in a real-time environment with the goal of gaining military or territorial superiority over one another. To achieve this goal, a player typically must address two key RTS sub-problems, resource production and tactical planning. In resource production, the player must produce (or gather) various raw materials, buildings, civilian and military units, to improve their economic and military power. In tactical planning, a player uses military units to gain territory and defeat enemy units. A game usually involves an initial period where players rapidly build their economy via resource production, followed by a period where those resources are exploited for offensive military assaults and defense. Thus, one of the keys to overall success is to form effective tactical assault plans, in order to most effectively exploit limited resources to optimize a battle objective.

In this paper, we focus on automated planning for the RTS tactical assault problem. In particular, the goal is to develop an action selection mechanism that can control groups of military units to conduct effective offensive assaults on a specified set of enemy forces. This type of assault is common after a player has built up forces and gathered information about where enemy troops are located. Here the effectiveness of an assault is measured by an objective function, perhaps specified by a user, which might ask the planner to minimize the time required to defeat the enemy or to destroy the enemy while maximizing the remaining health of friendly units at the end of the battle. Such a mechanism would be useful as a component for computer RTS opponents and as an interface option to human players, where a player need only specify the tactical assault objective rather than figure out how to best achieve it and then manually orchestrate the many low-level actions.
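As a concrete illustration of such objective functions, the following minimal Python sketch shows how the two objectives mentioned above (fastest possible destruction of the enemy, and destruction with maximal remaining friendly health) might be expressed over the outcome of a simulated battle. The BattleOutcome class and its field names are placeholders chosen for exposition rather than the exact structures used by our planner.

    from dataclasses import dataclass

    @dataclass
    class BattleOutcome:
        # Summary of one simulated assault; field names are illustrative assumptions.
        duration: float                   # game time taken to destroy the targeted enemy units
        friendly_health_remaining: float  # total hit points of surviving friendly units

    def time_objective(outcome: BattleOutcome) -> float:
        # "Destroy the enemy as quickly as possible": larger is better, so negate the time taken.
        return -outcome.duration

    def health_objective(outcome: BattleOutcome) -> float:
        # "Destroy the enemy while losing as little health as possible."
        return outcome.friendly_health_remaining

In a Monte-Carlo planner, such a function would typically be used to score the end of each simulated battle, with the planner selecting actions so as to maximize the chosen objective.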
In addition to the practical utility of such a mechanism, RTS tactical assault problems are interesting from an AI planning perspective as they encompass a number of challenging issues. First, our tactical battle formulation involves temporal actions with numeric effects. Second, the problems typically involve the concurrent control of multiple military units. Third, performing well requires some amount of spatial-temporal reasoning. Fourth, due to the highly dynamic environment and inaccurate action models, partly due to the unpredictable enemy response, an online planning mechanism is required that can quickly respond to changing goals and unexpected situations. Finally, an effective planner should be able to deal with a variety of objective functions that measure the goodness of an assault.

The combination of the above challenges makes most state-of-the-art planners inapplicable to RTS tactical assault problems. Furthermore, there has been little work on specialized model-based planning mechanisms for this problem, with most games utilizing static script-based mechanisms. One exception, which has shown considerable promise, is the use of Monte-Carlo planning for tactical problems [Chung et al., 2005; Sailer et al., 2007]. While these approaches can be more flexible and successful than scripting, they are still constrained by the fact that they rely on domain-specific human knowledge, either in the form of a set of human-provided plans or a state evaluation function. It is often difficult to provide this knowledge, particularly when the set of run-time goals can change dynamically.

In this work, we take a step toward planning more flexible behavior, where the designer need not specify a set of plans or an evaluation function. Rather, we need only provide the system with a set of simple abstract actions (e.g. join unit groups, group attack, etc.) which can be composed together to arrive at an exponentially large set of potential assault plans. In order to deal with this increased flexibility we draw on a recent Monte-Carlo planning technique, UCT [Kocsis and Szepesvari, 2006], which has shown impressive success in a variety of domains, most notably the game of Go [Gelly and Wang, 2006; Gelly and Silver, 2007]. UCT's ability to deal with the large state space of Go and implicitly carry out the necessary spatial reasoning makes it an interesting possibility for RTS tactical planning. However, there are a number of fundamental differences between the RTS and Go domains, which makes its applicability unclear.

The main contribution of this paper is to describe an abstract problem formulation of tactical assault planning for which UCT is shown to be very effective compared to a number of baselines across a range of tactical assault scenarios. This is a significant step toward arriving at a full model-based planning solution to the RTS tactical problem.

2 The RTS Tactical Assault Domain

In general, the tactical part of RTS games involves planning about both defensive and offensive troop movements and positioning. The ultimate goal is generally to completely destroy all enemy troops, which is typically achieved via a series of well-timed assaults while maintaining an adequate defensive posture. In this paper, we focus exclusively on solving RTS tactical assault problems, where the input is a set of friendly and enemy units along with an optimization objective. The planner must then control the friendly troops in order to best optimize the objective. The troops may be spread over multiple locations on the map and are often organized into groups. Typical assault objectives might be to destroy the selected enemy troops as quickly as possible or to destroy the enemy while losing as little health as possible. Note that our focus on the assault problem ignores other aspects of the full RTS tactical problem such as developing a strong defensive stance and selecting the best sequence of assaults to launch.

Our planning formulation is based on an abstraction of the tactical assault domain where tactics are controlled at a group level. Thus, the abstract state space used by our planner is in terms of properties of the sets of enemy and friendly groups, such as health and location. The primary abstract actions we consider are joining of groups and attacking an enemy group. The micro-management of individual agents in the groups under each abstract action is left to the default AI of the game engine.
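As a rough illustration of this group-level abstraction, the following Python sketch shows one possible encoding of the abstract state and of the join and attack actions; the class and field names are placeholders chosen for exposition rather than the exact structures used by our planner.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Group:
        # Group-level abstraction: individual units are not modeled by the planner.
        unit_ids: List[int]          # engine unit ids belonging to this group
        health: float                # aggregate hit points of the group
        location: Tuple[int, int]    # representative map position (e.g., the group centroid)

    @dataclass
    class AbstractState:
        friendly: List[Group]
        enemy: List[Group]

    @dataclass
    class JoinAction:
        # Merge two friendly groups into one (durative: requires travel time).
        group_a: int   # index into AbstractState.friendly
        group_b: int

    @dataclass
    class AttackAction:
        # Send a friendly group to assault an enemy group; unit micro-management
        # during the attack is handled by the game engine's default AI.
        attacker: int  # index into AbstractState.friendly
        target: int    # index into AbstractState.enemy

    def legal_actions(state: AbstractState):
        # Enumerate the abstract choices available from a state.
        n_f, n_e = len(state.friendly), len(state.enemy)
        joins = [JoinAction(a, b) for a in range(n_f) for b in range(a + 1, n_f)]
        attacks = [AttackAction(a, t) for a in range(n_f) for t in range(n_e)]
        return joins + attacks

Composing such group-level actions over time, possibly concurrently across several friendly groups, is what gives rise to the exponentially large space of assault plans discussed in the introduction.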
The ning epoch the approach performed limited look-ahead to main contribution of this paper is to describe an abstract select an action by Monte Carlo simulation of random action problem formulation of tactical assault planning for which sequences followed by the application of an evaluation UCT is shown to be very effective compared to a number of function. Unfortunately this is highly reliant on the availa- baselines across a range of tactical assault scenarios. This is bility of a quality evaluation function, which makes the ap- a significant step toward arriving at a full model-based proach more challenging to bring to a new domain and less planning solution to the RTS tactical problem. adaptable to new goal conditions. Another Monte Carlo approach for RTS tactical problems [Sailer et al., 2007] assume a fixed set of strategies, and at 2 The RTS Tactical Assault Domain each step use simulation to estimate the values of various In general, the tactical part of RTS games involves both combinations of enemy and friendly strategies. These results planning about defensive and offensive troop movements are used to compute a Nash-policy in order to select a strat- and positioning. The ultimate goal is generally to complete- egy for execution. A weakness of this approach is its restric- ly destroy all enemy troops, which is typically achieved via tion to only consider strategies in the predefined set, which a series of well timed assaults while maintaining an ade- would need to be constructed on a per-domain basis. Com- quate defensive posture. In this paper, we focus exclusively parably, our approach does not require either a strategy set on solving RTS tactical assault problems, where the input is or an evaluation function, but rather only that a set of ab- a set of friendly and enemy units along with an optimization stract actions are provided along with the ability to simulate objective. The planner must then control the friendly troops their effects. However, unlike their approach, our planner in order to best optimize the objective. The troops may be assumes that the enemy is purely reactive to our assault, spread over multiple locations on the map and are often or- whereas their approach reasons about the offensive capacity ganized into groups. Typical assault objectives might be to of the enemy, though restricted to the provided set of strate- destroy the selected enemy troops as quickly as possible or gies. This is not a fundamental restriction for our planner as to destroy the enemy while losing as little health as possible. it can easily incorporate offensive actions of the enemy into Note that our focus on the assault problem ignores other the Monte Carlo process, likely at a computation cost. aspects of the full RTS tactical problem such as developing Recent work has also focused on model-based planning a strong defensive stance and selecting the best sequence of for the resource-production aspect of RTS games [Chan et assaults to launch.