Counter-Strike Deathmatch with Large-Scale Behavioural Cloning
Tim Pearce^{1,2}*, Jun Zhu^{1}
^1 Tsinghua University, ^2 University of Cambridge
* Project started while affiliated with the University of Cambridge, now based at Tsinghua University.

Abstract

This paper describes an AI agent that plays the popular first-person-shooter (FPS) video game 'Counter-Strike: Global Offensive' (CSGO) from pixel input. The agent, a deep neural network, matches the performance of the medium-difficulty built-in AI on the deathmatch game mode, whilst adopting a humanlike play style. Unlike much prior work in games, no API is available for CSGO, so algorithms must train and run in real time. This limits the quantity of on-policy data that can be generated, precluding many reinforcement learning algorithms. Our solution uses behavioural cloning – training on a large noisy dataset scraped from human play on online servers (4 million frames, comparable in size to ImageNet), and a smaller dataset of high-quality expert demonstrations. This scale is an order of magnitude larger than prior work on imitation learning in FPS games. Gameplay examples: https://youtu.be/p01vWk7uMvM

Figure 1: Screenshot, agent's vision and map overview for the deathmatch game mode.

1 Introduction

Deep neural networks have matched human performance in a variety of video games: from 1970s Atari classics, to 1990s first-person-shooter (FPS) titles Doom and Quake III, and modern real-time-strategy games Dota 2 and Starcraft II [Mnih et al., 2015, Lample and Chaplot, 2017, Jaderberg et al., 2019, Berner et al., 2019, Vinyals et al., 2019]. Something these games have in common is the existence of an API allowing researchers both to interface with the game easily, and to simulate it at speeds far quicker than real time and/or run it cheaply at scale. This is necessary for today's deep reinforcement learning (RL) algorithms, which require large amounts of experience in order to learn effectively – a simple game such as Atari's Breakout required 38 days of playing experience for vanilla Q-learning to master [Mnih et al., 2015], whilst for one of the most complex games, Dota 2, an actor-critic algorithm accumulated around 45,000 years of experience [Berner et al., 2019].

Games without APIs, which cannot be run easily at scale, have received less research attention. This is unfortunate since in many ways these games bring challenges closer to those in the real world – without access to large-scale simulations, one is forced to explore more efficient algorithms. In this paper we take on such a challenge: building an agent for Counter-Strike: Global Offensive (CSGO), with no access to an API, and only modest compute resources (several GPUs and one game terminal). Released in 2012, CSGO continues to be one of the world's most popular games in player numbers (around 20 million unique players per month^2) and audience viewing figures^{3,4}. The complexity of CSGO is an order of magnitude higher than the FPS games previously studied – the system requirements for the CSGO engine (Source) are 100× those of the Doom and Quake III engines [Kempka et al., 2016].

CSGO's constraints preclude mass-scale on-policy rollouts, and demand an algorithm efficient in both data and compute, which leads us to consider behavioural cloning. Whilst prior work has explored this for various games, demonstration data is typically limited to what authors provide themselves through a station set up to capture screenshots and log key presses.
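Such a capture station amounts to little more than a loop that grabs the rendered frame at a fixed rate and logs the current input state alongside it. The sketch below is a hypothetical illustration, assuming the Python libraries mss and pynput and an illustrative resolution and frame rate; the paper does not specify its tooling, and the interfacing actually used in this work is described in appendix A.

```python
# Hypothetical capture-and-log loop: grabs the game window at a fixed rate and
# records the current keyboard/mouse state alongside each frame.
# Library choices (mss, pynput), resolution and capture rate are illustrative
# assumptions, not the tooling used in the paper (see appendix A and section 4).
import time
import numpy as np
import mss
from pynput import keyboard, mouse

FPS = 16                                                      # assumed capture rate
REGION = {"top": 0, "left": 0, "width": 1024, "height": 768}  # assumed game window

pressed_keys = set()   # latest keyboard state
cursor = [0, 0]        # latest cursor position (a real FPS needs relative mouse deltas)

def on_press(key):
    pressed_keys.add(str(key))

def on_release(key):
    pressed_keys.discard(str(key))

def on_move(x, y):
    cursor[0], cursor[1] = x, y

# Input listeners run in background threads and keep the state above up to date.
keyboard.Listener(on_press=on_press, on_release=on_release).start()
mouse.Listener(on_move=on_move).start()

frames, actions = [], []
with mss.mss() as sct:
    while len(frames) < 1000:                            # one short demonstration clip
        t0 = time.time()
        frame = np.array(sct.grab(REGION))[:, :, :3]     # BGRA screenshot -> BGR array
        frames.append(frame)
        actions.append({"keys": sorted(pressed_keys), "cursor": tuple(cursor)})
        time.sleep(max(0.0, 1.0 / FPS - (time.time() - t0)))

np.save("demo_frames.npy", np.stack(frames))             # paired observation/action data
np.save("demo_actions.npy", np.array(actions, dtype=object), allow_pickle=True)
```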
Playing repetitive games at low resolution means these datasets remain small – from one to five hours (section 5) – producing agents of limited performance. Our work takes advantage of CSGO's popularity to record data from other people's play, by joining games as a spectator, scraping screenshots and inferring actions. This allows us to collect a dataset an order of magnitude larger than in previous FPS works, at 4 million frames or 70 hours, comparable in size to ImageNet. We train a deep neural network on this large dataset, then fine-tune it on smaller, clean, high-quality datasets. This leads to an agent capable of playing an aim training mode just below the level of a strong human (top 10% of CSGO players), and able to play the deathmatch game mode to the level of the medium-difficulty built-in AI (the rules-based bots available as part of CSGO).

We are careful to point out that our work is not exclusively focused on creating the highest-performing agent – perfect aim can be achieved through some simple geometry and accessing backend information about enemy locations (the built-in AI uses this) [Wydmuch et al., 2019]. Rather, we intend to produce something that plays in a humanlike fashion, that is challenging to play against without placing human opponents at an unfair disadvantage.

Contributions: The paper is useful from several perspectives.

• As a case study in building an RL agent in a setting where noisy demonstration data is plentiful, but clean expert data and on-policy rollouts are expensive.
• As an approach to building AI for modern games without APIs and with only modest compute resources.
• As evidence of the scalability of large-scale behavioural cloning for FPS games.
• As a milestone toward building an AI capable of playing CSGO.

Organisation: The paper is organised as follows. Section 2 introduces the CSGO environment and behavioural cloning. The key design choices of the agent are given in section 3, and the datasets and training process are described in section 4. Related work is listed in section 5. Performance is evaluated and discussed in section 6. Section 7 concludes and considers future work. Appendix A contains details on the challenge of interfacing with the game, and appendix B provides the game settings used.

2 https://www.statista.com/statistics/808922/csgo-users-number/
3 https://www.statista.com/statistics/1125460/leading-esports-games-hours-watched/
4 https://twitchtracker.com/games

2 Background

This section introduces the CSGO environment, and briefly outlines the technical details of behavioural cloning.

2.1 CSGO Environment

CSGO is played from a first-person perspective, with mechanics and controls that are standard across FPS games – the keyboard is used to move the player left/right/forward/backward, while mouse movement turns the player horizontally and vertically, serving both to look around and to aim weapons. In CSGO's full 'competitive mode', two teams of five players win either by eliminating the opposing team or by completing an assigned objective. Success requires mastery of behaviour at three time horizons: in the short term an agent must control its aim and movement, reacting to enemies; over medium-term horizons the agent must navigate through map regions, manage its ammunition and react to its health level; in the long term an agent should manage its economy, plan strategies, adapt to opponents' strengths and weaknesses, and cooperate with teammates. As one of the first attempts to play CSGO from pixel input, we do not consider the full competitive mode.
Instead we focus on two simpler modes, summarised in table 1 (full settings in appendix B). Screenshots for each mode can be found in figures 1 (deathmatch) and 2 (aim training).

'Aim training mode'^5 provides a controlled environment for human players to improve their reactions, aim and recoil control. The player stands fixed in the centre of a visually uncluttered map, while unarmed enemies run toward them. It is not possible to die, and ammunition is unlimited. This constitutes the simplest environment we consider.

'Deathmatch mode' rewards players for killing any enemy on the opposing team (two teams, 'terrorists' and 'counter-terrorists'). After dying, a player regenerates at a random location. Whilst it does not require the long-term strategies of competitive mode, most other elements are intact: it is played on the same maps, the full variety of weapons is available, ammunition must be managed, and the agent should distinguish between teammates and enemies. We further consider three settings of deathmatch mode, all on the 'de_dust2' map, with the agent on the terrorist team and the 'AK-47' equipped.

1) Easy – with built-in AI bots on easy difficulty, with pistols, reloading not required, 12 vs 12.
2) Medium – with built-in AI bots on medium difficulty, all weapons, reloading required, 12 vs 12.
3) Human – with human players, all weapons, reloading required, 10 vs 10.

Table 1: CSGO game modes and the behaviour horizon required for success in each.

                Short-term            Medium-term          Long-term
Game mode       Reactive & control    Navigation & ammo    Strategy & cooperation
Aim training    ✓                     ✗                    ✗
Deathmatch      ✓                     ✓                    ✗
Competitive     ✓                     ✓                    ✓

5 Fast aim / reflex training workshop map: https://steamcommunity.com/sharedfiles/filedetails/?id=368026786

2.2 Behavioural Cloning

In behavioural cloning (also 'imitation learning') an agent aims to mimic the action a demonstrator (typically an 'expert') would take given some observed state. Learning is based on a dataset of the expert's logged behaviour, consisting of observations, $o \in \mathcal{O}$, paired with the actions taken, $a \in \mathcal{A}$. For $N$ such pairs the dataset is $\mathcal{D} = \{\{o_1, a_1\}, \dots, \{o_N, a_N\}\}$. An agent is represented by a policy that takes as input an observation and outputs a probability distribution over actions. For a policy parameterised by $\theta$, the action is typically selected either via $\hat{a} \sim \pi_\theta(a \mid o)$, or $\hat{a} = \arg\max_a \pi_\theta(a \mid o)$. Behavioural cloning reduces learning a sequential decision-making process to a supervised learning problem. Given some loss function, $l : \mathcal{A} \times \mathcal{A} \to \mathbb{R}$, that measures the distance between predicted and demonstrated actions, a model is asked to optimise

$$\theta = \arg\min_\theta \sum_{i=1}^{N} l(a_i, \hat{a}_i),$$

where $l$ might typically be cross-entropy or mean squared error.
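As a concrete illustration, the sketch below trains a policy with this objective in PyTorch, using cross-entropy as the loss $l$. The network, observation size and discrete action set are toy placeholders, not the agent described in this paper; the actual architecture and action parameterisation are given in sections 3 and 4.

```python
# Minimal behavioural-cloning sketch in PyTorch. The policy, observation shape and
# discrete action space below are toy placeholders; the agent's actual architecture
# and action parameterisation are described in sections 3 and 4.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

N_ACTIONS = 8                                  # assumed discrete action set, for illustration

# pi_theta(a | o): maps an observation to a distribution over actions (as logits).
policy = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)

# D = {{o_1, a_1}, ..., {o_N, a_N}}: observations paired with demonstrated actions.
obs = torch.randn(1024, 3, 64, 64)             # placeholder frames
acts = torch.randint(0, N_ACTIONS, (1024,))    # placeholder demonstrated actions
loader = DataLoader(TensorDataset(obs, acts), batch_size=64, shuffle=True)

loss_fn = nn.CrossEntropyLoss()                # the loss l(a_i, a_hat_i) for discrete actions
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-4)

for epoch in range(3):
    for o, a in loader:
        loss = loss_fn(policy(o), a)           # supervised objective: sum_i l(a_i, a_hat_i)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

# At run time the action is either sampled from the policy or taken greedily.
with torch.no_grad():
    logits = policy(obs[:1])
    a_sampled = torch.distributions.Categorical(logits=logits).sample()  # a ~ pi_theta(a|o)
    a_greedy = logits.argmax(dim=-1)                                     # argmax_a pi_theta(a|o)
```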