Master Thesis

Enhancements for Real-Time Monte-Carlo Tree Search in General Video Game Playing

Dennis J. N. J. Soemers

Master Thesis DKE 16-11

Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Artificial Intelligence at the Department of Data Science and Knowledge Engineering of Maastricht University

Thesis Committee: Dr. Mark H. M. Winands, Chiara F. Sironi, M.Sc., and Dr. ir. Kurt Driessens

Maastricht University
Faculty of Humanities and Sciences
Department of Data Science and Knowledge Engineering
Master Artificial Intelligence

July 1, 2016

Preface

This master thesis is written at the Department of Data Science and Knowledge Engineering, Maastricht University. The thesis describes my research into enhancing an agent using Monte-Carlo Tree Search for the real-time domain of General Video Game Playing. Parts of this research have been accepted for publication in the conference proceedings of the 2016 IEEE Conference on Computational Intelligence and Games. First, I would like to thank Dr. Mark Winands, not only for dedicating much time and effort supervising this thesis, but also my bachelor thesis and research internship in previous years. Additionally, I would like to thank Chiara Sironi, M.Sc., for also supervising this thesis. I would also like to thank my friend Taghi Aliyev for frequently helping me out. Finally, I would like to thank my family for supporting me.

Dennis Soemers Maastricht, June 2016


Abstract

General Video Game Playing (GVGP) is a field of Artificial Intelligence where agents play a variety of real-time video games that are unknown in advance. This is challenging because it requires fast decision-making (40 milliseconds per decision), and it is difficult to use game-specific knowledge. Agents developed for GVGP are evaluated every year in the GVGP Competition. Monte-Carlo Tree Search (MCTS) is a search technique for game playing that does not rely on domain-specific knowledge. MCTS is known to perform well in related domains, such as General Game Playing (GGP). In 2015, MCTS-based agents were outperformed by other techniques in GVGP. The Iterated Width (IW) algorithm, which originates from classic planning, performed particularly well. In this thesis, it is investigated how MCTS can be enhanced to raise its performance to a competitive level in GVGP. This is done by evaluating enhancements known from previous research, extending some of them, and introducing new enhancements, which are partially inspired by IW. The investigated enhancements are Breadth-First Tree Initialization and Safety Prepruning, Loss Avoidance, Novelty-Based Pruning, Progressive History, N-Gram Selection Technique, Tree Reuse, Knowledge-Based Evaluations, Deterministic Game Detection, and Temporal-Difference Tree Search. The effect that these enhancements have on the performance of MCTS in GVGP is experimentally evaluated using sixty different GVGP games of the GVG-AI framework. Most of them are shown to provide statistically significant increases in the average win percentage individually. Among the agents combining multiple enhancements, the agent with the best win percentage (48.4%) performs significantly better than the baseline MCTS implementation of this thesis (31.0%), and close to the level of YBCriber (52.4%), the winning agent of the GVGP competition at the IEEE CEEC 2015 conference.

Contents

1 Introduction
  1.1 Artificial Intelligence and Games
  1.2 General Video Game Playing (GVGP)
  1.3 Problem Statement and Research Questions
  1.4 Thesis Outline

2 General Video Game Playing
  2.1 Competition Rules
  2.2 The GVG-AI Framework
  2.3 Analysis of Games in GVG-AI
    2.3.1 Properties of Games and Framework
    2.3.2 Game Tree Model

3 Search Techniques in GVGP
  3.1 Background
  3.2 Monte-Carlo Tree Search (MCTS)
    3.2.1 Selection
    3.2.2 Play-out
    3.2.3 Expansion
    3.2.4 Backpropagation
  3.3 Iterated Width (IW)
    3.3.1 IW in Real-Time Games
    3.3.2 Implementation of IW in GVGP

4 Enhancements for MCTS in GVGP
  4.1 Enhancements Inspired by IW
    4.1.1 Breadth-First Tree Initialization and Safety Prepruning (BFTI)
    4.1.2 Loss Avoidance (LA)
    4.1.3 Novelty-Based Pruning (NBP)
  4.2 Other Enhancements
    4.2.1 Progressive History (PH)
    4.2.2 N-Gram Selection Technique (NST)
    4.2.3 Tree Reuse (TR)
    4.2.4 Knowledge-Based Evaluations (KBE)
    4.2.5 Deterministic Game Detection (DGD)
    4.2.6 Temporal-Difference Tree Search (TDTS)

5 Experiments & Results
  5.1 Setup
  5.2 Results
    5.2.1 Benchmark Agents
    5.2.2 Breadth-First Tree Initialization and Safety Prepruning
    5.2.3 Loss Avoidance
    5.2.4 Novelty-Based Pruning
    5.2.5 Progressive History and N-Gram Selection Technique
    5.2.6 Tree Reuse
    5.2.7 Knowledge-Based Evaluations
    5.2.8 Temporal-Difference Tree Search
    5.2.9 Enhancements Combined

6 Conclusion
  6.1 Research Questions
  6.2 Problem Statement
  6.3 Future Research

References

Chapter 1

Introduction

This chapter provides an introduction to this thesis. It discusses Artificial Intelligence for games, and the concept of General Video Game Playing, which is the focus of this thesis. The problem statement and four research questions are described next. Finally, this chapter contains an outline of the remainder of the thesis.

1.1 Artificial Intelligence and Games

One of the main topics of research in Artificial Intelligence (AI) is game-playing. In many games, it is a challenging problem to find good decisions to make. The αβ search algorithm (Knuth and Moore, 1975) and a number of enhancements eventually led to the Deep Blue system (Campbell, Hoane Jr, and Hsu, 2002) defeating the human chess world champion Garry Kasparov in 1997. In the game of Go, agents based on the αβ technique have not been able to compete with expert human players. The Monte-Carlo Tree Search (MCTS) (Kocsis and Szepesvári, 2006; Coulom, 2007) algorithm has increased the performance of AI in this domain, and has been used by the AlphaGo (Silver et al., 2016) program, which beat the 9-dan professional human player Lee Sedol in 2016. MCTS also has applications in domains other than game-playing (Browne et al., 2012), which indicates that research in game-playing algorithms can also be useful for “real-world” problems. Even though the basic techniques used by game-playing agents such as Deep Blue and AlphaGo are applicable in a variety of domains, they also rely on domain-specific knowledge to be effective (for instance in the form of heuristics, or offline training using domain-specific data). This means that these agents are not able to play games other than the ones they were specifically programmed to play. To promote research in more generally applicable techniques, the General Game Playing (GGP) competition (Genesereth, Love, and Pell, 2005) is organized annually. In GGP, agents should be able to play any game of which the rules are specified in a Game Description Language (GDL). The focus in GGP is placed on abstract games. General Video Game Playing (GVGP) (Levine et al., 2013) is a similar concept, but it focuses on real-time video games instead of abstract games. The Arcade Learning Environment (Bellemare et al., 2013) is a similar framework that can be used to develop agents that play games of the Atari 2600 game console. This thesis focuses on GVGP.

1.2 General Video Game Playing (GVGP)

To test the performance of different techniques for GVGP, the GVGP Competition is organized annually. The first GVGP Competition was held at the IEEE Conference on Computational Intelligence and Games (CIG) in 2014 (Perez et al., 2016). Three sets of ten real-time video games each, for a total of thirty games, were used in this competition. The first set (training set) could be downloaded by participants for testing during development. The second set (validation set) was kept private, but could be used for testing through the competition's website (Perez, 2016).

The final set (test set) was used to determine the rankings in the actual competition. This means that the games used to rank the participants were completely unknown to them and had never been used in any way before the competition. Many of the participants in 2014 used MCTS. In 2015, the second edition of the competition was held and associated with three conferences: ACM GECCO, IEEE CIG and IEEE CEEC (Perez-Liebana et al., 2016). MCTS was still used by many of the participating agents, but it was less dominant. Some of the highest-ranking agents chose between a variety of approaches based on observations made during gameplay. These approaches include Breadth-First Search, evolutionary algorithms, and A*. Another approach that performed well was Iterated Width (Lipovetzky and Geffner, 2012), which was used by the NovTea (4th place at CIG) and YBCriber (1st place at CEEC) agents. In 2014 and 2015, the GVGP Competition consisted only of the Planning track. In this track, agents have access to a forward model which can be used, for instance, for lookahead search. In 2016, it is planned to also run tracks with different rules or goals: a Two-Player Planning track, a Learning track and a Procedural Content Generation track. This thesis focuses on the Planning track.

1.3 Problem Statement and Research Questions

MCTS performed well in the GVGP Competition of 2014, and is also known to frequently perform well in related domains, such as GGP (Björnsson and Finnsson, 2009). However, in 2015, it was outperformed by several different techniques, such as Iterated Width (IW) and A*-based approaches. IW-based agents appeared to have the best performance among these approaches. In comparison to other techniques used in GVGP, IW is a relatively new algorithm (Lipovetzky and Geffner, 2012). The fact that IW outperformed MCTS in 2015 can be considered surprising, because IW is originally a planning technique, and not a game-playing technique. The problem statement of this thesis is:

How can Monte-Carlo Tree Search be enhanced to perform competitively in General Video Game Playing?

One way in which this problem statement can be addressed is to investigate the weaknesses of MCTS in comparison to IW for GVGP, and find ways to deal with those weaknesses. This leads to the following research question:

1. How can ideas from Iterated Width be integrated in Monte-Carlo Tree Search and enhance its performance in General Video Game Playing?

There already is a wide variety of enhancements for the MCTS algorithm, some of which are known to work well in other domains, and some of which have been proposed specifically for GVGP. To improve the performance of MCTS in GVGP, some of these enhancements are evaluated and extended, and combined with enhancements found by answering the previous research question. This leads to two more research questions:

2. How can enhancements known from other domains be used to improve the performance of Monte-Carlo Tree Search in General Video Game Playing?

3. How can enhancements previously proposed for General Video Game Playing be extended and used to improve the performance of Monte-Carlo Tree Search?

Additionally, it is investigated if the performance of MCTS in GVGP can be improved further with novel enhancements. This leads to the last research question:

4. How can the performance of Monte-Carlo Tree Search in General Video Game Playing be improved with novel enhancements?

1.4 Thesis Outline

The remainder of this thesis is structured as follows. Chapter 2 describes General Video Game Playing (GVGP) in more detail. The competition rules, the framework used for implementing agents, and important properties of the games supported by the framework are described. Chapter 3 describes two major search techniques. It describes Monte-Carlo Tree Search (MCTS), which is the algorithm that this thesis aims to enhance, and Iterated Width (IW), which is used as inspiration for some enhancements. Chapter 4 describes the enhancements for MCTS in GVGP that are evaluated in this thesis. Chapter 5 describes the setup and results of the experiments that have been carried out to assess the enhancements for MCTS. Chapter 6 concludes the thesis by addressing the research questions and problem statement, and providing ideas for future research.

Chapter 2

General Video Game Playing

In this chapter, the concept of General Video Game Playing (GVGP) is described in detail. First, the rules of the GVGP competitions are explained. Next, the framework that runs games for the competitions is discussed. Finally, the chapter contains an analysis of the games in this framework.

2.1 Competition Rules

In the GVGP competition (Perez, 2016), a set of ten games is used to evaluate and rank the participants. Every year, this set consists of new games, and the set of games is not revealed before the competition. Every game consists of five levels (describing the initial state of the game), and every agent plays every level ten times, for a total of fifty runs per game. The agent is given 1 second of processing time before every run starts, and 40 milliseconds of processing time per tick. A tick (or cycle or frame) can be thought of as a turn in the game, in which the agent is allowed to choose an action. Every run ends in a loss after 2000 ticks, but can end in a win or a loss earlier depending on the game's specific rules. Agents are not allowed to use multi-threading. The participants of the competition are ranked as follows. For each of the ten games, the agents are sorted according to the following three criteria, listed in order of importance:

1. Wins: The total number of wins achieved by the agent in a game.
2. Score: The average score that the agent obtained over all the runs in a game.
3. Time: The total amount of time, measured in ticks, that the agent spent playing all the runs of a game.

For every game, agents are awarded 25, 18, 15, 12, 10, 8, 6, 4, 2, or 1 points for ranking first place, second place, etc. in that game. Agents ranked lower than 10th do not get any points for that game. The final ranking of the entire competition is obtained by adding up the points over all of the games. This system is based on the Formula 1 scoring system (Fédération Internationale de l'Automobile, 2016).

2.2 The GVG-AI Framework

The framework used to run games and provide information to the agent is called the GVG-AI framework (https://github.com/EssexUniversityMCTS/gvgai). Levels and game rules are described in text files using a Video Game Description Language (VGDL) (Ebner et al., 2013; Schaul, 2013). The information contained in these files is not provided to the agent, but only used by the framework to run games. This is different from GGP (Genesereth et al., 2005), where the information in such files is available to the agent. The GVG-AI framework supports the definition of a wide range of different video games. All games are played in 2D game worlds, and feature an avatar, which is the character that executes the agent's actions. Some games are inspired by existing video games, such as the Aliens game which is inspired by Space Invaders, and some games are entirely new. Figure 2.1 depicts how the Aliens game is visualized in GVG-AI.

Figure 2.1: The Aliens game in GVG-AI. Source: http://gvgai.net.

Whenever the agent is required to decide which action to play next, it is given a StateObservation object that represents the current state of the game. It provides access to the following functions that the agent can use to obtain general information concerning the game state:

• getGameScore: Returns the score of the agent in the game state.
• getGameTick: Returns the duration of the run until the given game state, measured in ticks.
• getGameWinner: Indicates whether the game was won or lost by the agent or did not yet terminate.
• getWorldDimension: Returns the dimensions of the game world.
• isGameOver: Indicates whether or not the game terminated.

Additionally, the following functions can be used to obtain information about the avatar:

• getAvailableActions: Returns the list of actions that the avatar can execute in the game state.
• getAvatarPosition: Returns the current position of the avatar.
• getAvatarSpeed: Returns the current speed of the avatar.
• getAvatarOrientation: Returns the current orientation of the avatar.
• getAvatarResources: Returns a mapping from integers to integers, where the keys are IDs of resources that the avatar can have, and the values are the amounts of the resources that the avatar currently has. An example of a resource that the avatar could have in some games is ammunition.

Finally, the following functions can be used to obtain information about other objects in the game:

• getEventsHistory: Returns a list of all collision events between objects that have happened in previous game states. Collision events are events where something special happens when two objects of certain types collide with each other in the game world. The VGDL describes what happens when objects collide. For example, the VGDL can define that an alien is killed when it collides with a missile. This function can only tell an agent that an event occurred, and which objects were involved in the event, but not what the consequences of an event are.

• getObservationGrid: Returns a grid representation of the level, where every cell contains a list of the objects that are located in that cell in the game state. Each of those observations has similar functions available to obtain information as the functions for the avatar described above.

• getXXXPositions: There are different varieties of this function where “XXX” is replaced by a category of objects. The functions then return a list of objects of the corresponding category. Examples of categories are NPCs (non-playable characters), or Immovables (objects that cannot move during gameplay).

The Forward Model of the framework enables agents to perform lookahead searches. It consists of a copy function, with which any given game state can be copied, and an advance function, with which any given game state can be modified by simulating the effects of executing a given action in that state. A typical lookahead search would consist of copying the current game state a number of times, simulating sequences of actions on these copies of the original game state, and then evaluating the resulting game states using the functions listed above. The framework supports six different actions that agents can choose to play; four movement actions (Up, Left, Down and Right), a Use action, and the Nil action. Playing the movement actions results in the avatar attempting to move and/or turn in the chosen direction. The exact effect of such an action depends on the game being played, and the current game state. For example, some games have physics that also affect movement, and some games can have obstacles that block movement. The Use action can have very different effects depending on the game being played. It can, for instance, result in the avatar shooting, or swinging a sword. Playing the Nil action means that the agent does not do anything. This action is also automatically played if an agent does not return an action to play in the amount of processing time that the competition rules allow. Not every action is supported in every game. For instance, the Aliens game does not allow the agent to move vertically (Up or Down).
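The typical lookahead described above can be illustrated with a short sketch. The GameState interface and the OneStepLookahead class below are hypothetical stand-ins for this illustration (the real framework exposes a StateObservation with copy and advance functions as listed earlier, but with different signatures); the sketch only shows the copy-advance-evaluate pattern for a one-step lookahead under those assumptions.

import java.util.List;

// Hypothetical stand-in for the framework's StateObservation; the real class
// exposes similar copy/advance/getGameScore functions but with different signatures.
interface GameState {
    GameState copy();                     // deep copy of the game state
    void advance(int action);             // simulate one tick with the given action
    double getGameScore();                // current game score
    List<Integer> getAvailableActions();  // actions playable in this state
}

public final class OneStepLookahead {
    // Evaluates every available action by simulating it once on a copy of the
    // current state, and returns the action with the highest resulting score.
    public static int bestAction(GameState current) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int action : current.getAvailableActions()) {
            GameState successor = current.copy();  // never modify the real current state
            successor.advance(action);             // simulate one tick
            if (successor.getGameScore() > bestScore) {
                bestScore = successor.getGameScore();
                best = action;
            }
        }
        return best;
    }
}

A full search algorithm such as MCTS or IW applies this same copy-advance-evaluate pattern repeatedly to reach deeper game states.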

2.3 Analysis of Games in GVG-AI

This section first describes properties of the games and the framework itself that are important to take into account for the development of a GVGP agent in the GVG-AI framework. Afterwards, it is discussed how the state space of these games can be modelled using game trees.

2.3.1 Properties of Games and Framework

Framework Speed

Because the framework is capable of running a large variety of different games, the games are unlikely to be implemented as efficiently as they would be in a program dedicated to only a single game. The advance and copy functions of states in this framework are therefore relatively slow. This has two important consequences. The first is that any algorithm that explores the state space of a game in GVG-AI can explore a smaller part of the state space than it would in a specialized program. The second consequence is that enhancements of such algorithms that do not rely on generating more states have a lower relative computational overhead than similar enhancements would have in programs dedicated to single games. Consider, for example, a computationally cheap heuristic function h1(s), and a computationally expensive heuristic function h2(s), where both functions estimate the value of a state s, but h2(s) is more accurate than h1(s). Suppose that h2(s) is typically not worth using in programs dedicated to a single game, because the increased accuracy does not make up for the reduction in the number of states that can be explored due to the increased computational effort. It is possible that, in GVG-AI, the same absolute increase in computational cost is a smaller relative increase, and h2(s) can then become useful in GVG-AI.

Real-Time

Because agents are allowed to select an action to play every 40 milliseconds in the competition, the games can be considered to be real-time games. In comparison to abstract games, where agents typically have processing time available in the order of seconds or minutes, 40 milliseconds is a small amount of time. This means that algorithms can only explore a relatively small part of the state space of a game in GVG-AI before making a decision.

Nondeterministic

Games in GVG-AI can be nondeterministic, meaning that applying the same action in two copies of the same state does not necessarily result in the same successor state. Agents can call the advance function on copies of a state multiple times to handle this. A significant number of the games used in previous competitions are in fact deterministic, but this information is not directly available to agents. Some agents attempt to learn whether or not games are deterministic, and switch between different algorithms based on their classification.

Fully Observable

The game states in GVG-AI are fully observable, in that the locations of all objects in the game are known to the agent. However, the rules of the game are not known. For instance, it is not known to the agent what happens when two objects collide, or under which conditions a game is won or lost. The only way to obtain this kind of information is by using the forward model to simulate the effects of actions, and observe the resulting states.

Dynamic

The games in GVG-AI can be dynamic, meaning that a game state can change even when the agent plays the Nil action. New objects can appear, and existing objects can disappear. Some objects, such as NPCs, can move. These movements can be deterministic or nondeterministic.

Game Score

Game states in GVG-AI have a score feature. It is important to maximize the score because it is one of the criteria used to rank agents in the competition. In some games, the score can also be used as a heuristic estimate of the value of a game state, in the same way that domain-specific evaluation functions are used in many game-specific agents.

Perez et al. (2016) classify the score systems of some games as binary, incremental or discontinuous. A binary score system means that only terminal game states have a score different from 0 (typically 1 for wins and −1 for losses). In games with this score system it is not possible to use the game score as an evaluation function for non-terminal game states. An incremental score system means that a number of events can occur in a game that all increase the score by a small value (e.g., 1 point for every enemy killed). In these games it is more reliable to use the game score as an evaluation function, especially if the winning condition of the game is also related to the same type of event (e.g., all enemies killed). A discontinuous score system means that there is a higher number of events that increase the score by a small value, as in the incremental case, but there is also a small number of other events that result in a high score increase. This can be problematic for agents that learn about the incremental score increases first and then only try to exploit those events, neglecting the exploration that could result in finding the larger score increase.

Furthermore, games can have sparse or delayed scores. A game has sparse scores if there is a low number of events that lead to a score increase. All games with a binary score system are considered as having sparse scores, but games with incremental and discontinuous score systems can also have sparse scores. In games with a non-binary score system, but sparse scores, it can be just as difficult to use the game score as an evaluation function as in games with a binary score system. In a game with delayed scores, it is possible that actions lead to score increases more than one tick after playing the action.

For instance, in Aliens, a missile launched by the Use action needs some travelling time before it hits an object and results in a score increase. Delayed scores are typically not a problem if there is only a short delay, but they can be problematic when the delay is longer than the depth that can feasibly be reached by a search algorithm.

Single-Player

All games in the planning track of GVGP are single-player games. Some games have “enemies”, in some cases with random behaviour but also sometimes with simple chasing behaviours, but these are not considered to be players. The forward model perfectly models their behaviour, which means that agents only need to take into account those moves that these opponents make according to the forward model. This is different from two-player, zero-sum games, such as chess and Go, where agents typically assume that the opponent plays perfectly, and also attempt to find the best moves that the opponent could play.

2.3.2 Game Tree Model

Tree Structure

The space searched by game-playing algorithms is typically modelled as a tree, where every node n represents a game state s(n), and a directed edge ⟨n1, n2⟩ from a node n1 to a node n2 represents the transition from s(n1) to s(n2) caused by playing an action a(⟨n1, n2⟩). In such a tree, the root node represents the current state of the game, and leaf nodes represent terminal game states. A simple example of such a game tree is depicted in Figure 2.2. In this example, s0 is the current game state. Playing action a1, a2 or a3 in s0 causes a transition into s1, s2 or s3, respectively. If this figure depicts the complete game tree, s1, s2 and s3 are all terminal game states, because they have no successors. However, if the figure only depicts a tree that was generated by a search algorithm, it is possible that these states are simply the final states that the algorithm was able to reach before running out of processing time.

Figure 2.2: A simple example game tree.

Because games in GVG-AI can be nondeterministic, it is possible that a single action in a single state has multiple successor states. This means that a model with one edge per action in every state, and exactly one state per node, is not sufficient. Therefore, the model for nondeterministic games is typically extended with chance nodes. Chance nodes can be thought of as nodes in which the environment probabilistically “chooses” a successor whenever a state transition is nondeterministic. Such a model assumes that the probability distribution over all possible state transitions is known. This is not the case in GVG-AI, which means that it is not feasible to add chance nodes to the model. In this thesis, the game tree in GVG-AI is modelled based on an Open Loop approach (Perez et al., 2015). The idea of this approach is that every edge still represents an action, only the root node corresponds to a single state (the current state s0), and every other node n represents a probability distribution over all states that can be reached by starting in s0, and playing the sequence of actions on the edges from the root node to n.

The maximum depth of such a tree is D = T − t, where T is the maximum duration in ticks of a game (T = 2000 in the competitions), and t is the game tick of the current game state (t = 0 when the game starts). Most games have one of the following sets of actions available in every game state:

1. {Nil, Use, Left, Right}
2. {Nil, Left, Right, Up, Down}
3. {Nil, Use, Left, Right, Up, Down}

This means that most games consistently have the same branching factor b ∈ {4, 5, 6} for every node of the tree. Many agents in past competitions ignore the fact that the Nil action can be played, because that action simply means that the avatar does nothing, reducing the branching factor to b ∈ {3, 4, 5}.

Game-Theoretic Values

Let S denote the set of all possible game states. Suppose that some state s_T ∈ S is a terminal game state. Typically, the game-theoretic value V(s_T) is then given by Equation 2.1 in single-player games.

V(s_T) = \begin{cases} 1 & \text{if } s_T \text{ is a winning state} \\ 0 & \text{if } s_T \text{ is a losing state} \end{cases}    (2.1)

However, in the competitions for GVGP, it is not only important to win games. As described in Section 2.1, the score is used as a tiebreaker for ranking when multiple agents have the same number of wins, and the time is used as a tiebreaker when multiple agents also have the same average score. Let score(s_T) and tick(s_T) denote the game score and game tick of state s_T, respectively. Then, the game-theoretic value of a terminal state s_T in GVGP is given by Equation 2.2.

V(s_T) = \begin{cases} M + score(s_T) - \frac{tick(s_T)}{M} & \text{if } s_T \text{ is a winning state} \\ -M + score(s_T) - \frac{tick(s_T)}{M} & \text{if } s_T \text{ is a losing state} \end{cases}    (2.2)

In this equation, M should be sufficiently large so that ties are broken correctly; \forall s \in S: (M > |score(s)|) \wedge (score(s) = 0 \vee \frac{tick(s)}{M} < |score(s)|). Because the different possible score values in GVGP are unknown, it is difficult to give an exact value for M, but typically a value in the order of M = 10^7 is sufficient.

Now let s ∈ S denote a non-terminal game state, and let A(s) denote the set of actions that can be played in s. Let p_a(s, s') denote the probability that playing a ∈ A(s) in s leads to the successor state s'. Then the expected value E(V(a, s)) of playing an action a in s is given by Equation 2.3.

E(V(a, s)) = \sum_{s' \in S} p_a(s, s') \times V(s')    (2.3)

Because all games in GVG-AI are single-player games, the game-theoretic value V(s) of any non-terminal state s is simply the maximum expected value of the actions playable in s, as given by Equation 2.4.

V(s) = \max_{a \in A(s)} E(V(a, s))    (2.4)

Because nodes do not represent individual states, but a probability distribution over states, it is also interesting to look at the game-theoretic value of a node n. Let s_0 denote the root state, which is the only possible state in the root node. Let p_n(s) denote the probability of observing a state s ∈ S when applying the sequence of actions leading from the root node to n. Then, the game-theoretic value V(n) of the node n, which represents the expected game-theoretic value of playing the action sequence leading to that node, is given by Equation 2.5.

V(n) = \sum_{s \in S} p_n(s) \times V(s)    (2.5)
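To make the tie-breaking role of M in Equation 2.2 concrete, the following worked example (using the value M = 10^7 mentioned above, with scores and ticks chosen purely for illustration) compares two winning terminal states:

% Two winning terminal states with score 5, reached at ticks 300 and 800 (M = 10^7):
V(s_A) = 10^7 + 5 - \frac{300}{10^7} = 10\,000\,004.99997
V(s_B) = 10^7 + 5 - \frac{800}{10^7} = 10\,000\,004.99992
% Both values dominate any losing state (which lies near -10^7), a win with a
% higher score would dominate both (tick(s)/M is at most 2000/10^7 = 0.0002),
% and between equal scores the faster win s_A is preferred, matching the ranking
% criteria of Section 2.1 (wins first, then score, then time).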

Chapter 3

Search Techniques in GVGP

This chapter discusses two search techniques for decision-making in GVGP; Monte-Carlo Tree Search (MCTS) and Iterated Width (IW). Some background information that is relevant for both techniques is provided first, and the algorithms are individually described in detail afterwards.

3.1 Background

The basic idea of search techniques for decision-making in games is as follows. First, the search space of a game is modelled as a game tree, as described in Subsection 2.3.2. Then, that tree is traversed in an attempt to approximate the game-theoretic value of every child of the root node. Finally, based on those approximations, one of the children of the root node is selected, and the corresponding action is played. Due to the limited amount of time available for decision-making in GVGP, it is not feasible to exactly compute the game-theoretic values of nodes as defined in Subsection 2.3.2. Therefore, both techniques require an evaluation function that can also provide meaningful values for non-terminal states. Such a function is then used to evaluate nodes at a depth that is feasible to reach in the available amount of time, and those evaluations are then used to approximate the values of nodes closer to the root. Unless noted otherwise, both techniques use Equation 3.1 as a basic evaluation X(s) of a game state s.

X(s) = \begin{cases} 10^7 + score(s) & \text{if } s \text{ is a winning state} \\ -10^7 + score(s) & \text{if } s \text{ is a losing state} \\ score(s) & \text{if } s \text{ is a non-terminal state} \end{cases}    (3.1)

This evaluation function is also used by the sample MCTS agent included in the framework. For terminal states, it is similar to the game-theoretic value given by Equation 2.2. The only difference is that the tick(s) term, which keeps track of the duration of a game, is not included. It has not been included because it is rarely relevant in the competitions; agents generally already have a difference in their average score, which means that tick(s) is not used as a tie-breaker. For a non-terminal state s, the function is quite different from the game-theoretic value given by Equation 2.4; score(s) is used as a heuristic evaluation of the maximum expected value of successors that cannot be investigated anymore due to computational constraints.

It is sometimes convenient if evaluations of states can be guaranteed to lie in the interval [0, 1]. When this is the case, the normalized evaluation of a state s is referred to as Q(s) in this thesis, defined by Equation 3.2. In this equation, \hat{X}_{min} and \hat{X}_{max} denote, respectively, the minimum and maximum values X(s') found so far for any state s' in the current game.

Q(s) = \frac{X(s) - \hat{X}_{min}}{\hat{X}_{max} - \hat{X}_{min}}    (3.2)
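A minimal sketch of Equations 3.1 and 3.2 is given below. The GameState interface is a hypothetical stand-in for the framework's state object, and the fallback value of 0.5 for the degenerate case where all evaluations seen so far are equal is an assumption made for the sketch, not something prescribed by the equations.

// Hypothetical minimal state accessors; the real StateObservation offers
// isGameOver/getGameWinner/getGameScore with different signatures.
interface GameState {
    boolean isGameOver();
    boolean isWin();          // only meaningful when isGameOver() returns true
    double getGameScore();
}

final class StateEvaluator {
    private static final double BIG = 1e7;            // the 10^7 constant of Equation 3.1
    private double xMin = Double.POSITIVE_INFINITY;   // running estimate of X_min
    private double xMax = Double.NEGATIVE_INFINITY;   // running estimate of X_max

    // X(s) as in Equation 3.1.
    double evaluate(GameState s) {
        double x = s.getGameScore();
        if (s.isGameOver()) {
            x += s.isWin() ? BIG : -BIG;
        }
        xMin = Math.min(xMin, x);   // track bounds for later normalization
        xMax = Math.max(xMax, x);
        return x;
    }

    // Q(s) as in Equation 3.2, applied to a previously computed X(s).
    // Falls back to 0.5 when all evaluations seen so far are equal (an assumption).
    double normalize(double x) {
        if (xMax <= xMin) {
            return 0.5;
        }
        return (x - xMin) / (xMax - xMin);
    }
}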

3.2 Monte-Carlo Tree Search (MCTS)

Monte-Carlo Tree Search (MCTS) (Kocsis and Szepesvári, 2006; Coulom, 2007) is a best-first search algorithm. It gradually builds up a search tree, exploring more promising parts of the tree in more detail than less promising parts. It uses Monte-Carlo simulations to approximate the value of game states. The algorithm is initialized with only a single node (the root node), which represents the current game state. Next, for as long as some computational budget allows, MCTS repeatedly executes simulations, consisting of four steps each. These four steps (Chaslot et al., 2008), depicted in Figure 3.1, are:

Figure 3.1: The four steps of an MCTS simulation. Adapted from (Chaslot et al., 2008).

1. Selection: In the selection step, the algorithm traverses down the tree from the root node, according to a selection policy, until a node is reached that is not yet fully expanded. A node is considered to be fully expanded if it has one child node for every action that is available in that node. The main purpose of the remaining steps is to improve the approximation of the value of nodes traversed in this step. A good strategy for this step should provide a balance between exploitation of nodes that seem promising based on the results of previous simulations, and exploration of nodes that have not yet been visited often in previous simulations.

2. Play-out: In the play-out step, a (semi-)random game is played, starting from the game state in the node selected in the selection step. The purpose of this step is to rapidly obtain one of the possible outcomes of the line of play chosen in the selection step. The intuition is that the outcomes of a large number of simulations, biased by the selection strategy to contain more good lines of play than bad lines of play, can be averaged to approximate the value of the states that all the simulations have in common.

3. Expansion: In the expansion step, the algorithm expands the tree by adding one or more new nodes. The purpose of this step is to gradually build up an actual tree in which results can be memorized. This is the tree that is traversed by the selection step.

4. Backpropagation: In the backpropagation step, the outcome of the game simulated in the previous step is backpropagated to the root node. Typically this outcome is simply a value indicating whether the game was a win or a loss, but other data can be backpropagated as well if the previously described steps require it.

When the computational budget expires, the action to be played is chosen based on the outcomes of the simulations. Common strategies are to play the action that was selected most frequently, or to play the action leading to the node with the highest average score among the simulations. In this thesis, the average score is maximized. Pseudocode of MCTS is given in Algorithm 3.1. There is a large number of different strategies for these four steps in the existing literature (Browne et al., 2012). The following subsections describe the most common ways to implement these steps, and small changes that can be applied in GVGP. Larger changes and enhancements are discussed in the next chapter.

Algorithm 3.1 Monte-Carlo Tree Search (MCTS)

Require: Root node r, root state s0
1: function MCTS
2:   while computational budget not expired do
3:     (node, state) ← Selection(r, s0)
4:     (node_end, outcome) ← Play-out(node, state)
5:     Expand
6:     Backpropagate(node_end, outcome)
7:   end while
8:   return action a leading to child c of r with highest estimated value X(c)
9: end function

3.2.1 Selection

The most common implementation (Kocsis and Szepesvári, 2006) of the selection step is to use a policy named UCB1 (Auer, Cesa-Bianchi, and Fischer, 2002). An MCTS implementation with this policy is typically referred to as Upper Confidence Bounds applied to Trees (UCT). Given a current node P, with successors Succ(P), this policy selects the node S_i ∈ Succ(P) that maximizes Equation 3.3.

UCB1(S_i) = Q(S_i) + C \times \sqrt{\frac{\ln(n_P)}{n_i}}    (3.3)

In this equation, Q(S_i) denotes the mean of all the scores that have been backpropagated through S_i, n_P denotes the number of times that P has been visited, and n_i denotes the number of times that S_i has been visited. C is a parameter, where higher values for C lead to more exploration, and lower values for C lead to more exploitation. Q(S_i) should be normalized to lie in the interval [0, 1], because consistently having the same range of values for that term makes it easier to tune the C parameter.
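The selection rule of Equation 3.3 can be sketched as follows. The Node class and its fields (a sum of normalized scores and a visit count, as described in Subsection 3.2.4) are assumptions made for this sketch; returning an unvisited child immediately is one common convention, not the only possible one.

import java.util.List;

// Hypothetical tree node: sumQ accumulates normalized evaluations Q(s) and
// visits counts how often the node was traversed (see Subsection 3.2.4).
final class Node {
    double sumQ;
    int visits;
    List<Node> children;
}

final class Ucb1Selection {
    private final double c;   // exploration constant C of Equation 3.3

    Ucb1Selection(double c) {
        this.c = c;
    }

    // Returns the child of parent that maximizes UCB1 (Equation 3.3).
    Node select(Node parent) {
        Node best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Node child : parent.children) {
            if (child.visits == 0) {
                return child;   // explore unvisited children first
            }
            double meanQ = child.sumQ / child.visits;
            double ucb1 = meanQ + c * Math.sqrt(Math.log(parent.visits) / child.visits);
            if (ucb1 > bestValue) {
                bestValue = ucb1;
                best = child;
            }
        }
        return best;
    }
}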

3.2.2 Play-out

The most common implementation of the play-out step is to repeatedly play random moves (drawn uniformly from the set of available moves in every state) until a terminal game state is reached. Such an entirely uninformed selection of moves to play is computationally cheap, which typically allows for a large number of simulations, but individual simulations are unlikely to represent realistic lines of play. This means that a high number of simulations is required to obtain a reasonable approximation of the value of a state. In the GVG-AI framework, it is not feasible to always continue simulations until a terminal game state is reached. Therefore, a depth limit is introduced, and a simulation is stopped when that depth limit has been reached (or when a terminal game state is reached earlier). The sample MCTS controller included in the framework runs simulations for at most 10 ticks after s_0, where s_0 is the state in the root node. The implementation of MCTS described in this thesis simulates at most 10 ticks after s_i, where s_i is the state in the node selected in the selection step.
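A sketch of such a depth-limited random play-out is given below; the GameState interface is again a hypothetical stand-in for the framework's forward model, and the sketch assumes the state passed in is already a copy that may be modified freely.

import java.util.List;
import java.util.Random;

// Hypothetical forward-model interface; see Section 2.2 for the real functions.
interface GameState {
    void advance(int action);
    boolean isGameOver();
    List<Integer> getAvailableActions();
}

final class RandomPlayout {
    private final Random random = new Random();

    // Plays uniformly random actions on the given state (assumed to be a copy
    // that may be modified) until a terminal state is reached or at most
    // maxTicks ticks have been simulated.
    void run(GameState state, int maxTicks) {
        for (int tick = 0; tick < maxTicks && !state.isGameOver(); tick++) {
            List<Integer> actions = state.getAvailableActions();
            int action = actions.get(random.nextInt(actions.size()));
            state.advance(action);
        }
    }
}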

3.2.3 Expansion

When a node that is not yet fully expanded is selected, the most common implementation of the expansion step is to simply add one new node to the tree. That node is typically chosen according to the play-out strategy, where the possible choices are the nodes that still need to be added to the selected node. With this strategy, every iteration of MCTS adds exactly one node to the tree. Another option is to add one node to the tree for every action played in the play-out step, instead of only adding a node for the first action played in the play-out step. This is typically not done because, with a high number of simulations, this can easily lead to running out of memory.

Due to the limited amount of processing time available, and the high amount of overhead from the GVG-AI framework, the risk of running out of memory is low in GVGP. Creating and storing nodes is computationally cheap in comparison to the expensive functions of the framework. To keep as much information as possible from executed simulations, the implementation of MCTS described in this thesis simply adds one node to the tree for every action played in the play-out step.

3.2.4 Backpropagation

The backpropagation step should be implemented to backpropagate any data that the other three steps of MCTS require up the tree, starting from the end of the simulation and going back up to the root. For the most basic implementations of MCTS, this typically means keeping a sum of scores X_i and a visit count n_i in every node. Then, the evaluation X(s) of the final state s of the simulation is added to the sums of all the nodes that occurred in the iteration, and the visit count of every node in the same set of nodes is incremented. These variables are required for the computation of UCT values in Equation 3.3.

3.3 Iterated Width (IW)

Iterated Width (IW) is an uninformed search algorithm that originates from classic planning (Lipovetzky and Geffner, 2012). The algorithm consists of a sequence of calls to IW(i) for i = 0, 1, 2,... , until some stopping condition is reached. Examples of stopping conditions are running out of processing time, or finding a solution for a given problem. Every call to IW(i) is a Breadth-First Search process where a generated state s is pruned if s does not pass a so-called novelty test. A novelty test is passed during an IW(i) call if and only if novelty(s) ≤ i, where novelty(s) is the “novelty measure” of s. The definition of the novelty measure novelty(s) of a state s assumes that s can be defined as a set of boolean atoms that are true in that state. This is indeed the form in which states are conventionally defined in classic planning, but not in game AI. However, as described later in this section, this does not turn out to be a problem in GVGP. The definition of novelty(s) is as follows: Definition 1 The novelty measure novelty(s) of a game state s generated by an IW(i) search process is the size of the smallest tuple of atoms that are all true in s, and not all true in any other state that has been previously generated in the same IW(i) search process.

This means that, if in s some new atom is true that was not true in any previously seen state, novelty(s) = 1. If novelty(s) ≠ 1, but a pair of atoms is true in s where that specific combination of two atoms was not true in any previously generated state, novelty(s) = 2, etc. The lowest possible novelty measure a state can have with this definition is 1, and the highest possible novelty measure is n, where n is the maximum number of atoms that can exist in any state. If s is an exact copy of a previously generated state, the novelty measure is undefined because there is no tuple of atoms that are all true in s and not all true in any previously generated state. This thesis uses the convention that novelty(s) = ∞ in such a case. Two example search trees are depicted in Figure 3.2. In the first tree (Figure 3.2a), only the atom p is true in the root state. The first successor has a novelty measure of 1, because there is a tuple of size 1 (containing the atom q) that is true and was not true in any previous state. The second successor has a novelty measure of ∞, because there is no tuple of atoms that are not all true in any previous state (only p is true, and that atom was already true in the root state). The third successor has a novelty measure of 2, because the smallest tuple of atoms that are all true in this state and not all true in any previously generated state has a size of 2 (a tuple containing p and q). Note that p and q were both individually true in previously generated states, but not yet together in the same state. The second search tree (Figure 3.2b) contains the same states, but generated in a different order. This shows how the order in which states are generated can have a big impact on the IW algorithm. An IW(2) process, which prunes any state s with novelty(s) > 2, would prune only one successor in Figure 3.2a, but prune two successors in Figure 3.2b, even though exactly the same states are generated.

13 (a) Example search tree. (b) The same game states generated in a different order.

Figure 3.2: Two example search trees with novelty measures. The states are assumed to be generated in a Breadth-First order. For every state, the atoms that are true in that state are inside the circle.

Intuitively, states with lower novelty measures can be seen as being “more novel”, in that they are more different from previously seen states. In a call to IW(i), a state s with novelty(s) > i is pruned because it is assumed to be too similar to previously seen states, and therefore not interesting to explore further. Note that only states s with novelty(s) = ∞ are truly duplicate states that can be safely pruned, and in general IW(i) is not complete because it can prune states too aggressively. Calls to IW(i) with lower values for i prune more aggressively than calls with higher values for i. More aggressive pruning increases the depth that can be reached in a limited amount of processing time, but also increases the risk of pruning states that would in fact be interesting to explore.

3.3.1 IW in Real-Time Games

Even though IW originates from classic planning, it has also been used for playing real-time games in the Arcade Learning Environment (ALE) and in the GVG-AI framework. Whereas in classic planning IW is used to find a goal state, in these real-time games it is used to find a promising game state (either a winning state, or a state with a high score) in a short amount of processing time, so that the agent can play the action leading to that state.

IW in the Arcade Learning Environment (ALE)

IW(1) was shown to have state-of-the-art performance in the Atari 2600 games of ALE by Lipovetzky, Ramirez, and Geffner (2015), outperforming UCT in the majority of the games. Note that in this case only IW(1) was used, which is only a single iteration of the full IW algorithm. It is not feasible to run multiple IW(i) iterations in such a real-time environment, and an IW(2) iteration was reported not to improve much on a plain breadth-first search. Lipovetzky et al. (2015) also describe a best-first search algorithm named 2BFS. This algorithm uses two priority queues to store nodes for processing, where nodes in one queue are ordered by novelty measure, and nodes in the other queue are ordered by the reward accumulated so far. The algorithm then alternates between the two queues for choosing which node to process next. This algorithm only uses the novelty measure for ordering, and not for pruning. 2BFS was reported to play slightly above the level of UCT, but below the level of IW(1) on average.

IW in General Video Game Playing (GVGP)

IW(1) was also shown to outperform MCTS on average over three sets of different GVGP games each (Geffner and Geffner, 2015). As described above for the real-time games in ALE, it is infeasible to run the complete IW algorithm with the limited amount of processing time in GVGP. Geffner and Geffner (2015) also describe experiments with IW(2), and another variant, named IW(3/2). Where IW(2) considers all pairs of atoms, IW(3/2) only considers some pairs of atoms. More specifically, it considers only those pairs of atoms where at least one of the atoms describes information about the avatar. The intuition behind this variant is that the state of the avatar is considered to be the most important feature of the game state. IW(3/2) prunes fewer nodes than IW(1), but more nodes than IW(2). IW(2) and IW(3/2) were found to perform worse than IW(1) in the first two sets. However, the playing strength of IW(2) and IW(3/2) scaled better than IW(1) with increases in processing time per frame in the third set. IW(3/2) performed better than IW(2) in general.

Safety Prepruning

IW(i) as described above does not take into account that state transitions in GVGP can be nondeterministic; the underlying Breadth-First Search generates only a single successor state for every action in every expanded state. In the experiments described above for IW(i) in GVGP, safety prepruning (Geffner and Geffner, 2015) was used to prune actions that are likely to lead to an immediate game loss from the set of actions available in the current game state. This works as follows. Let s denote the current game state, and let A(s) denote the set of actions available in s. Then, for every action a ∈ A(s), M successor states s' are generated by applying a to M copies of s. The number of times where s' is a losing game state is counted and denoted by D(a, s), where 0 ≤ D(a, s) ≤ M. Only those actions a with the minimum value for D(a, s) are considered safe and explored in the search process, and all other actions are pruned. Geffner and Geffner (2015) use M = 10.
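A sketch of safety prepruning under these definitions is given below. The GameState interface is a hypothetical stand-in for the framework's state object (the real StateObservation reports a winner rather than a single loss flag), and safeActions corresponds to keeping only the actions with minimal D(a, s).

import java.util.ArrayList;
import java.util.List;

// Hypothetical forward-model interface; the real StateObservation reports a
// winner rather than a single loss flag.
interface GameState {
    GameState copy();
    void advance(int action);
    boolean isGameOver();
    boolean isLoss();
    List<Integer> getAvailableActions();
}

final class SafetyPrepruning {
    // Returns the subset of actions in s whose observed immediate-loss count
    // D(a, s) over numSamples samples is minimal; all other actions are pruned.
    static List<Integer> safeActions(GameState s, int numSamples) {
        List<Integer> actions = s.getAvailableActions();
        int[] losses = new int[actions.size()];
        int minLosses = Integer.MAX_VALUE;
        for (int i = 0; i < actions.size(); i++) {
            for (int m = 0; m < numSamples; m++) {
                GameState successor = s.copy();
                successor.advance(actions.get(i));   // one nondeterministic sample
                if (successor.isGameOver() && successor.isLoss()) {
                    losses[i]++;
                }
            }
            minLosses = Math.min(minLosses, losses[i]);
        }
        List<Integer> safe = new ArrayList<>();
        for (int i = 0; i < actions.size(); i++) {
            if (losses[i] == minLosses) {
                safe.add(actions.get(i));
            }
        }
        return safe;
    }
}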

3.3.2 Implementation of IW in GVGP

There are two implementation details for using IW in GVGP that are important to discuss. The first is the choice of boolean atoms to represent the game state in novelty tests, and the second is the computation of the novelty measure of a game state.

State Representation

Geffner and Geffner (2015) describe that the number of boolean atoms in GVGP tends to be too large, even for IW(1), if all the features of states provided by the GVGP framework are used to construct boolean atoms. Instead, they propose to use only the following atoms to describe a game state s: avatar(cell, orientation, atype) and at(cell, stype). An atom avatar(cell, orientation, atype) is true in s if and only if, in s, the avatar is positioned in the cell denoted by cell, has the orientation denoted by orientation, and is of the type denoted by atype. An atom at(cell, stype) is true in s if and only if, in s, a sprite of the type denoted by stype is located in the cell denoted by cell. Intuitively, this means that game states are characterized only by which types of objects occupy which cells (and, in the case of the avatar, also the orientation is taken into account). There are other possible features that could be considered as well, such as the amount of ammo that a character has, or the number of sprites of the same type that occupy the same cell (this number can be greater than 1). These features are not taken into account in order to reduce the computational cost of computing the novelty measures, which is described in more detail below. Excluding these features means that there is an increased risk of states being “incorrectly” pruned.
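One possible way to encode these atoms is as strings, as sketched below. The SimpleObservation class is an assumption made for the sketch; the real observation grid of GVG-AI is richer and would need to be converted into this simplified form first.

import java.util.HashSet;
import java.util.Set;

// Hypothetical simplified observation: sprite type IDs per grid cell, plus the
// avatar's cell, orientation and type. The real getObservationGrid() is richer.
final class SimpleObservation {
    int[][][] spriteTypesPerCell;   // [x][y] -> sprite type IDs in that cell
    int avatarX, avatarY, avatarOrientation, avatarType;
}

final class AtomExtractor {
    // Builds the set of boolean atoms that are true in the given state, encoded
    // as strings: "at(x,y,stype)" and "avatar(x,y,orientation,atype)".
    static Set<String> atoms(SimpleObservation obs) {
        Set<String> result = new HashSet<>();
        for (int x = 0; x < obs.spriteTypesPerCell.length; x++) {
            for (int y = 0; y < obs.spriteTypesPerCell[x].length; y++) {
                for (int type : obs.spriteTypesPerCell[x][y]) {
                    result.add("at(" + x + "," + y + "," + type + ")");
                }
            }
        }
        result.add("avatar(" + obs.avatarX + "," + obs.avatarY + ","
                + obs.avatarOrientation + "," + obs.avatarType + ")");
        return result;
    }
}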

Computation of the Novelty Measure

As described above, the novelty measure novelty(s) of a game state s is defined as the size of the smallest tuple of atoms that are all true in s, and not all true in any other state that has been previously generated in a search process.

Let H(s) denote the set of all states that have been previously generated (the history), and let A^i(s) denote the set of tuples of boolean atoms of size i that are true in s. For example, A^1(s) denotes the set of 1-tuples of atoms that are true in s, and A^2(s) denotes the set of pairs of atoms that are true in s. To memorize all the tuples of atoms that have been previously seen in states s' ∈ H(s), a number of sets M^1, M^2, ..., M^n is used, where n is the highest possible number of atoms that can be true in any given state. Before novelty(s) can be computed, all the sets M^i must already contain the tuples t ∈ A^i(s') for all the states s' ∈ H(s). This can be done using the memorize function in Algorithm 3.2, after which the novelty function can be used to compute novelty(s).

Algorithm 3.2 Computation of the Novelty Measure
1: function memorize(s)
2:   n ← |A^1(s)|
3:   for i ← 1, 2, . . . , n do
4:     for tuple t ∈ A^i(s) do
5:       M^i ← M^i ∪ {t}
6:     end for
7:   end for
8: end function

9: function novelty(s)
10:   n ← |A^1(s)|
11:   for i ← 1, 2, . . . , n do
12:     for tuple t ∈ A^i(s) do
13:       if t ∉ M^i then
14:         return i
15:       end if
16:     end for
17:   end for
18:   return ∞
19: end function

Let A(s) denote the set of atoms that are true in s. Then, the nested loops in the two functions essentially loop through the set P(A(s)) \ {∅}, where P(A(s)) denotes the power set of A(s). This set consists of 2^|A(s)| − 1 elements. Under the assumption that the sets M^i are implemented using hash tables, which support the ∪ and ∉ operators in constant time, these functions have a time complexity of O(2^|A(s)|), where |A(s)| can be large in games with many sprites. These functions are also called frequently in IW(i); novelty is called once for every state that is generated, and memorize is called once for every state that is not pruned. However, for the novelty tests of an IW(i) call, it is not necessary to be able to compute novelty measures greater than i; it is only necessary to differentiate between novelty(s) ≤ i and novelty(s) > i. Because, considering the results found by Geffner and Geffner (2015), IW(i) calls with i ≥ 3 are unlikely to be useful in GVGP, the pseudocode in Algorithm 3.2 can be changed to limit the value of n to a maximum of 2. This reduces the time complexity of the memorize and novelty functions to O(|A(s)|^2), at the cost of losing the ability to compute the exact novelty measures of a state s with novelty(s) > 2. This does not matter in calls to IW(1) and IW(2). Similarly, it is also possible to create a dedicated (and even more efficient) implementation only for IW(1), which limits the value of n to a maximum of 1.
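For the dedicated IW(1) case, the novelty test reduces to checking whether a state contains at least one atom that has never been seen before, which a single hash set supports in (expected) time linear in |A(s)|. The sketch below combines the test and the memorization in one call, which is equivalent to Algorithm 3.2 for IW(1) because a state that fails the test contributes no unseen atoms; the class and method names are illustrative.

import java.util.HashSet;
import java.util.Set;

// Dedicated novelty test for IW(1): a state passes if at least one of its atoms
// has not been seen in any previously memorized state.
final class NoveltyOneTable {
    private final Set<String> seenAtoms = new HashSet<>();

    // Returns true if the state is novel (novelty(s) = 1) and memorizes its atoms.
    // Combining test and memorization is equivalent to Algorithm 3.2 for IW(1),
    // because a state that fails the test contributes no unseen atoms anyway.
    boolean testAndMemorize(Set<String> stateAtoms) {
        boolean novel = false;
        for (String atom : stateAtoms) {
            if (seenAtoms.add(atom)) {   // add() returns true only for unseen atoms
                novel = true;
            }
        }
        return novel;
    }
}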

Chapter 4

Enhancements for MCTS in GVGP

Nine enhancements for MCTS in GVGP are discussed in this chapter. The first section explains three new enhancements that are partially inspired by IW. The second section describes six other enhancements, which are not necessarily inspired by IW. Some of these enhancements are known from existing research in other domains, but newly introduced to the domain of GVGP in this thesis. Other enhancements are based on previous work in GVGP, but are extended in this thesis.

4.1 Enhancements Inspired by IW

The IW(1) algorithm can loosely be described as a Breadth-First Search (BrFS) algorithm with pruning using novelty tests. Geffner and Geffner (2015) showed that the sample MCTS controller of the GVG-AI framework could be outperformed not only by the IW(1) algorithm, but also by a straightforward BrFS with safety prepruning (but no novelty tests). This implies that, at least in some games in GVGP, BrFS has advantages over MCTS. IW(1) was shown to perform better than the BrFS with only safety prepruning, indicating that pruning using novelty tests can also increase the performance. In this section, three enhancements for MCTS are discussed that are partially inspired by IW(1) and the results mentioned above. The first two are named Breadth-First Tree Initialization and Safety Prepruning, and Loss Avoidance. They insert shallow BrFS processes in MCTS in specific situations where they are found to be beneficial. The third enhancement, named Novelty-Based Pruning, introduces the novelty tests of IW to MCTS.

4.1.1 Breadth-First Tree Initialization and Safety Prepruning (BFTI)

In some games in GVGP, the number of MCTS simulations that can be executed in a single 40-millisecond tick can be very low; in some extreme situations even smaller than the number of actions available in the current game state. In such situations, where every action is only evaluated based on a low number of (semi-)random MCTS simulations (or some actions are not evaluated at all), MCTS behaves nearly randomly. This problem could be addressed by significantly lowering the depth at which play-outs are terminated, but that would in turn reduce the performance of MCTS in games where it is feasible to run a larger number of longer simulations. IW(1) does not suffer from this problem, because its Breadth-First Search (BrFS) will at least generate one node for every action available in the current game state if possible (in cases where even this is not possible, it is unlikely that any search algorithm can perform well). Breadth-First Tree Initialization (BFTI) is proposed as an enhancement for MCTS that addresses the problem described above by initializing the search tree with a 1-ply BrFS, and only starting the normal MCTS process afterwards if there is still time left. The idea of safety prepruning (Geffner and Geffner, 2015) is also included to deal with nondeterminism.

Let s0 denote the current game state, with a set A(s0) of available actions. At the start of a tick, before starting the normal MCTS process, every action a ∈ A(s0) is used to advance up to M copies of s0 by one tick. The reason for generating M states per action, instead of 1 (assuming that M > 1), is to deal with nondeterminism. Safety prepruning is done in the same way as in IW, by counting for every action how often it led to an immediate loss, and only keeping those actions that led to the minimum observed number of losses. An example of this process is depicted in Figure 4.1.

Figure 4.1: An example of Breadth-First Tree Initialization and Safety Prepruning. Out of the M states generated for the grey node, one loss was observed, whereas 0 losses were observed in the white nodes. Therefore, the grey node is pruned.

This process enables MCTS to avoid immediate losses, if possible, in situations where it might otherwise randomly select an action leading to an immediate loss due to being unable to sufficiently evaluate all available actions. Additionally, for every action a ∈ A(s0), the M successor states are evaluated and the average of these M evaluations is stored in the node that a leads to, with a weight equal to a single normal MCTS simulation. This means that, if there is only time left for a few MCTS simulations after BFTI, MCTS can prioritize those actions that immediately lead to high average evaluations. M is a parameter: higher values of M increase the certainty, in nondeterministic games, that immediate game losses are avoided, but also reduce the amount of time left for subsequent MCTS simulations, which are still necessary for long-term planning. For every action a, the M generated states corresponding to that action are cached in the node that a leads to, so that each state can be used for one MCTS simulation going through the same node after BFTI. If there is sufficient time left for MCTS simulations after BFTI, this can reduce the negative effect that the computational overhead of BFTI has on the number of MCTS simulations.
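The BFTI step with safety prepruning can be sketched as follows in Python. The sketch assumes a state object with copy() and advance(action) methods and helper functions is_loss(state) and evaluate(state); these names are placeholders, not the actual GVG-AI interface.

# Sketch of Breadth-First Tree Initialization with safety prepruning.
def bfti_safety_preprune(root_state, actions, M):
    loss_counts = {}       # number of observed immediate losses per action
    evaluations = {}       # average evaluation of the M sampled successors per action
    cached_states = {}     # successor states cached for later MCTS simulations
    for a in actions:
        losses, total, states = 0, 0.0, []
        for _ in range(M):
            s = root_state.copy()
            s.advance(a)
            states.append(s)
            if is_loss(s):
                losses += 1
            total += evaluate(s)
        loss_counts[a] = losses
        evaluations[a] = total / M
        cached_states[a] = states
    min_losses = min(loss_counts.values())
    kept_actions = [a for a in actions if loss_counts[a] == min_losses]
    return kept_actions, evaluations, cached_states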

4.1.2 Loss Avoidance (LA)

The game trees of some games in GVGP have a high number of losing game states, and only “narrow paths” leading to winning game states. An example of such a game is the Frogs game, where the goal is to cross a road with harmful trucks and to cross a river by jumping on logs floating in that river. A simplified example of a game tree in such a game is depicted in Figure 4.2. There are many losing game states (dark nodes), because there are many actions that lead to a collision with trucks or the river. There is a path of white nodes which can be viewed as a narrow path leading to victory by careful movement. The rightmost action in the root node can be viewed as an action that does not significantly change the game state (for instance, an action where the player stays on the original side of the map and does not move towards the road). However, if this action is repeatedly applied, it leads to a loss due to reaching the maximum allowed duration of a game.

Figure 4.2: Example search tree with many losing game states. Dark nodes represent losing game states, and white nodes represent winning or neutral game states. The dashed arrow pointing towards the root node indicates that a similar state is reached.

If the MCTS algorithm is used in this game tree with a (semi-)random play-out strategy, it is possible to get a pessimistic initial evaluation of all actions, because it is likely that most of the simulations end in losses. In a game tree like this, these overly pessimistic evaluations are only corrected once the majority of the nodes have been expanded through the expansion step, because then the selection step of MCTS can bias simulations to go through the correct nodes more frequently than the losing nodes. When this does not happen due to a lack of time in the real-time GVGP games, MCTS is most likely to play the rightmost action in the root node, because this action has a lower chance than the others of ending in a loss through (semi-)random play. This problem is closely related to the problem of game trees with traps (Ramanujan, Sabharwal, and Selman, 2010) or optimistic moves (Finnsson and Björnsson, 2011) in (two-player) adversarial games. These are situations where MCTS gains an overly optimistic evaluation of actions that appear to lead to wins through (semi-)random play, but actually result in states where the opponent can force a loss (a win for the opponent) with optimal play. In such situations, the use of shallow minimax searches in various steps of MCTS has been found to be beneficial (Baier and Winands, 2015), because minimax is more likely to find narrow, optimal lines of play than the (semi-)random play-outs of MCTS. It is difficult to use minimax in a similar way to prove wins or losses in GVGP, because the games can be nondeterministic. However, to address the problem of overly pessimistic state evaluations, it is sufficient to find alternative actions that do not lead to immediate losses whenever a loss is encountered through (semi-)random play in a simulation. This can be achieved by inserting a shallow BrFS, instead of a shallow minimax search. This enhancement is referred to as Loss Avoidance (LA). An example of how this works is depicted in Figure 4.3. Let sT denote a losing game state at the end of an MCTS play-out, with a parent state sT−1. Let aT denote the action that led from sT−1 to sT. In a regular MCTS simulation, the evaluation X(sT) of sT would be backpropagated, which no longer happens with LA if sT is a losing state. Instead, the idea is to generate all the siblings of the node in which sT was observed, and generate one state for each of those nodes. Those states are also evaluated, and the best evaluation among all the siblings is backpropagated as if it were the result of the original play-out. This is only done if the losing state sT was generated through the (semi-)random play of the play-out step. If the losing state is already encountered in the selection step of MCTS, it is backpropagated as normal and LA is not applied.

(a) A regular MCTS play-out ends in a loss. (b) All actions played in the simulation, except for the last one, are repeated. (c) The siblings of the final node of the original play-out are generated. (d) The generated nodes are evaluated and the highest evaluation is backpropagated.

Figure 4.3: Example MCTS simulation with Loss Avoidance. The X values are evaluations of game states in those nodes. The dark node is a losing node.

Because the GVG-AI framework does not support an undo operator to revert back to an earlier game state, this cannot be implemented by reverting back to sT−1 from sT, and applying all the actions available in sT−1 (other than aT) to copies of sT−1. This means that the entire sequence of actions that led from the root state s0 to the state sT−1 must be applied again to a new copy of s0. Let s′T−1 denote the resulting state (which may be different from sT−1 in nondeterministic games). As many copies as necessary can be created of s′T−1, and then advanced using all the available actions except for aT. This implementation has two important consequences. The first consequence is that it is possible in nondeterministic games that a terminal state is encountered before the entire action sequence has been executed (i.e., closer to the root node than the original final node). To avoid spending too much time evaluating a single MCTS simulation by recursively starting LA processes, the evaluation of such a state is immediately backpropagated. The second consequence is that LA has a significant computational cost. The entire LA process has roughly the same computational cost as the original simulation did. This means that in a game like Frogs, where almost every simulation ends in a losing game state, the number of MCTS simulations that can be done per tick is roughly halved.
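The replay-based Loss Avoidance step described above can be sketched as follows, again assuming copy(), advance(action), is_terminal() and an evaluate(state) helper as placeholders; the sketch returns the value to backpropagate for the simulation.

# Sketch of Loss Avoidance: replay the simulation's action sequence (without the losing
# action) on a fresh copy of the root state, expand the siblings, and return the best value.
def loss_avoidance_value(root_state, action_sequence, available_actions):
    losing_action = action_sequence[-1]
    s = root_state.copy()
    for a in action_sequence[:-1]:
        s.advance(a)
        if s.is_terminal():
            # Nondeterminism: a terminal state was reached before the full replay;
            # backpropagate its evaluation immediately instead of recursing.
            return evaluate(s)
    best = None
    for a in available_actions:
        if a == losing_action:
            continue
        sibling = s.copy()
        sibling.advance(a)
        value = evaluate(sibling)
        best = value if best is None else max(best, value)
    # If no alternative action exists, fall back to evaluating the observed loss.
    if best is None:
        loss_state = s.copy()
        loss_state.advance(losing_action)
        best = evaluate(loss_state)
    return best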

4.1.3 Novelty-Based Pruning (NBP)

Background

The intuition behind pruning nodes based on the results of novelty tests in Iterated Width (IW) (Lipovetzky and Geffner, 2012), as described in Section 3.3, is to prune states that are similar to previously seen states. Because IW is based on Breadth-First Search (BrFS), a state s that is generated earlier than another state s′ can be reached by playing a sequence of actions from the root state that is at most as long as the path from the root state to s′ (and possibly shorter). If s′ has a high novelty measure (novelty(s′) > i), it is pruned by IW(i) because it is considered to be a redundant state; similar states are already known to be reachable by playing different sequences of actions that are at most as long, or possibly shorter. Such prunings are not guaranteed to be “correct”, or safe, and can cause states to be incorrectly pruned. However, because IW(1) was found to outperform a BrFS in GVGP (Geffner and Geffner, 2015), they can still be considered to be beneficial on average. In this subsection, it is proposed to reduce the search space by pruning some nodes based on novelty tests in MCTS. This is referred to as Novelty-Based Pruning (NBP). This idea is different from most existing techniques for pruning in MCTS (Browne et al., 2012), because those typically aim to prune nodes that are expected to have a low value. The idea of NBP is to prune states that are similar to other states, which are therefore also expected to have a similar value (which can be a high value). This can mean that, for example, the second-best node is pruned in some situations. The goal of such a pruning is to make it more likely that a large number of MCTS simulations can be executed to evaluate the best node, instead of splitting MCTS simulations between the two best nodes. In some sense, this idea is more closely related to dealing with transpositions in MCTS (Childs, Brodeur, and Kocsis, 2008) than it is to other pruning techniques in MCTS. Typically, when transpositions (identical game states that can be reached by different paths in the game tree) are recognized, statistics gathered for those states are shared in some way. The idea of NBP can be viewed as using a relaxed definition of “equality” when identifying transpositions, and pruning all but one of the “equal” states, instead of sharing results among them. It is not trivial to directly use novelty tests in MCTS in the same way as they are used in IW. For instance, the original definition of the novelty measure novelty(s) of a state s (Definition 1 in Section 3.3) assumes that states are generated in a Breadth-First order when referring to “any other state that has been previously generated”. MCTS generates states in a very different order. Consider, for example, that the last state generated in the first MCTS simulation is likely much deeper in the tree than the first state generated in the second simulation. This means that the definition of the novelty measure needs to be adjusted for use in MCTS. This, in turn, has implications for the implementation (and efficiency) of the computation of novelty measures. These issues are discussed next, after which it is described exactly how the novelty tests are used for pruning in MCTS. Note that, for simplicity, the following discussion initially does not differentiate between states and nodes. This means that the initial discussion is only applicable to game trees that do not use an open-loop model.
This issue is addressed later in the following discussion.

Definition of the Novelty Measure in MCTS

The original definition of the novelty measure novelty(s) of a state s refers to “any other state that has been previously generated”. This definition is only correct when used in a Breadth-First Search (BrFS) process, such as IW(i). For NBP, the definition is adjusted as follows, referring to a specific set named the neighbourhood N(s) of s:

Definition 2. The novelty measure novelty(s) of a game state s is the size of the smallest tuple of atoms that are all true in s, and not all true in any state in the neighbourhood N(s) of s.

For defining N(s), the following terminology is used:

• p(s): The parent state (or predecessor) of a state s.

• Sib(s): A set containing the siblings of a state s, that is, all successors of p(s) except for s itself.

• SibLeft(s): A set containing all the siblings of s that are “to the left” of s. This assumes that p(s) stores its successors in an ordered list, and states are considered to be to the left of s if they have a lower index than s in this list. Note that this means that the order in which successors are stored matters, in the same way that the order in which successors are generated matters in IW, as seen in Figure 3.2.

The following properties are considered to be desirable for any possible definition of N(s):

• N(s) should be a subset of the set of states that would be included in IW. If more states are included (for instance, states deeper in the tree than s), states that are far away from the root could be incorrectly prioritized over states closer to the root.

• N(s) should contain as many states as possible, without violating the property described above. As the size of N(s) increases, the ability for NBP to prune nodes also increases.

• Using the definition of N(s), it should be possible to compute novelty(s) efficiently.

N(s) as used for NBP in this thesis is defined as follows: N(s) = ∅ if s is the root state, and N(s) = SibLeft(s) ∪ {p(s)} ∪ Sib(p(s)) ∪ N(p(s)) otherwise.

An example using this definition is depicted in Figure 4.4, where all grey states are in N(s).

Figure 4.4: Example tree where the states in N(s) are grey.

This is not the only way in which N(s) can be defined. The main reason for choosing this definition is that it was considered to provide a good trade-off between the desirable properties listed above. The implementation of computing novelty(s) using this definition of N(s) is described next, followed by an explanation of why it is considered to provide a good trade-off between the desirable properties.
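The recursive structure of this neighbourhood definition can be sketched as follows, assuming nodes that store a parent reference and an ordered list of successors (illustrative fields, not the agent's actual data structures).

# Sketch of N(s): SibLeft(s) ∪ {p(s)} ∪ Sib(p(s)) ∪ N(p(s)), with N(root) = ∅.
def neighbourhood(node):
    if node.parent is None:                      # s is the root state
        return set()
    parent = node.parent
    index = parent.successors.index(node)
    result = set(parent.successors[:index])      # SibLeft(s)
    result.add(parent)                           # {p(s)}
    result.update(siblings(parent))              # Sib(p(s))
    result.update(neighbourhood(parent))         # N(p(s))
    return result

def siblings(node):
    if node.parent is None:
        return []
    return [n for n in node.parent.successors if n is not node]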

Novelty Tests in MCTS

In Section 3.3, it is explained how novelty(s) can be computed efficiently in IW(i). Basically, whenever a state s is generated, all tuples of atoms that are all true in s are generated. Then, novelty(s) is computed by checking which of these tuples, if any, are not yet in previously memorized sets of tuples.

Finally, the generated tuples are added to the memorized sets of tuples. For the entire search process, there is only one large collection of memorized tuples, because all previously generated states are considered in IW. Because the neighbourhood N(s) does not contain all previously generated states, it is also not possible to memorize all tuples in one large collection in MCTS. Some tuples of atoms should only be memorized in certain parts of the tree, and not in other parts of the tree. This is the main reason why it is not straightforward to efficiently implement novelty tests in MCTS. In IW(i), novelty(s) is computed to decide whether or not to prune s immediately when s is generated. This is done because, once s has been generated, it must either be added to a First-In-First-Out queue containing states for which successors need to be generated (if it is not pruned), or discarded (if it is pruned). In MCTS, it is not necessary to immediately decide whether or not to prune s once it has been generated. It is possible to have a few simulations go through s before deciding whether or not to prune it. In fact, once s has been generated, the computational cost of generating that state has already been paid, so it is likely better not to prune it immediately and to allow at least the current simulation to continue from s. Therefore, unlike in IW, novelty(s) is not computed as soon as s is generated. Let Succ(s) denote the ordered list of successors of a state s (technically, s should be a node, but this is addressed later). The first time that s is visited in an MCTS simulation when it already is fully expanded (which is also the first time that a successor of s is chosen according to the selection policy of MCTS, instead of the play-out policy), novelty tests are performed immediately for all successors s′ ∈ Succ(s) of s. The pseudocode for how this is done is given by Algorithm 4.1. This pseudocode is not as efficient as the implementation of novelty tests in IW, in that it is not sufficient to compare the tuples generated for a successor s′ to only one large set of previously seen tuples. Instead, the novelty test is split up into multiple parts. The novelty test done in lines 7-9 corresponds to a novelty test with N(s′) = SibLeft(s′) ∪ {p(s′)}, which is only a part of the chosen definition of N(s′). If s′ already is not novel according to the test with this subset of N(s′), it can never be novel with the full definition of N(s′). However, if it is novel according to the first test, the other tests are required as well. Sib(p(s′)) is added to N(s′) in line 14 in the first iteration of the loop, and subsequent iterations of the loop also add N(p(s′)) to N(s′). This implementation is still considered to be reasonably efficient because it only “moves” up the tree towards the root, and never back “down” again (which would be necessary to, for example, take the white node into account in the right-hand side of Figure 4.4). Because the successors of a state s are only tested once s is fully expanded (and visited again in a new MCTS simulation), only states relatively close to the root tend to be tested. This means that the path up to the root, and therefore the number of separate tests that the full novelty test is split up into, also tends to be small. The boolean atoms used for novelty tests in MCTS are predicates of the form avatar(cell, orientation, atype) for the avatar, and at(cell, stype) for other objects. These are the same atoms that Geffner and Geffner (2015) used for IW in GVGP, as described in Section 3.3.

Novelty Tests in Open-Loop MCTS

In the discussion above, it is mentioned that novelty tests are done for the successors Succ(s) of a state s the first time that s is reached in an MCTS simulation where it already is fully expanded. It is now discussed how this can be applied in an open-loop model, where nodes in the tree can represent more than one state. The first time that a node n is reached in an MCTS simulation when it is fully expanded, with a state s generated in that particular simulation for n, one state s′ is generated for every successor node n′. Using those particular successor states s′ (all generated from copies of the same state s), the novelty tests are performed as described above. The states s′ are cached in the nodes n′, so that they can be re-used in subsequent MCTS simulations going through those nodes. When a novelty test for a state s′ is failed, the corresponding node n′ is marked as “not novel”. With this implementation, the first part of every novelty test (performed in line 7 of Algorithm 4.1) uses a consistent set of states, in that all successors s′ have the same parent state s (and not only the same action sequence leading from the root node to s). The parts of the novelty tests performed in the loop may, in nondeterministic games, compare states that are not in fact really comparable due to nondeterministic state transitions, but this is difficult to avoid.

Algorithm 4.1 Novelty Tests in MCTS
Require: A fully expanded state s with ordered list of successors Succ(s), maximum novelty threshold i
1: function NoveltyTestSuccessors(s)
2:     M ← ∅
3:     Ms ← GenerateTuples(s)                ▷ stores all tuples of atoms that are true (up to a size of i) in s
4:     MemorizeTuples(Ms, M)                 ▷ stores all tuples of Ms in M
5:     for every successor s′ ∈ Succ(s) do
6:         Ms′ ← GenerateTuples(s′)
7:         n ← ComputeNovelty(Ms′, M)        ▷ size of smallest tuple that is in Ms′ and not in M
8:         if n > i then
9:             mark s′ not novel
10:        else
11:            p ← parent(s′)
12:            while p ≠ null do
13:                Mp ← GetCachedMemory(p)   ▷ returns cached set of tuples
14:                n ← ComputeNovelty(Ms′, Mp)
15:                if n > i then
16:                    mark s′ not novel
17:                    break
18:                end if
19:                p ← parent(p)
20:            end while
21:        end if
22:        MemorizeTuples(Ms′, M)
23:    end for
24:    for every successor s′ ∈ Succ(s) do
25:        s′.CacheMemory(M)                 ▷ caches set of tuples found in s, s′ and all siblings of s′
26:    end for
27: end function
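The helper functions assumed by Algorithm 4.1 can be sketched as follows for a maximum novelty threshold i of at most 2, representing a state by its set of true atoms. The names mirror the pseudocode, but the sketch is illustrative rather than the agent's actual code.

from itertools import combinations

def generate_tuples(atoms, i):
    # All tuples of true atoms up to size i, in a canonical (sorted) order.
    tuples_of_atoms = set()
    for size in range(1, i + 1):
        tuples_of_atoms.update(combinations(sorted(atoms), size))
    return tuples_of_atoms

def compute_novelty(candidate_tuples, memory, i):
    # Size of the smallest candidate tuple that is not in the memory, or i + 1 if none is new.
    new_sizes = [len(t) for t in candidate_tuples if t not in memory]
    return min(new_sizes) if new_sizes else i + 1

def memorize_tuples(candidate_tuples, memory):
    memory.update(candidate_tuples)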

Pruning Nodes Marked as Not Novel

When a node n is marked as not novel, the idea of NBP is to prune n. “Pruning” a node n in MCTS is assumed to mean that n can no longer be selected by the selection policy. There are two different options that are typically considered for pruning nodes in MCTS: soft pruning and hard pruning (Browne et al., 2012). Soft pruning n means that n is temporarily pruned, but may be “unpruned” again later (for instance, when the parent or siblings of n reach a certain visit count). Hard pruning n means that n is permanently pruned. NBP uses soft pruning to prune nodes that have been marked as not novel. Nodes are not unpruned over time (as done in, for example, Progressive Unpruning (Chaslot et al., 2008)), because this is difficult to do in a meaningful way in games with a low average number of simulations per tick. Often, such a strategy would be similar to either not pruning at all (when unpruning too quickly), or hard pruning (when unpruning too slowly). When there is only a small number of simulations, it is difficult to find a good compromise between these two extremes. Instead, a pruned node n is only unpruned when the parent node p(n) appears to be a dangerous situation. The reasoning behind this is that, if p(n) appears to be a dangerous situation where the likelihood of losing the game is estimated to be high, all alternatives should be considered in an attempt to find a way out of the dangerous situation. More formally, this is implemented by unpruning any pruned node n if the normalized, average score Q(p(n)) of the parent p(n) has a value Q(p(n)) < 0.5. This unpruning is not permanent either; n can be pruned or unpruned repeatedly as Q(p(n)) rises above or drops below 0.5.

When choosing the action to play at the end of every game tick, successors of the root node that are marked as not novel are treated in a similar way as described above. Nodes marked as not novel are initially ignored, and the node nmax that maximizes the average score among the remaining nodes is selected. However, if such a node nmax has a normalized average score Q(nmax) < 0.5, and there also is a pruned node n with a higher score Q(n) > Q(nmax) and a visit count greater than 1, such a node is considered as well. Only pruned nodes with a visit count greater than 1 can potentially be considered, because the estimated evaluation of such a node is otherwise considered to be too unreliable.
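The resulting move selection at the end of a tick can be sketched as follows, with Q denoting the normalized average score of a node and visits its visit count; the fallback used when all children are marked as not novel is an assumption of this sketch.

def select_final_move(root_children):
    novel = [c for c in root_children if not c.marked_not_novel]
    pruned = [c for c in root_children if c.marked_not_novel]
    candidates = novel if novel else root_children      # assumed fallback if everything is pruned
    best = max(candidates, key=lambda c: c.Q)
    if best.Q < 0.5:
        # Dangerous situation: also consider pruned nodes with a better score,
        # but only if they were visited more than once.
        better_pruned = [c for c in pruned if c.Q > best.Q and c.visits > 1]
        if better_pruned:
            best = max(better_pruned, key=lambda c: c.Q)
    return best.action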

Special Cases in Novelty Tests

There are a number of special cases concerning the results of novelty tests in MCTS that have not yet been covered by the description above. These special cases are described next.

• Whenever all successors n′ ∈ Succ(n) of a node n are marked as not novel, n itself is also marked as not novel. It is not necessary to state this explicitly for IW, because if all successors n′ ∈ Succ(n) are pruned in the Breadth-First Search of IW, the entire subtree below n is automatically pruned. However, in MCTS this should be done explicitly, because otherwise the selection policy can end up in a node in which there are only successors that are not novel.

• Let st+1 denote the state generated for the novelty test of a node nt+1, where st and nt denote the parent state and node, respectively. Let score(s) denote the game score of a state s. If score(st+1) > score(st), node nt+1 is considered to be novel, regardless of the results of any novelty tests. This is done to reduce the risk of pruning actions that are directly responsible for an increase in game score. It should be noted that this risk is not entirely eliminated in games with delayed rewards. Consider, for example, the game of Aliens, where there can be a delay between firing a missile (an action that can be responsible for a score increase), and the missile colliding with an alien (which is the point in time where the score increase can be observed).

• Let at+1 denote an action leading from a node nt to a node nt+1, where nt+1 is a node that would be marked as not novel according to the description above. If at+1 is a movement action (at+1 ∈ {Up, Left, Right, Down}), there are two exceptions in which nt+1 is not marked as not novel after all. The first exception is that, if only horizontal or only vertical movement is available in a game, nt+1 is not pruned. The reasoning behind this is that, in games where the avatar is restricted to moving along one axis, optimal play often involves moving back and forth. Such movement patterns are likely to be pruned by NBP. The second exception is that, if the avatar has a low movement speed (specifically, a speed ≤ 0.5), nt+1 is also not pruned. The reasoning behind this is that, with a low movement speed, the avatar often spends multiple game ticks in the same cell. This means that in multiple states in a row, the same avatar(cell, orientation, atype) predicate is true, even though the avatar is in fact moving (slowly).

4.2 Other Enhancements

In this section, six other enhancements for MCTS in GVGP are discussed that are not related to IW. They do not necessarily address any weaknesses that MCTS has in comparison to IW, but are intended to improve the performance of MCTS in other ways. These enhancements are not newly proposed in this thesis, but most of them are either extended or newly introduced to the domain of GVGP in this thesis.

4.2.1 Progressive History (PH)

Progressive History (PH) (Nijssen and Winands, 2011) is an enhancement for the selection step of MCTS. The intuition of PH is to add an extra term to the UCB1 formula (Equation 3.3) that introduces a bias towards playing actions that appeared to perform well in earlier simulations, and to progressively decrease the weight of this bias as the visit count of a node increases. The selection policy is changed to select the successor Si ∈ Succ(P) of the current node P that maximizes Equation 4.1, instead of Equation 3.3.

UCB1ProgHist(Si) = Q(Si) + C × √(ln(nP) / ni) + (Q(ai) × W) / (ni − ni × Q(Si) + 1)    (4.1)

In this equation, Q(ai) denotes the mean of all the scores that have been backpropagated through edges corresponding to playing ai, where ai denotes the action that leads from P to Si. W is a parameter, where a higher value for W leads to an increased weight for the Progressive History heuristic. The denominator of the new term reduces the effect of the bias for nodes that did not perform well in past MCTS simulations. This relies on Q(Si) being normalized to lie in the interval [0, 1]. The Q(ai) value is also normalized to [0, 1]. If there is no Q(ai) value available for an action ai, it is set to 1 to reward exploration of actions that have not been tried yet. PH was previously evaluated in GVGP (Schuster, 2015) and found not to perform well. This is because the average action scores Q(a) were only collected for a ∈ {Left, Right, Up, Down, Use}. Such scores are unlikely to be accurate, because the true value of an action greatly depends on the state in which it is played. Therefore, the implementation of this thesis takes the cell in which the avatar is located before executing an action into account when updating or retrieving the value of Q(a). Other features of the game state are not taken into account, because they are likely less important. Including them would increase the computational cost of updating or retrieving Q(a) values, and reduce the number of observations on which every Q(a) value is based.
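A direct transcription of Equation 4.1 is sketched below, assuming that Q(Si) and Q(ai) have already been normalized to [0, 1]; returning infinity for unvisited children is an assumption of this sketch rather than part of the equation.

import math

def ucb1_prog_hist(q_si, n_i, q_ai, n_p, C, W):
    if n_i == 0:
        return float('inf')   # unvisited successors are explored first (assumption)
    exploration = C * math.sqrt(math.log(n_p) / n_i)
    prog_hist_bias = (q_ai * W) / (n_i - n_i * q_si + 1)
    return q_si + exploration + prog_hist_bias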

4.2.2 N-Gram Selection Technique (NST)

N-Gram Selection Technique (NST) (Tak, Winands, and Björnsson, 2012) is an enhancement for the play-out step of MCTS. Where PH adds a bias based on the observed quality of actions to the selection policy, NST adds a similar bias based on the observed quality of sequences (N-grams) of actions to the play-out step. This can make simulations more likely to resemble realistic lines of play than simulations using a purely random play-out policy, at the cost of increased computation time. NST works as follows. Let ⟨a0, a1, . . . , an⟩ denote the sequence of actions that led from the root node to the final node of a simulation, where a0 is the first action played and an is the last action played. A subsequence of length N of that sequence is referred to as an N-gram. For example, ⟨a0, a1, a2⟩ and ⟨an−2, an−1, an⟩ are both 3-grams. For every possible N-gram of actions, the average evaluation of simulations in which that N-gram occurred is stored in a table. If a specific N-gram occurs twice in the same simulation, that evaluation is counted twice when computing the average for that N-gram. Then, in the play-out step, an ε-greedy strategy (Sutton and Barto, 1998) is used to select actions based on these scores as follows. With a probability of ε (with 0 ≤ ε ≤ 1), a random action is chosen. With a probability of 1 − ε, an action is chosen based on the average N-gram scores. Let at denote one of the actions that should be considered. Let at−1 denote the action played in the previous step of the simulation, at−2 the action played before that, etc. Then, ⟨at⟩ is the new candidate 1-gram, ⟨at−1, at⟩ is the new candidate 2-gram, etc. The action at is played if and only if it is the action that maximizes the average of the averages stored in the table for the candidate N-grams. To motivate exploration of actions that have not been tried yet, a candidate action for which no data is available yet is given the maximum score observed so far in the entire game. N-gram scores gathered in previous ticks are likely still relevant in later ticks. Therefore, it is valuable to keep this data in memory in between game ticks. However, the scores found in previous ticks are partially based on parts of the search tree that can no longer be reached, which reduces their relevance (Tak, Winands, and Björnsson, 2014). Therefore, the scores and visit counts stored in the table for N-grams should be decayed over time. This thesis uses the Move Decay (Tak et al., 2014) strategy, which multiplies all the scores and visit counts in the table by a factor γ ∈ [0, 1] every time the agent plays a move. N-grams with N > 1 are only considered when that N-gram has been encountered at least k times (in this thesis, k = 7). In this thesis, only N-grams up to a size of N = 3 are considered.

In the case of nodes close to the root, only smaller N-grams are considered. For example, if an action needs to be selected in the root node, there is no previously played action at−1; in that case, only 1-grams are considered. Just like PH, NST has previously been evaluated in GVGP (Schuster, 2015), without taking into account the position of the avatar when storing or retrieving data for actions. In the case of NST, it was already found to provide an increase in performance, but the implementation in this thesis has still been adapted to take the location of the avatar into account in the same way as in PH. When PH and NST are combined, the data for the 1-grams of NST is the same data that is also used by PH. This means that combining the two results in a slightly smaller computational overhead than the sum of the computational overheads of the two individual enhancements.
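The ε-greedy play-out selection of NST can be sketched as follows, with the table mapping action N-grams to (total score, visit count) pairs; the avatar-cell component of the keys is omitted here for brevity, and all names are illustrative.

import random

def nst_select(available_actions, history, table, epsilon, max_seen_score, max_n=3, k=7):
    if random.random() < epsilon:
        return random.choice(available_actions)

    def ngram_value(action):
        averages = []
        for n in range(1, max_n + 1):
            if n - 1 > len(history):
                break                                    # not enough previous actions near the root
            gram = tuple(history[len(history) - (n - 1):] + [action])
            total, visits = table.get(gram, (0.0, 0))
            if visits == 0:
                if n == 1:
                    averages.append(max_seen_score)      # untried action: best score seen so far
                continue
            if n > 1 and visits < k:
                continue                                 # longer N-grams need at least k observations
            averages.append(total / visits)
        return sum(averages) / len(averages) if averages else max_seen_score

    return max(available_actions, key=ngram_value)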

4.2.3 Tree Reuse (TR)

After playing an action, some of the results of the past MCTS search can still be useful in the next game tick; the previous search tree can be reused. This was found to be useful in the real-time game of Ms Pac-Man (Pepels, Winands, and Lanctot, 2014). It has also previously been described in the context of GVGP by Perez et al. (2015), but they did not evaluate the impact of this idea on the playing strength on its own. This idea, named Tree Reuse (TR), is depicted in Figure 4.5. It works as follows. Let t − 1 denote the previous tick, at the end of which an action at−1 was played which resulted in a transition to the current state st with a tick counter t. In the search tree built up by MCTS during tick t − 1, there must be an edge with the action at−1 pointing from the root node to some successor node n. To start the new search process, the tree is initialized with n as the new root. The entire subtree below n is kept, and the siblings and parent of n are discarded.

Figure 4.5: Tree Reuse in MCTS.

In nondeterministic games, the results gathered during tick t − 1 in the new root node n are not entirely representative of the current game state. In the previous search process, where n was a successor of the root node instead of being the root node itself, it represented all possible states that could be reached by playing at−1 in the previous game state. In the current tick, where n has become the root node because at−1 was played in the previous tick, it is known which specific state has been reached out of the states that could have been reached, and therefore n only needs to represent a single state: the current game state. This means that the scores that were previously backpropagated into n should be given a lower weight than the scores that will be backpropagated during the current tick. The same also holds for all other nodes in the new search tree; because n now deterministically represents a single game state, all nodes in the tree below n are also likely to have a lower number of possible game states that they represent.

Therefore, when a tree from a previous tick is reused in the next tick, the total scores and the numbers of visits in every node are decayed by multiplying them with a factor γ ∈ [0, 1] (Pepels et al., 2014). An alternative implementation could be to memorize for every node in which tick it was last generated or decayed, and to only decay a node and its successors once it is actually visited in a new tick. When the TR and BFTI enhancements are combined, the BFTI process is still executed even if all the possible children of the new root n already exist. This is done because the safety prepruning of BFTI can still be valuable. A variant where this is not done has also been implemented and evaluated, but unless it is explicitly stated otherwise, this thesis assumes that the BFTI process runs at the start of every tick even when combined with TR. When TR is combined with NBP, any successors of the new root node that were previously marked as not novel have this mark removed. Then, assuming that the new root is already fully expanded, the novelty tests for the direct successors of the new root are repeated in the first simulation of the new game tick. This is done because, in nondeterministic games, the results of old novelty tests may no longer be accurate. Results of novelty tests in nodes deeper in the tree than the first ply are not reset, to avoid the computational cost of repeating too many novelty tests.
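Tree Reuse with decay can be sketched as follows, where child_for(action), total_score and visits are illustrative fields of the tree nodes, not the agent's actual interface.

def reuse_tree(old_root, played_action, gamma):
    new_root = old_root.child_for(played_action)
    if new_root is None:
        return None                 # nothing to reuse; a fresh tree is built instead
    new_root.parent = None          # the old root and the siblings of the new root are discarded
    decay_subtree(new_root, gamma)
    return new_root

def decay_subtree(node, gamma):
    node.total_score *= gamma
    node.visits *= gamma
    for child in node.children:
        decay_subtree(child, gamma)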

4.2.4 Knowledge-Based Evaluations (KBE)

Background

In many games in GVGP, there can be situations where none of the simulations of MCTS end in game states that, according to the standard evaluation function of Equation 3.1, have a different evaluation than the current game state. This is because, due to the depth limit of the play-out step, simulations are not guaranteed to end in terminal game states, and some games also have very few or no intermediate score changes. In such situations, MCTS explores the search space randomly (because early simulations provide no extra information to guide late simulations), and also behaves randomly (because, in the end, all available actions have the same estimated value). In the competition of 2014 (Perez et al., 2016), some agents included heuristic evaluation functions taking into account estimates of the distances to various objects. A related idea was described by Perez, Samothrakis, and Lucas (2014). They proposed to keep track of the average change in score that occurs simultaneously with collision events between the avatar and other objects. This average change in score is then used in an evaluation function to reward the avatar for moving towards objects that appear to increase the score upon collision, and to punish the avatar for moving towards objects that appear to decrease the score upon collision. This idea was extended in a number of ways by Van Eeden (2015). Chu et al. (2015) proposed to use MCTS in a similar way to learn which objects are interesting to move towards, but then to use a pathfinding algorithm to move towards them (instead of a heuristic evaluation function to guide MCTS towards them). In this section, a heuristic evaluation function is proposed that is based on the work described by Perez et al. (2014) and Van Eeden (2015), but a number of implementation details are different. This enhancement is referred to as Knowledge-Based Evaluations (KBE).

Evaluation Function

The evaluation function created for this thesis is given by Equation 4.2.

EvalKB(sT) = Σi wi × (d0(i) − dT(i))    (4.2)

In this equation, i denotes the type of an object in the current game. Object types i of objects that are created by the avatar, the type(s) of the avatar itself, and the “wall” type are not included in the sum, but all other observed types i are included. wi denotes a weight that is adjusted during gameplay based on observations made during MCTS simulations. Intuitively, a weight wi is increased if it appears to be good to move towards objects of the type i, and it is decreased if it appears to be bad to move towards objects of type i. A more detailed description of how the weights are determined can be found below.

sT denotes the game state to be evaluated, at tick T. d0(i) and dT(i) denote a measure of the distance to the object of type i that is closest to the avatar, in the root state s0 or the state sT, respectively. Intuitively, this means that the evaluation function rewards moving closer towards the closest object of a type i if it has a positive weight wi, or rewards moving away from the closest object of a type i if it has a negative weight wi. Note that only the closest object of every type i has an influence on the evaluation function, and other objects of the same type are not taken into account. The main reason for this is to avoid the avatar getting stuck exactly halfway between two objects of the same type i with a positive weight wi, but it also reduces the computational cost of computing distances. The method used for determining these distances is described in more detail after the discussion about the weights. This evaluation function is used as follows. Let X(s0) denote the evaluation of the current game state s0, and let X(sT) denote the evaluation of the final state sT of an MCTS simulation, both using the standard evaluation function of Equation 3.1. In cases where X(s0) = X(sT), it is desirable to use the new evaluation function in an attempt to differentiate sT from other states at the end of other simulations that also have the same standard evaluation. Therefore, EvalKB(sT) is computed, normalized to lie in [0, 0.5] based on all previously generated EvalKB(s) values, and added to X(sT). The reason for normalizing the evaluation to the interval [0, 0.5] is that none of the games available so far in the GVG-AI framework ever have changes in score ∆X with 0 < ∆X < 1. This means that, by normalizing EvalKB(s) to [0, 0.5], the addition of the heuristic evaluation for one game state will always be considered to be less important than a real change in score observed in another game state. In cases where X(s0) ≠ X(sT), KBE is not used to change the evaluation X(sT). The main reason for this is simply that there already is an observed change in score that can be used to guide future MCTS simulations, so it is not considered necessary to compute EvalKB(sT). Therefore, the computational cost of computing this evaluation can be saved. A different variant, where KBE is used even if X(s0) ≠ X(sT), has also been implemented and is tested in Chapter 5. This variant is only used when it is explicitly stated in this thesis.
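A minimal sketch of the resulting evaluation is given below, assuming that the learned weights and the closest-object distances are stored in dictionaries keyed by object type; names are illustrative.

def eval_kb(weights, d_root, d_state, relevance_threshold=1e-4):
    # weights[i]: learned weight of type i; d_root[i] and d_state[i]: distance from the avatar
    # to the closest object of type i in the root state and in the evaluated state.
    value = 0.0
    for i, w in weights.items():
        if w < relevance_threshold:      # irrelevant types are excluded from the sum
            continue
        if i in d_root and i in d_state:
            value += w * (d_root[i] - d_state[i])
    return value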

Determining the Weights

As described above, a weight wi is intended to have a high value for a type i if it is good for the avatar to move towards objects of that type, and a low value if it is bad for the avatar to move towards such objects. Because the rules of the game are not provided to the agent in GVGP, these values should be based on experience gained from, for instance, MCTS simulations. In a similar way as described by Perez et al. (2014) and Van Eeden (2015), the experience that the values of the weights are based on consists of collision events observed in MCTS simulations and simultaneously observed changes in game score. Let E(st) denote the set of collision events that occurred during the transition from a state st−1 to a state st, where st is a state generated in an MCTS simulation (and st−1 is either the root state or also a state generated in the same simulation). This set can be obtained by retrieving a list of all collision events that have been observed since the start of the game in st, and removing the first n of them, where n is the size of the equivalent list for st−1. The events (o1, o2) ∈ E(st) can be described as pairs of objects that collided, where o1 is either the avatar or an object created by the avatar (such as a sword, or a missile), and o2 is some other object (such as a wall, or an alien). Let ∆t = X(st) − X(st−1) denote the difference in evaluation going from st−1 to st. The idea is to give every event (o1, o2) “credit” for the evaluation difference ∆t. Let i denote the type of the object o2. Then, if events with objects of type i are frequently observed together with a positive change in evaluation ∆t > 0, it is assumed to be beneficial for the avatar to move towards objects of type i. Similarly, if events with objects of type i are frequently observed together with a negative change in evaluation ∆t < 0, it is assumed to be detrimental for the avatar to move towards objects of type i. Finally, if events with objects of type i are frequently observed together with no change in evaluation (∆t = 0), objects of that type are assumed to be irrelevant. More formally, for every state st generated in an MCTS simulation, knowledge is extracted from the experience E(st) and ∆t as follows. For every object type i, the following two values are stored: ∆i, which is the sum of all changes in evaluations ∆t that are observed at the same time as a collision event with an object of type i, and ni, which is a counter that keeps track of how often such a collision event is observed.

Table 4.1: Initial Values of Weights

Object Category    Initial wi Value
Static             0
NPC                0.1
Movable            0.25
Resource           1
Portal             1

In other words, for every event (o1, o2) ∈ E(st), where o2 is an object of the type i, ∆t is added to ∆i, and ni is increased by 1. ∆i and ni are both initialized to 0 whenever an object of type i is observed for the first time in a game. ∆i and ni can then be used to compute the average change in score ∆̄i = ∆i / ni for collision events for every object type i. The weight wi is always updated immediately after updating ∆i and ni. This is done using the update rule given by Formula 4.3.

wi ← wi + (∆̄i − wi) × αi    (4.3)

αi is a learning rate that is initialized to 0.8 for every type, and it is updated as given by Formula 4.4 after updating wi.

αi ← max(0.1, 0.75 × αi)    (4.4)

These update rules essentially come down to using a technique like gradient descent to minimize the function f(wi) = |∆̄i − wi|, which is trivially minimized by setting wi = ∆̄i. The main reason for not doing this directly is to avoid relying too much on information gained from only a small number of observations. The agent is motivated to explore the effects of collision events that have been observed rarely (or never) in two ways. First, at the start of every game tick, all weights wi are increased by 10^−4. This keeps the agent interested in re-visiting object types for which the weights previously (nearly) converged to 0. Secondly, most weights are initialized to positive values. This initialization rewards the agent for exploring early in the game. The exact values that weights are initialized to are given in Table 4.1. These values have been chosen manually. Even though it is more common in the games of the GVG-AI framework for collisions with NPCs to be detrimental than beneficial, those weights are still initialized to a small positive value to ensure early exploration. Static objects have an initial weight of 0 because their purpose is typically only to block movement, but exploration of collision events with those objects will still be motivated over time due to the automatic increase of all weight values every tick. Whenever a weight wi gets a value wi < 10^−4, the corresponding type i is ignored in the evaluation function given by Equation 4.2. If included, such a type would have a very small influence on the final value of the function, meaning that it is often safe to ignore it. Excluding such a type from the equation means that dT(i) no longer needs to be computed. That computation, which is described in detail below, can take a significant amount of time.
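The per-event update of Formulas 4.3 and 4.4 can be sketched as follows, with the per-type tables represented as dictionaries and the weight assumed to have been initialized according to Table 4.1; names are illustrative.

def update_weight(i, delta_t, sum_delta, counts, weights, alphas):
    sum_delta[i] = sum_delta.get(i, 0.0) + delta_t               # running sum of score changes for type i
    counts[i] = counts.get(i, 0) + 1                             # number of observed collision events
    avg_delta = sum_delta[i] / counts[i]                         # average change in score for type i
    alpha = alphas.get(i, 0.8)                                   # learning rate, initialized to 0.8
    weights[i] = weights[i] + (avg_delta - weights[i]) * alpha   # Formula 4.3
    alphas[i] = max(0.1, 0.75 * alpha)                           # Formula 4.4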

Computation of Distances

The evaluation function of Equation 4.2 requires, for all object types i with a weight wi that is considered to be relevant (wi ≥ 10^−4), two measures of distance: d0(i) and dT(i). Perez et al. (2014) used Euclidean distances for similar variables in their evaluation function. They noted that, in some games, this results in the avatar getting stuck due to not taking into account the presence of movement-blocking obstacles, and suggested that the use of a pathfinding algorithm such as A* (Hart, Nilsson, and Raphael, 1968) may be beneficial. One of the extensions tested by Van Eeden (2015) was to replace these Euclidean distances with distances computed using A*.

In this thesis, the A* algorithm is also used to compute all the required distances d0(i) and dT(i). For all types i, the distances d0(i) are computed once for the root state at the start of every tick. This is also done for types with an irrelevant weight wi < 10^−4, because the weight may change later in the same tick due to new knowledge gained from simulations, and become relevant. The distances dT(i) are computed at the end of every simulation where the heuristic evaluation function is used (i.e., simulations that ended in a state sT with X(s0) = X(sT)), and only for object types i with a relevant weight wi ≥ 10^−4. The implementation, described in more detail below, uses some optimizations previously described by Van Eeden (2015), and some other optimizations. Let s0 denote the root state, for which all distances d0(i) have already been computed at the start of the current game tick as described above. Let sT denote the final state of a simulation, for which some distances dT(i) need to be computed. As described earlier, dT(i) is defined as the distance between the avatar and the closest object of type i in sT. Under the assumption that an A* algorithm is available that can compute the distance between two objects, a naive algorithm to compute the distances dT(i) is given by Algorithm 4.2. This implementation is referred to as naive because it simply computes the exact distance for every object of every relevant type i, and memorizes the lowest value for every type.

Algorithm 4.2 Computing the dT(i) values (naive implementation)
Require: Game state sT
1: function ComputeDistances
2:     for every object type i with relevant weight wi ≥ 10^−4 do
3:         dT(i) ← mapWidth × mapHeight
4:         for every object o ∈ sT with type i do
5:             dT(i) ← min(dT(i), A*(o, avatar))
6:         end for
7:     end for
8: end function

This can be optimized using the fact that it is not necessary to compute the exact distance for every object of every type. It is only necessary to compute the exact distance for the closest object, and prove that all other objects of the same type have a greater distance (where the exact value of that greater distance does not matter). A small modification of the A* algorithm (described in more detail below) allows for an earlier termination when it is known that it cannot improve upon a previously found shortest distance. The idea is to sort all objects of a given type based on an estimate of the distance that is cheap to compute (such as the Manhattan distance) before looping through the objects. Finding a short distance for an object near the front of this sorted list allows subsequent A* searches to terminate early (sometimes instantly) for objects closer to the end of the list. Pseudocode of this implementation can be found in Algorithm 4.3. Using profiling software, this optimization was found to noticeably reduce the amount of time spent pathfinding in games with many objects, such as Boulderdash.

Algorithm 4.3 Computing the dT(i) values (optimized implementation)
Require: Game state sT
1: function ComputeDistances
2:     for every object type i with relevant weight wi ≥ 10^−4 do
3:         dT(i) ← mapWidth × mapHeight
4:         O ← list of all objects o ∈ sT with type i
5:         sort O in increasing order of Manhattan distance to avatar
6:         for every o ∈ O do
7:             dT(i) ← min(dT(i), A*(o, avatar, dT(i)))
8:         end for
9:     end for
10: end function

As also described by Van Eeden (2015), A* has been implemented to take objects of the “wall” type into account as obstacles. This is an object type that is defined for all games in the GVG-AI framework. In almost all games, it is impossible for existing wall objects to disappear or for new wall objects to appear, and in almost all games they block movement. Many games also have different object types that can block movement, or objects that cause the avatar to teleport to a different location upon collision. Taking these objects into account for pathfinding would require comparing game states before and after playing movement actions to interpret the influence of those objects on movement. This has not been implemented for this thesis, but it could be beneficial to do so in future work. A* has been implemented to interpret the game map as a 4-connected grid, meaning that the avatar is assumed not to be capable of diagonal movement. This assumption is made because the only movement actions available in GVG-AI are Left, Up, Right, and Down. However, in some games it may be inaccurate, because the avatar could have “momentum” from previously played actions and therefore not only move in the direction corresponding to the last played action. Van Eeden (2015) does not explicitly mention also using a 4-connected grid, but the source code of the AIJim agent by the same author shows that it works in this way. The implementation also assumes that every movement to a neighbour has the same cost of 1. Algorithm 4.4 shows pseudocode of the A* algorithm (Hart et al., 1968) for computing the distance d(o, avatar) between an object o (located at (xo, yo)) and the avatar (located at (xa, ya)). It is a Best-First Search algorithm. It stores cells to be processed in a collection referred to as the openSet, starting with only (xa, ya).

Algorithm 4.4 A* for computing distances

1: function A*(o, avatar, dT(i))
2:     if (xo, yo) and (xa, ya) not connected then
3:         return mapWidth × mapHeight
4:     end if
5:     openSet ← ∅
6:     closedSet ← ∅
7:     openSet.Add((xa, ya))
8:     g(xa, ya) ← 0
9:     f(xa, ya) ← ManhattanDistance((xa, ya), (xo, yo))
10:    while openSet ≠ ∅ do
11:        (x, y) ← remove (x, y) that minimizes f(x, y) from openSet
12:        if x = xo and y = yo then
13:            return f(x, y)
14:        end if
15:        if f(x, y) ≥ dT(i) then
16:            return f(x, y)
17:        end if
18:        closedSet.Add((x, y))
19:        for every neighbour (x′, y′) of (x, y) do
20:            if (x′, y′) ∉ closedSet then
21:                openSet.Add((x′, y′))
22:                g(x′, y′) ← g(x, y) + 1
23:                f(x′, y′) ← g(x′, y′) + ManhattanDistance((x′, y′), (xo, yo))
24:            end if
25:        end for
26:    end while
27: end function

For every cell (x, y), a lower bound on the cost of moving from (xa, ya) through (x, y) to (xo, yo) is defined as f(x, y) = g(x, y) + h(x, y), where g(x, y) is the cost of moving from (xa, ya) to (x, y), and h(x, y) is an admissible heuristic estimate of the cost of moving from (x, y) to (xo, yo). The AIJim agent uses the Euclidean distance for h(x, y). In this thesis, the Manhattan distance is used instead because it is cheaper to compute, often more accurate than the Euclidean distance on a 4-connected grid, and still admissible on a 4-connected grid. The algorithm prioritizes processing cells (x, y) with a low cost f(x, y). Processing (x, y) consists of checking whether it is the goal cell (xo, yo) and terminating if it is, and otherwise generating the neighbours (x′, y′) of (x, y) and adding them to the openSet. When cells are processed, they are marked by placing them in a closedSet, which prevents them from being processed again later. It is not necessary to re-visit these cells because the Manhattan distance is a “consistent” heuristic (Hart et al., 1968). Lines 2-4 of Algorithm 4.4 are an optimization also described by Van Eeden (2015). In the 1 second of processing time available for initialization at the start of every game, the map is analyzed to store for every pair of cells whether or not they are connected. In this case, cells are referred to as being connected if and only if it is possible to move from one to the other without walking through obstacles (they do not necessarily have to be direct neighbours). This can be stored in O(w × h) memory, where w and h denote the width and the height of the map, respectively, by assigning a “colour” (simply represented by an integer) to every cell, such that all connected cells have the same colour. When an object o and the avatar are known not to be in connected cells, an upper bound on the distance can immediately be returned without performing any search. This upper bound, w × h, is the length of a path that visits every cell once. The openSet of the algorithm is often implemented as a priority queue, such as a binary heap, which supports the removals and insertions of lines 11 and 21, respectively, in O(log(n)) time. The implementation of this thesis uses a combination of a few different collections for the openSet. The highest-priority collection (from which elements are removed first whenever it is not empty) is simply a stack, to which a neighbour (x′, y′) is pushed whenever it has a total cost equal to its parent's cost (f(x′, y′) = f(x, y)). The reasoning behind this is as follows. The cost f(x′, y′) of a neighbour can never be smaller than the cost f(x, y) of the parent. Because (x, y) is currently being processed, none of the other cells currently in the openSet can have a cost smaller than f(x, y). Therefore, a neighbour with a cost f(x′, y′) = f(x, y) does not need to be inserted with a lower priority than any other cells in the openSet, but can automatically be given an equal priority to any other cells in the same stack (or a higher priority than all other cells in the openSet if this specific stack is empty). Pushing to and popping from this stack can be done in (amortized) constant time. This optimization is referred to as A* variant 1 (Sun et al., 2009).
Additionally, because every movement to a neighbour is assumed to have a discrete cost of 1, the openSet can be implemented using multiple stacks, where a stack i contains cells (x, y) with a cost f(x, y) = |xo − xa| + |yo − ya| + i, where |xo − xa| + |yo − ya| denotes the Manhattan distance from o to the avatar (which is the minimum cost of any cell). For this thesis, the algorithm has been implemented to use 4 such stacks (for i ∈ [1, 4]), and fall back to a binary heap for any cell with i > 4. It is likely possible to use a more efficient implementation, where an extra offset is used to “shift” elements in stacks with higher values for i to stacks with lower values for i once a number of stacks with low i have run empty. This could potentially entirely eliminate the need of a binary heap, but this has not been implemented and tested yet. Lines 15-17 implement the optimization that is described above, where A* terminates early when the exact distance has not yet been computed, but it has been proven to be at least as large as the best distance dT (i) found so far. This is especially useful on maps with few obstacles. For instance, if there are no obstacles between the avatar and the closest object o of a type i, every call to A* except for the first call in Algorithm 4.3 terminates after only a single iteration.
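The connectivity precomputation used by lines 2-4 of Algorithm 4.4 can be sketched as a flood fill over the 4-connected grid, assuming a 2D boolean wall grid indexed as walls[x][y]; the representation is illustrative.

from collections import deque

def colour_map(walls, width, height):
    colours = [[-1] * height for _ in range(width)]
    next_colour = 0
    for sx in range(width):
        for sy in range(height):
            if walls[sx][sy] or colours[sx][sy] != -1:
                continue
            # Breadth-first flood fill assigns the same colour to all connected cells.
            queue = deque([(sx, sy)])
            colours[sx][sy] = next_colour
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not walls[nx][ny] and colours[nx][ny] == -1):
                        colours[nx][ny] = next_colour
                        queue.append((nx, ny))
            next_colour += 1
    return colours   # two cells are connected iff they have the same colour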

Comparison to Previous Research

The main difference between KBE as described here and the implementation described by Perez et al. (2014) is in the way that distances are computed. This was also done by Van Eeden (2015), but using fewer of the optimizations described above (for instance, A* variant 1 was not used). However, the evaluation function used in this thesis (Equation 4.2) is also different from the ones of Perez et al. (2014) and Van Eeden (2015).

Perez et al. (2014) add the quantity given by Equation 4.5 to the evaluation of any terminal state sT where X(s0) = X(sT).

Reward = β × ∆D + α × ∆Z    (4.5)

In this equation, α and β are parameters set to α = 0.66 and β = 0.33. ∆D is defined to have a similar effect as the evaluation function used in this thesis, given by Equation 4.2. ∆D is intended to reward moving towards objects that appear to be beneficial according to previously obtained experience, or objects of types for which no collision events have been observed yet. ∆Z is a quantity that rewards simulations in which events are observed with object types that have not yet been observed frequently. This is intended to motivate exploration to gain new knowledge, but is not effective because it is only a retroactive reward given to a simulation in which the new knowledge has already been obtained.

Van Eeden (2015) proposes a change where exploration is instead motivated by rewarding movement towards object types for which collision events have been observed relatively rarely in comparison to other object types. This is more effective at rewarding exploration. It consistently tries to keep a balance between exploitation and exploration.

In this thesis, exploration is only motivated by using a positive initial value for the weights wi, and by slowly increasing the value of all wi over time. The initial weight values result in a proactive motivation to explore object types with a low number of observed events (but not necessarily only types with 0 observations). The slow increase of all weight values over time results in a motivation to re-visit object types for which the weight previously converged to values close to 0. This means that the approach of this thesis likely has more exploration than the approach of Perez et al. (2014), but less exploration than the approach of Van Eeden (2015).

4.2.5 Deterministic Game Detection (DGD)

Background

Many of the highest-ranking agents in the GVGP competitions of 2015 attempted to detect whether the game being played was deterministic, and adjusted the algorithms they used according to this information. This idea is referred to in this thesis as Deterministic Game Detection (DGD). Table 4.2 lists some of these agents with their rankings in the competitions, and describes how they responded to the detection of deterministic games. To the best of the author's knowledge, there are no publications available that describe in detail how these agents (or any other GVGP agents) detect whether or not a game is deterministic. However, the source code is publicly available, which reveals that they all use a similar implementation. The implementation for this thesis, which is based on the same ideas, is described next.

Table 4.2: Deterministic Game Detection in GVGP Competitions of 2015

Agent Name | Rankings | Response to Deterministic Game Detection
Return42 | 1st CIG, 4th CEEC, 4th GECCO | A*-based algorithm in deterministic games, Genetic Algorithm otherwise
YOLOBOT | 2nd CIG, 3rd CEEC, 1st GECCO | Exhaustive search in deterministic games, MCTS otherwise
thorbjrn | 12th CIG, 9th CEEC, 2nd GECCO | Breadth-First Search in deterministic games, MCTS otherwise

Game Classification

Let s0 denote the initial game state of a game. In the 1 second of initialization time that is available at the start of every game, M sequences of actions are randomly generated, with a length of N actions per sequence. All M action sequences are used to advance copies of s0 by up to N ticks (possibly fewer if a terminal state is reached before playing N actions). This process is repeated R times per action sequence.

For each of the M action sequences, there is a set of R possible states that playing that action sequence can lead to. If, for all M action sequences, no difference can be found between the R final states for that sequence, the game is classified as deterministic. Otherwise, the game is classified as nondeterministic. Additionally, if it is detected in any game state generated in this process that there are NPCs in the game, the game is immediately classified as nondeterministic. This is done because almost all NPCs can have random behaviour. In rare cases, this procedure results in an incorrect classification.

If a game is classified as nondeterministic, DGD does not have any effect (except for the computational overhead of classification at the start of the game). Otherwise, it is used to modify the selection step of MCTS, and the Tree Reuse and Novelty-Based Pruning enhancements. This is different from the agents listed in Table 4.2, which switch to entirely different algorithms.
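As an illustration, the classification procedure described above could be sketched as follows. The StateObservation methods used (copy, advance, getAvailableActions, getNPCPositions, isGameOver) follow the GVG-AI forward-model interface, but statesEqual is a hypothetical helper standing in for whatever comparison of two observed states is used; this is a sketch, not the exact implementation.

    // Returns true if the game is classified as deterministic: no NPCs are observed and, for each
    // of the M random action sequences of length N, all R repetitions end in the same state.
    static boolean classifyDeterministic(StateObservation s0, int M, int N, int R, java.util.Random rng) {
        java.util.ArrayList<Types.ACTIONS> actions = s0.getAvailableActions();
        for (int m = 0; m < M; m++) {
            Types.ACTIONS[] sequence = new Types.ACTIONS[N];        // one random action sequence
            for (int n = 0; n < N; n++) sequence[n] = actions.get(rng.nextInt(actions.size()));
            StateObservation reference = null;
            for (int r = 0; r < R; r++) {
                StateObservation copy = s0.copy();
                for (int n = 0; n < N && !copy.isGameOver(); n++) copy.advance(sequence[n]);
                if (copy.getNPCPositions() != null && copy.getNPCPositions().length > 0)
                    return false;                                   // NPCs present: treat as nondeterministic
                if (reference == null) reference = copy;
                else if (!statesEqual(reference, copy)) return false;   // repetitions diverged
            }
        }
        return true;
    }

    // Illustrative placeholder: only compares the game score and avatar position here; a real
    // implementation would compare more of the observable state (e.g. all object positions).
    static boolean statesEqual(StateObservation a, StateObservation b) {
        return a.getGameScore() == b.getGameScore()
                && a.getAvatarPosition().x == b.getAvatarPosition().x
                && a.getAvatarPosition().y == b.getAvatarPosition().y;
    }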

MCTS Selection in Deterministic Games

In deterministic games, as soon as one MCTS simulation ends in a game state sT with a promising evaluation X(sT), it is known for sure that this evaluation really can be obtained; there is no opposing player that can prevent the agent from reaching that state, and the probability that something unexpected happens when playing the same action sequence is 0. Therefore, it is reasonable to assume that the maximum score Xmax(n) that was ever backpropagated through a node n can be used to guide future MCTS simulations. This is done by modifying the formula used in the selection step of MCTS: Equation 3.3 by default, or Equation 4.1 if Progressive History is used. The Q(si) term at the start of these equations is replaced by (1 − αDGD) × Q(si) + αDGD × Qmax(si), with αDGD ∈ [0, 1]. Qmax(si) is the normalized Xmax(si) value. This idea of mixing the average evaluation with the maximum evaluation, referred to as MixMax, was previously used in Super Mario Bros (Jacobsen, Greve, and Togelius, 2014). It has also previously been tested in GVGP by Frydenberg et al. (2015), but they used it in all games (not only in deterministic games). This thesis uses αDGD = 0.25.
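As a small illustration of the modified selection value, assuming Q(si) and Qmax(si) are both already normalized to [0, 1]:

    // MixMax value that replaces Q(s_i) in the selection formula when the game has been
    // classified as deterministic. alphaDGD = 0.25 in this thesis.
    static double mixMaxValue(double q, double qMax, double alphaDGD) {
        return (1.0 - alphaDGD) * q + alphaDGD * qMax;
    }

For instance, a node with a mediocre average (q = 0.4) but a promising maximum (qMax = 0.9) is valued at 0.75 × 0.4 + 0.25 × 0.9 = 0.525 instead of 0.4, so a line of play that has led to a high evaluation at least once keeps attracting simulations.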

Novelty-Based Pruning and Tree Reuse in Deterministic Games

In Subsection 4.2.3, it is described that the results of novelty tests in the first ply below the new root are reset when Novelty-Based Pruning (NBP) is combined with Tree Reuse (TR). This is done to reduce the effect that nondeterminism in games can have on these novelty tests. That subsection also describes that the weight of results based on old simulations is reduced by multiplying them with a factor γ, because some of these simulations may be irrelevant due to nondeterminism. When a game is classified as deterministic, these cases no longer need to be considered. Therefore, when DGD is combined with TR, the parameter γ is always set to 1 in deterministic games. When NBP is also included, the results of old novelty tests are no longer reset when reusing a previous tree in a deterministic game.

4.2.6 Temporal-Difference Tree Search (TDTS)

Temporal-Difference Backups

Temporal-Difference Tree Search (Vodopivec, 2016) (TDTS) is a generalization of MCTS that uses temporal-difference (TD) learning (Sutton, 1988) to estimate the values of states based on MCTS simulations, instead of the default Monte-Carlo (MC) backups. Using MC backups, the value of a state (or node) is simply estimated to be the average of the final evaluations of all simulations going through that state (or node). Using TD backups, the value of a state (or node) is instead estimated based on the estimated value of its successor in a simulation, and any immediate rewards for reaching that state (or node). In the field of reinforcement learning (Sutton and Barto, 1998), TD methods have been found to usually converge to optimal or good estimates more quickly than MC methods.

An example of a TDTS implementation that is proposed by Vodopivec (2016) is Sarsa-UCT(λ). Sarsa-UCT(λ) uses the update rule of the Sarsa algorithm (Rummery and Niranjan, 1994) to update estimates of the values of nodes. Vodopivec (2016) also tested this algorithm in the GVGP competition of 2015, where

it outperformed a standard UCT implementation. Algorithm 4.5 shows pseudocode for the backpropagation step in a standard UCT implementation, and in Sarsa-UCT(λ). The pseudocode of Sarsa-UCT(λ) as written here assumes that the MCTS tree is expanded with one node for every state generated in the play-out step, as is done in the implementation for this thesis.

Algorithm 4.5 Backpropagation Step for UCT and for Sarsa-UCT(λ)

Require: Sequences of traversed nodes ⟨n0, n1, . . . , nT⟩ and states ⟨s0, s1, . . . , sT⟩
1: function UCT Backpropagate
2:   for t ← T, T − 1, . . . , 0 do
3:     nt.Visits ← nt.Visits + 1
4:     nt.TotalScore ← nt.TotalScore + X(sT)    ▷ allows computing X(nt) = nt.TotalScore / nt.Visits
5:   end for
6: end function

7: function Sarsa-UCT(λ) Backpropagate    ▷ adapted from pseudocode in (Vodopivec, 2016)
8:   δsum ← 0
9:   Vnext ← 0
10:   for t ← T, T − 1, . . . , 1 do
11:     R ← X(st) − X(st−1)    ▷ immediate reward
12:     Vcurrent ← V(nt)    ▷ current estimate of the value of nt
13:     δ ← R + γVnext − Vcurrent    ▷ single-step TD error
14:     δsum ← λγδsum + δ    ▷ accumulated TD error, decayed according to the eligibility trace
15:     nt.Visits ← nt.Visits + 1
16:     α ← 1 / nt.Visits    ▷ this gives every simulation an equal weight when updating the value
17:     V(nt) ← V(nt) + αδsum
18:     Vnext ← Vcurrent
19:   end for
20: end function

The backpropagation step of Sarsa-UCT(λ) has two parameters: γ ∈ [0, 1] and λ ∈ [0, 1]. γ discounts long-term rewards if γ < 1, meaning that a reward that can be obtained quickly is prioritized over an equally large reward that requires more time to obtain. In (Vodopivec, 2016), as well as in this thesis, γ = 1, because the amount of time that it takes to obtain a score increase in GVGP rarely matters. λ is used to implement the concept of eligibility traces (Sutton and Barto, 1998). Intuitively, eligibility traces are used to update value estimates less quickly according to distant rewards than according to rewards that are nearby, because distant rewards are assumed to be less reliable. An important difference between γ and λ is that changing the value of γ changes the value estimates that the algorithm converges to given an infinite amount of processing time, whereas changing the value of λ can only affect the rate of convergence.

The pseudocode of Sarsa-UCT(λ) described above takes into account at which point in time the agent receives a reward, because line 11 takes the difference between the evaluations of two consecutive game states. This means that nodes deep in a simulation are not rewarded for score gains obtained earlier in the same simulation. This is different from the MC backups of standard UCT, which take the difference between the evaluations of the first and last states of a simulation, and reward all nodes traversed in that simulation for any change in score. Vodopivec (2016) took intermediate rewards into account in other domains, but not in GVGP. In GVGP, the value of R was set to R = X(sT) − X(s0) at the end of the simulation (i.e., the first iteration of the loop starting in line 10), and set to R = 0 for all other state transitions. In such an implementation, Sarsa-UCT(λ) no longer takes intermediate rewards into account (just like the standard UCT implementation). For this thesis, both variants have been implemented and tested.

When intermediate rewards are not taken into account, and λ = 1, the two backpropagation functions described above only differ in that the MC backups of UCT include the evaluation of the root state X(s0), whereas the TD backups of Sarsa-UCT(λ) do not. This is discussed in more detail below. When λ < 1, distant rewards are assumed to be less reliable than rewards that are nearby. In MCTS, this is a

valid assumption, because a distant reward is the result of a longer sequence of actions than a reward that is nearby. In MCTS, a longer sequence of actions means that more actions were chosen according to the selection and/or play-out policies of MCTS. Both of these can be considered to be “unreliable” policies. The selection policy includes intentional exploration of lines of play that are estimated to be worse than other alternatives, but have low visit counts, and the play-out policy chooses actions (semi-)randomly. This means that it is a reasonable assumption that distant rewards are less reliable than rewards that are nearby. TD backups with λ < 1 are therefore expected to be less susceptible than MC backups to estimating values incorrectly because of unreliable, distant rewards.
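A minimal sketch of the two ways of defining the immediate reward R that are compared above, for the backward loop t = T, ..., 1 of Algorithm 4.5 (X[t] denotes the evaluation X(st) of the state at depth t):

    // Taking intermediate rewards into account: every change in the evaluation is credited
    // to the transition at which it occurred.
    static double rewardIntermediate(double[] X, int t) {
        return X[t] - X[t - 1];
    }

    // Without intermediate rewards: the whole evaluation difference of the simulation is
    // credited in the first (deepest) iteration of the loop, and R = 0 for all other transitions.
    static double rewardFinalOnly(double[] X, int t, int T) {
        return (t == T) ? X[T] - X[0] : 0.0;
    }

Summed over a full simulation, both definitions credit the same total of X(sT) − X(s0); they only distribute it differently over the transitions, which matters once λ < 1 (or γ < 1).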

Selection in TDTS

Vodopivec (2016) changed the UCB1 selection policy to normalize and use the value estimate V(nt) of a node nt obtained through TD backups, instead of the X(nt) value obtained through MC backups. It is not straightforward to combine this approach with other enhancements for the selection step, such as Progressive History (Subsection 4.2.1) and MixMax backups (Subsection 4.2.5), because the V(nt) values of Sarsa-UCT(λ) have a different range than the X(nt) values of UCT;

• V(nt) is an estimate of the direct reward for traversing from the previous node nt−1 to nt, plus an estimate of the rewards that can be obtained in the tree below nt.

• X(nt) is an estimate of the score that has already been accumulated in the predecessor nt−1, plus the same estimates described above for V(nt).

For example, let nt denote the root node in a game of Aliens where a number of game ticks have already passed. Suppose that the agent has already accumulated a game score of 30 points. Let nt+1 denote a successor of nt where the transition from nt to nt+1 does not result in a change in game score. Then, X(nt+1) = 30, but V(nt+1) = 0.

Without any other changes, the extra values that PH and MixMax add to the formula of the selection policy are expected to be in a similar range as the X(nt) values. Therefore, to combine Sarsa-UCT(λ) with PH and MixMax, either the extra values used by those enhancements should be scaled to the same range as V(nt), or the V(nt) values should be scaled to the same range as the X(nt) values. In this thesis, before the V(nt) values are used to replace X(nt) in the selection policy, they are scaled by adding the evaluation X(st−1) of the previous game state in that particular simulation to V(nt). In comparison to the original implementation of Sarsa-UCT(λ), this means that the value estimates of the candidate successors that the selection policy can choose from are likely closer together, and therefore it is expected that the exploration parameter C should have a lower value.

Vodopivec (2016) also describes space-local value normalization as a different technique for normalizing the estimates of values to [0, 1] before using them in the selection step. This technique uses a smaller range of more local bounds, instead of the global bounds X̂min and X̂max described in Section 3.1. This technique was found to result in improved performance by Vodopivec (2016). For this thesis, it has only been evaluated very briefly, but it was not found to be beneficial and is not described further. This is likely because it may have a large impact on the optimal value of the C parameter, and because it also interacts with PH and MixMax.
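A minimal sketch of the scaling used in this thesis before the TD-based value enters the selection formula (names are illustrative):

    // V(n_t) estimated by Sarsa-UCT(lambda) is shifted by the evaluation X(s_{t-1}) of the
    // preceding state in the current simulation, so that the resulting value lies in the same
    // range as the X(n_t) values of plain UCT and can be combined with the PH and MixMax terms.
    static double selectionValue(double vOfNode, double evalPreviousState) {
        return evalPreviousState + vOfNode;
    }

In the Aliens example above, this gives 30 + 0 = 30 for the successor nt+1, matching X(nt+1) = 30.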

Chapter 5

Experiments & Results

This chapter discusses experiments that have been done to evaluate the impact of the enhancements described in the previous chapter on the performance of MCTS in GVGP. The setup of the experiments is described first, and the results are described afterwards.

5.1 Setup

To evaluate the impact of the discussed enhancements on the performance of MCTS in GVGP, a number of experiments have been performed. The main measure of performance used is the average win percentage over sixty different games included in the GVG-AI framework, using the same amount of processing time as allowed in the competition (Perez et al., 2016). For all win percentages listed in the results below, 95% confidence intervals are provided. Some enhancements are also evaluated using other measures.

First, a baseline MCTS agent without any of the discussed enhancements has been implemented. This implementation is based on the MaastCTS agent (Schuster, 2015), but contains a number of modifications. For instance, it expands more than one node per simulation, and it uses the 1 second of processing time available at the start of every game to run simulations. This agent is simply referred to as “MCTS”. Next, the Breadth-First Tree Initialization and Safety Prepruning (BFTI) enhancement is evaluated by adding it individually to MCTS. All other enhancements are evaluated afterwards by adding them individually to an agent with BFTI enabled. The reason for enabling BFTI in the evaluation of the other enhancements is provided in the discussion of the results of BFTI. Finally, a number of experiments are discussed where multiple enhancements are combined. This also includes experiments to test variations of enhancements in a setting where multiple enhancements are combined.

The games for all experiments were played using revision 24b11aea75722ab02954c326357949b97efb7789 of the GVG-AI framework (https://github.com/EssexUniversityMCTS/gvgai), running on a CentOS Linux server with four twelve-core AMD Opteron 6174 processors (2.20 GHz). This version of the framework contains a number of optimizations that, on average, significantly increase the speed of the advance and copy operations of the framework’s forward model. These optimizations were implemented by the author of this thesis, in cooperation with Diego Perez-Liebana (one of the authors of the GVG-AI framework and organizers of the competitions). They were found to increase the average number of simulations per tick of the Sample Open-Loop MCTS controller of the GVG-AI framework by more than 50% on average (and by more than 100% in some games). This should be taken into account when comparing the results of this thesis to those of publications from before 2016.

The hardware used for the experiments of this thesis is significantly slower than the hardware used in the competition (2.90 GHz) (Perez et al., 2016). This means that any agent described in these experiments may achieve a higher average win percentage on the competition server.

5.2 Results

5.2.1 Benchmark Agents

In this experiment, each of the sixty games was played 100 times (with 20 repetitions per level) by four different benchmark agents. Three of these agents (SOLMCTS, IW(1), and YBCriber) were not implemented specifically for this thesis, but they have been included in the experiments to make sure that results are available using the same hardware and framework version. This allows for a fair comparison of the performance of these agents with the agents developed for this thesis.

• SOLMCTS: The Sample Open-Loop MCTS controller that is included in the GVG-AI framework (Perez et al., 2016).

• MCTS: The baseline MCTS agent developed for this thesis, without any enhancements. Based on MaastCTS (Schuster, 2015).

• IW(1): The agent using IW(1), as described in (Geffner and Geffner, 2015).

• YBCriber: The agent that won the GVGP competition at the IEEE CEEC 2015 conference. It is based on Iterated Width, but also uses additional heuristics specific to GVGP. This agent is a good representation of the state of the art.

The important differences between the SOLMCTS and MCTS agents are listed in Table 5.1.

Table 5.1: Differences between the SOLMCTS and MCTS agents.

SOLMCTS | MCTS
Has C set to C = √2 for the UCB1 equation used in the selection step. | Has C set to C = 0.6 for the UCB1 equation used in the selection step.
Expands the search tree by only one node per simulation. | Expands the search tree by adding nodes for the entire play-out of every simulation.
Has a depth limit of 10 for the selection and play-out steps combined. | Has a depth limit of 10 for the play-out step alone.
Does not make use of the 1 second of initialization time available at the start of every game. | Makes use of the 1 second of initialization time available at the start of every game by running MCTS (essentially giving the first tick a significantly higher number of simulations).
Plays the action corresponding to the highest visit count at the end of every game tick. | Plays the action corresponding to the highest average score at the end of every game tick.

The win percentages of these four benchmark agents are shown in Table 5.2. It shows that, even with just the differences described above, the MCTS agent already has a significantly higher average performance than SOLMCTS. IW(1) can be said to outperform both SOLMCTS and MCTS on average with 95% confidence, but the performance gap between MCTS and IW(1) is small. YBCriber clearly has the best overall performance among these agents.

Table 5.2: Win Percentages of Benchmark Agents (100 runs per game / 20 runs per level)

Win Percentage (%) Games SOLMCTS MCTS IW(1) YBCriber aliens 100.0 ± 0.0 100.0 ± 0.0 96.0 ± 3.8 100.0 ± 0.0 bait 6.0 ± 4.7 5.0 ± 4.3 21.0 ± 8.0 55.0 ± 9.8 blacksmoke 1.0 ± 2.0 1.0 ± 2.0 0.0 ± 0.0 0.0 ± 0.0 boloadventures 0.0 ± 0.0 1.0 ± 2.0 0.0 ± 0.0 0.0 ± 0.0 boulderchase 13.0 ± 6.6 16.0 ± 7.2 18.0 ± 7.5 13.0 ± 6.6 boulderdash 6.0 ± 4.7 10.0 ± 5.9 3.0 ± 3.3 20.0 ± 7.8 brainman 7.0 ± 5.0 6.0 ± 4.7 0.0 ± 0.0 38.0 ± 9.5 butterflies 86.0 ± 6.8 99.0 ± 2.0 98.0 ± 2.7 100.0 ± 0.0 cakybaky 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 camelRace 15.0 ± 7.0 13.0 ± 6.6 10.0 ± 5.9 100.0 ± 0.0 catapults 0.0 ± 0.0 5.0 ± 4.3 6.0 ± 4.7 7.0 ± 5.0 chase 0.0 ± 0.0 3.0 ± 3.3 27.0 ± 8.7 45.0 ± 9.8 chipschallenge 11.0 ± 6.1 17.0 ± 7.4 4.0 ± 3.8 52.0 ± 9.8 chopper 14.0 ± 6.8 84.0 ± 7.2 2.0 ± 2.7 89.0 ± 6.1 cookmepasta 0.0 ± 0.0 1.0 ± 2.0 4.0 ± 3.8 19.0 ± 7.7 crossfire 4.0 ± 3.8 3.0 ± 3.3 27.0 ± 8.7 99.0 ± 2.0 defender 59.0 ± 9.6 69.0 ± 9.1 59.0 ± 9.6 84.0 ± 7.2 digdug 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 eggomania 7.0 ± 5.0 15.0 ± 7.0 67.0 ± 9.2 82.0 ± 7.5 enemycitadel 2.0 ± 2.7 6.0 ± 4.7 0.0 ± 0.0 0.0 ± 0.0 escape 0.0 ± 0.0 0.0 ± 0.0 64.0 ± 9.4 100.0 ± 0.0 factorymanager 91.0 ± 5.6 88.0 ± 6.4 87.0 ± 6.6 100.0 ± 0.0 firecaster 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 firestorms 9.0 ± 5.6 4.0 ± 3.8 60.0 ± 9.6 100.0 ± 0.0 frogs 13.0 ± 6.6 10.0 ± 5.9 78.0 ± 8.1 80.0 ± 7.8 gymkhana 2.0 ± 2.7 0.0 ± 0.0 2.0 ± 2.7 1.0 ± 2.0 hungrybirds 37.0 ± 9.5 33.0 ± 9.2 57.0 ± 9.7 99.0 ± 2.0 iceandfire 0.0 ± 0.0 0.0 ± 0.0 20.0 ± 7.8 100.0 ± 0.0 infection 94.0 ± 4.7 96.0 ± 3.8 98.0 ± 2.7 100.0 ± 0.0 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 97.0 ± 3.3 jaws 78.0 ± 8.1 71.0 ± 8.9 44.0 ± 9.7 7.0 ± 5.0 labyrinth 11.0 ± 6.1 10.0 ± 5.9 40.0 ± 9.6 100.0 ± 0.0 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 missilecommand 58.0 ± 9.7 76.0 ± 8.4 79.0 ± 8.0 100.0 ± 0.0 modality 25.0 ± 8.5 25.0 ± 8.5 22.0 ± 8.1 80.0 ± 7.8 overload 13.0 ± 6.6 26.0 ± 8.6 54.0 ± 9.8 100.0 ± 0.0 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 3.0 ± 3.3 painter 63.0 ± 9.5 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 plants 7.0 ± 5.0 7.0 ± 5.0 3.0 ± 3.3 1.0 ± 2.0 plaqueattack 74.0 ± 8.6 96.0 ± 3.8 31.0 ± 9.1 100.0 ± 0.0 portals 13.0 ± 6.6 15.0 ± 7.0 62.0 ± 9.5 31.0 ± 9.1 racebet2 70.0 ± 9.0 62.0 ± 9.5 57.0 ± 9.7 100.0 ± 0.0 realportals 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 realsokoban 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 roguelike 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 7.0 ± 5.0 seaquest 96.0 ± 3.8 84.0 ± 7.2 84.0 ± 7.2 70.0 ± 9.0 sheriff 99.0 ± 2.0 100.0 ± 0.0 87.0 ± 6.6 9.0 ± 5.6 sokoban 11.0 ± 6.1 24.0 ± 8.4 10.0 ± 5.9 69.0 ± 9.1 solarfox 0.0 ± 0.0 1.0 ± 2.0 0.0 ± 0.0 2.0 ± 2.7 superman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 63.0 ± 9.5 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 survivezombies 40.0 ± 9.6 45.0 ± 9.8 35.0 ± 9.3 43.0 ± 9.7 tercio 0.0 ± 0.0 1.0 ± 2.0 1.0 ± 2.0 28.0 ± 8.8 thecitadel 9.0 ± 5.6 14.0 ± 6.8 9.0 ± 5.6 57.0 ± 9.7 waitforbreakfast 19.0 ± 7.7 26.0 ± 8.6 0.0 ± 0.0 80.0 ± 7.8 whackamole 100.0 ± 0.0 97.0 ± 3.3 97.0 ± 3.3 95.0 ± 4.3 zelda 18.0 ± 7.5 35.0 ± 9.3 70.0 ± 9.0 100.0 ± 0.0 zenpuzzle 11.0 ± 6.1 62.0 ± 9.5 22.0 ± 8.1 21.0 ± 8.0 Total 26.5 ± 1.1 31.0 ± 1.2 33.6 ± 1.2 52.4 ± 1.3

5.2.2 Breadth-First Tree Initialization and Safety Prepruning

The Breadth-First Tree Initialization and Safety Prepruning (BFTI) enhancement has been implemented and tested by adding it to the standard MCTS agent. Table 5.3 compares this new agent (referred to as BFTI) to the MCTS agent in terms of average win percentage and average number of simulations per game tick. The results of BFTI were obtained by playing 75 runs per game (15 runs per level). The results for MCTS were obtained in the experiment described above (100 runs per game, 20 runs per level). The M parameter, which determines the number of states that are generated for every action available in the root state to deal with nondeterminism, is set to M = 3. This value has been chosen manually. Geffner and Geffner (2015) used a very different value (M = 10) for a similar parameter for safety prepruning in IW. This difference is because, for IW, safety prepruning is the only method for dealing with nondeterminism. Open-loop MCTS also continues to take nondeterminism into account after the BFTI process, and can therefore afford to use a lower value for M.

Overall, the addition of BFTI appears to slightly decrease the average win percentage from 31.0% to 30.0%, but the 95% confidence intervals overlap. The most notable decrease in win percentage is in the game of Chopper, where there is also a significant decrease in the average number of simulations per tick. There are significant increases in win percentage in the games of Firestorms and Frogs. These are both games with many objects on the map that cause a loss upon collision with the avatar. This means that the safety prepruning of BFTI can frequently reduce the number of actions to be considered in the root node.

Figure 5.1 depicts, for the BFTI and MCTS agents, 95% confidence intervals for three statistics. The first bar for each agent is the percentage out of all games that was won. These are the same values as found in Table 5.3. The second bar for each agent is the percentage out of all games that was lost before 2000 ticks passed. The third bar for each agent is the percentage out of all games that was lost after 2000 ticks passed (where 2000 ticks is the maximum allowed duration for any game). In most games, losing a game before 2000 ticks pass means that the agent made a clear mistake, such as colliding with a harmful object. Losing a game after 2000 ticks pass means that the agent failed to find a way to win, but at least managed to survive. Losing after 2000 ticks can be considered to be better than losing earlier. One reason for this is that surviving for a longer amount of time may enable the other enhancements discussed in this thesis to find more winning lines of play. For this reason, BFTI is included in the following experiments where these other enhancements are evaluated, even though it appears to slightly reduce the overall win percentage. Another reason why losses after 2000 ticks can be considered better than earlier losses is that, in many games, early losses also cause a decrease in game score. Because the game score is used as a tie-breaker in the competition, BFTI could improve the competition ranking in games where it significantly increases the number of runs in which the agent survives for 2000 ticks.


Figure 5.1: Comparison of BFTI and MCTS in terms of average win percentage, average loss percentage after less than 2000 ticks, and average loss percentage after 2000 ticks.

Table 5.3: Comparison of BFTI and MCTS in terms of Win Percentages and Avg. Simulations per Tick

Win Percentage (%) Avg. Simulations per Tick Games BFTI MCTS BFTI MCTS aliens 100.0 ± 0.0 100.0 ± 0.0 45.5 ± 1.2 45.6 ± 1.5 bait 5.3 ± 5.1 5.0 ± 4.3 20.0 ± 2.8 20.7 ± 2.6 blacksmoke 0.0 ± 0.0 1.0 ± 2.0 9.5 ± 0.4 14.5 ± 0.5 boloadventures 0.0 ± 0.0 1.0 ± 2.0 2.6 ± 0.8 9.8 ± 0.5 boulderchase 13.3 ± 7.7 16.0 ± 7.2 12.2 ± 0.7 14.4 ± 0.5 boulderdash 9.3 ± 6.6 10.0 ± 5.9 8.7 ± 1.0 11.9 ± 0.5 brainman 2.7 ± 3.6 6.0 ± 4.7 22.4 ± 3.5 25.7 ± 3.3 butterflies 98.7 ± 2.6 99.0 ± 2.0 60.2 ± 1.0 59.6 ± 1.1 cakybaky 0.0 ± 0.0 0.0 ± 0.0 19.1 ± 1.6 23.9 ± 1.5 camelRace 17.3 ± 8.6 13.0 ± 6.6 43.7 ± 2.3 45.0 ± 3.0 catapults 4.0 ± 4.4 5.0 ± 4.3 45.4 ± 3.9 54.8 ± 4.6 chase 5.3 ± 5.1 3.0 ± 3.3 30.7 ± 1.0 30.1 ± 0.8 chipschallenge 17.3 ± 8.6 17.0 ± 7.4 19.3 ± 2.2 23.8 ± 2.0 chopper 36.0 ± 10.9 84.0 ± 7.2 5.2 ± 1.3 13.3 ± 0.8 cookmepasta 0.0 ± 0.0 1.0 ± 2.0 26.4 ± 3.3 25.6 ± 3.5 crossfire 1.3 ± 2.6 3.0 ± 3.3 26.0 ± 0.6 28.8 ± 0.6 defender 70.7 ± 10.3 69.0 ± 9.1 30.1 ± 1.7 30.4 ± 2.0 digdug 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.4 12.1 ± 0.7 eggomania 6.7 ± 5.6 15.0 ± 7.0 57.9 ± 1.1 72.5 ± 1.6 enemycitadel 1.3 ± 2.6 6.0 ± 4.7 4.6 ± 1.0 10.6 ± 0.5 escape 0.0 ± 0.0 0.0 ± 0.0 34.5 ± 1.1 36.5 ± 1.2 factorymanager 88.0 ± 7.4 88.0 ± 6.4 12.1 ± 1.2 15.2 ± 0.9 firecaster 0.0 ± 0.0 0.0 ± 0.0 26.9 ± 2.6 28.6 ± 2.4 firestorms 21.3 ± 9.3 4.0 ± 3.8 25.7 ± 0.4 29.5 ± 0.5 frogs 44.0 ± 11.2 10.0 ± 5.9 17.5 ± 2.4 39.3 ± 1.7 gymkhana 1.3 ± 2.6 0.0 ± 0.0 26.9 ± 1.0 32.3 ± 1.3 hungrybirds 34.7 ± 10.8 33.0 ± 9.2 35.0 ± 3.0 35.5 ± 3.0 iceandfire 0.0 ± 0.0 0.0 ± 0.0 53.5 ± 0.5 63.1 ± 0.7 infection 97.3 ± 3.6 96.0 ± 3.8 32.8 ± 0.6 33.5 ± 0.7 intersection 100.0 ± 0.0 100.0 ± 0.0 36.9 ± 1.1 42.2 ± 1.2 jaws 77.3 ± 9.5 71.0 ± 8.9 50.1 ± 4.9 47.0 ± 5.7 labyrinth 9.3 ± 6.6 10.0 ± 5.9 54.0 ± 0.8 56.4 ± 0.7 lasers 0.0 ± 0.0 0.0 ± 0.0 3.1 ± 0.7 10.7 ± 0.3 lasers2 0.0 ± 0.0 0.0 ± 0.0 2.5 ± 1.0 9.8 ± 0.6 lemmings 0.0 ± 0.0 0.0 ± 0.0 53.6 ± 0.8 62.3 ± 0.9 missilecommand 69.3 ± 10.4 76.0 ± 8.4 139.1 ± 3.3 148.9 ± 4.7 modality 25.3 ± 9.8 25.0 ± 8.5 37.9 ± 15.1 67.1 ± 24.0 overload 17.3 ± 8.6 26.0 ± 8.6 13.5 ± 3.4 20.3 ± 2.6 pacman 0.0 ± 0.0 0.0 ± 0.0 8.5 ± 0.4 14.0 ± 0.3 painter 100.0 ± 0.0 100.0 ± 0.0 51.0 ± 7.1 81.2 ± 10.3 plants 4.0 ± 4.4 7.0 ± 5.0 29.5 ± 0.8 31.1 ± 0.8 plaqueattack 94.7 ± 5.1 96.0 ± 3.8 26.3 ± 1.1 27.5 ± 1.2 portals 13.3 ± 7.7 15.0 ± 7.0 32.5 ± 0.7 33.6 ± 1.1 racebet2 68.0 ± 10.6 62.0 ± 9.5 20.7 ± 0.5 24.3 ± 0.7 realportals 0.0 ± 0.0 0.0 ± 0.0 7.1 ± 2.2 15.1 ± 1.5 realsokoban 0.0 ± 0.0 0.0 ± 0.0 23.6 ± 1.9 29.2 ± 2.2 roguelike 0.0 ± 0.0 0.0 ± 0.0 22.5 ± 0.9 24.2 ± 0.8 seaquest 70.7 ± 10.3 84.0 ± 7.2 28.0 ± 2.0 31.6 ± 2.4 sheriff 100.0 ± 0.0 100.0 ± 0.0 31.9 ± 1.2 35.8 ± 1.4 sokoban 20.0 ± 9.1 24.0 ± 8.4 29.5 ± 2.4 28.7 ± 2.3 solarfox 1.3 ± 2.6 1.0 ± 2.0 74.2 ± 3.2 80.4 ± 3.7 superman 0.0 ± 0.0 0.0 ± 0.0 41.4 ± 0.9 47.8 ± 1.2 surround 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 105.9 ± 5.5 survivezombies 41.3 ± 11.1 45.0 ± 9.8 26.4 ± 0.8 27.9 ± 0.8 tercio 0.0 ± 0.0 1.0 ± 2.0 4.3 ± 1.7 10.1 ± 1.6 thecitadel 5.3 ± 5.1 14.0 ± 6.8 7.0 ± 1.6 15.5 ± 1.1 waitforbreakfast 20.0 ± 9.1 26.0 ± 8.6 35.8 ± 2.4 42.0 ± 2.7 whackamole 100.0 ± 0.0 97.0 ± 3.3 98.0 ± 2.4 118.3 ± 3.2 zelda 32.0 ± 10.6 35.0 ± 9.3 25.7 ± 1.6 27.9 ± 1.7 zenpuzzle 56.0 ± 11.2 62.0 ± 9.5 83.8 ± 7.6 94.6 ± 14.3 Total 30.0 ± 1.3 31.0 ± 1.2 31.0 ± 6.4 38.3 ± 7.3

5.2.3 Loss Avoidance

Table 5.4 shows the results obtained by adding Loss Avoidance (LA) to the BFTI agent. On average, LA gives a small, but statistically significant increase in win percentage from 30.0% to 33.3%. The win percentage is especially increased in games where the avatar is killed upon collision with certain static objects, such as Catapults and Overload. The large decrease in the average number of simulations per tick in these games also shows that the LA process is executed frequently in these games.

LA has a very detrimental impact on the win percentage in the game of Jaws. In this game, there are certain locations that have a low probability of spawning harmful objects. When a simulation ends with the avatar getting killed in such a position, the LA process re-generates states for the nodes traversed in that simulation. Because these locations only have a low probability of spawning harmful objects, it is likely that, in such a second sequence of states generated by LA, a harmful object is no longer spawned. Therefore, LA causes the agent to have a tendency to ignore the fact that these locations are dangerous to move to. One way to address this issue in future work could be to use influence maps (Millington and Funge, 2009) to keep track of how frequently losses are observed in certain cells of the map, and to incorporate that information in a heuristic evaluation function.

Surprisingly, LA also appears to increase the average number of simulations per tick in some games. An explanation for this can be that, without LA, certain parts of the search tree are estimated to have a low value too quickly. This results in MCTS focusing almost all search effort on a smaller part of the tree, which does not yet have a pessimistic evaluation. MCTS has been implemented to only have a depth limit for the play-out step, and not for the selection step. Therefore, when the majority of the simulations are executed in only a small part of the search tree, the simulations in that part of the tree take a longer amount of time, because they continue for a larger number of ticks. In these games, the addition of LA makes MCTS spread its simulations out more evenly over the search tree, meaning that every individual simulation is shorter on average.


Figure 5.2: Scatter plot with one dot for every game played with Loss Avoidance (LA). The x-axis represents the percentage out of all MCTS simulations in a game that ended in a loss. The y-axis represents the change in win percentage per game after adding LA to the BFTI agent.

Additionally, it has been investigated whether LA is particularly beneficial, in terms of win percentage, in games where many simulations end in losses, and possibly detrimental in other cases. To investigate this, a scatter plot has been created with one dot for each of the 60 games played (depicted in Figure 5.2). The x-coordinate of a dot is the percentage out of all MCTS simulations in the corresponding game that ended in a loss. These percentages were measured with LA enabled. Any simulation that ended in a loss in the play-out step, and therefore started execution of the LA process, was counted as a loss (even if the LA process ended up finding a non-losing state). The y-coordinate of a dot is the change in win percentage for the corresponding game after adding LA to the BFTI agent.

The sample Pearson correlation coefficient of this data is r = 0.31, with a p-value of 0.016. This means that it can be said with 95% confidence that there is some positive correlation between the percentage of simulations ending in losses, and the change in win percentage caused by LA. However, this correlation is relatively low. There are also games with a low percentage of simulations ending in losses where LA is still beneficial. An example of such a game is Overload, where LA increases the win percentage by 24.0%, but only 1.8% of the MCTS simulations end in losses on average. This is likely because there are only small areas in the map where there are harmful objects and LA can have an effect, and large areas of the map do not have any harmful objects.

For future work, it could be worth computing the percentage of simulations ending in losses only over a history of a few recent ticks, instead of over the entire duration of a game. It may be beneficial to disable LA whenever such a percentage is below some threshold. If the percentage of simulations ending in losses is computed over the entire duration of a game, it does not appear to be beneficial to disable LA whenever this percentage is below some threshold. Figure 5.2 indicates that this would also result in disabling LA in many games where it is beneficial.
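For reference, the sample Pearson correlation coefficient reported above is computed over the sixty (xg, yg) pairs (one per game) as

r = Σg (xg − x̄)(yg − ȳ) / √( Σg (xg − x̄)² × Σg (yg − ȳ)² ),

where x̄ and ȳ denote the means of the loss percentages and of the changes in win percentage, respectively. The reported p-value is consistent with the standard two-tailed t-test for a correlation coefficient: t = r √(n − 2) / √(1 − r²) ≈ 0.31 × √58 / √(1 − 0.31²) ≈ 2.48, which for 58 degrees of freedom corresponds to p ≈ 0.016.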

Table 5.4: Adding Loss Avoidance to the BFTI agent (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games BFTI LA BFTI LA aliens 100.0 ± 0.0 100.0 ± 0.0 45.5 ± 1.4 52.7 ± 1.9 bait 5.3 ± 5.1 8.0 ± 6.1 20.0 ± 3.2 18.1 ± 5.6 blacksmoke 0.0 ± 0.0 0.0 ± 0.0 9.5 ± 0.5 7.0 ± 0.7 boloadventures 0.0 ± 0.0 0.0 ± 0.0 2.6 ± 0.9 5.1 ± 1.3 boulderchase 13.3 ± 7.7 6.7 ± 5.6 12.2 ± 0.8 13.5 ± 0.8 boulderdash 9.3 ± 6.6 6.7 ± 5.6 8.7 ± 1.1 11.5 ± 1.0 brainman 2.7 ± 3.6 4.0 ± 4.4 22.4 ± 4.1 28.0 ± 4.0 butterflies 98.7 ± 2.6 97.3 ± 3.6 60.2 ± 1.1 68.6 ± 1.7 cakybaky 0.0 ± 0.0 0.0 ± 0.0 19.1 ± 1.9 15.8 ± 2.4 camelRace 17.3 ± 8.6 8.0 ± 6.1 43.7 ± 2.7 45.9 ± 2.8 catapults 4.0 ± 4.4 21.3 ± 9.3 45.4 ± 4.6 19.3 ± 4.0 chase 5.3 ± 5.1 12.0 ± 7.4 30.7 ± 1.1 31.0 ± 1.6 chipschallenge 17.3 ± 8.6 21.3 ± 9.3 19.3 ± 2.5 13.8 ± 4.0 chopper 36.0 ± 10.9 56.0 ± 11.2 5.2 ± 1.5 8.1 ± 1.7 cookmepasta 0.0 ± 0.0 0.0 ± 0.0 26.4 ± 3.8 30.1 ± 4.6 crossfire 1.3 ± 2.6 12.0 ± 7.4 26.0 ± 0.7 24.4 ± 1.0 defender 70.7 ± 10.3 77.3 ± 9.5 30.1 ± 2.0 35.6 ± 2.5 digdug 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.6 6.6 ± 1.4 eggomania 6.7 ± 5.6 14.7 ± 8.0 57.9 ± 1.3 68.2 ± 1.9 enemycitadel 1.3 ± 2.6 2.7 ± 3.6 4.6 ± 1.1 8.0 ± 1.4 escape 0.0 ± 0.0 40.0 ± 11.1 34.5 ± 1.3 25.5 ± 2.2 factorymanager 88.0 ± 7.4 86.7 ± 7.7 12.1 ± 1.4 15.7 ± 1.2 firecaster 0.0 ± 0.0 0.0 ± 0.0 26.9 ± 3.0 32.5 ± 3.7 firestorms 21.3 ± 9.3 14.7 ± 8.0 25.7 ± 0.5 23.5 ± 2.2 frogs 44.0 ± 11.2 66.7 ± 10.7 17.5 ± 2.7 3.9 ± 0.9 gymkhana 1.3 ± 2.6 0.0 ± 0.0 26.9 ± 1.2 16.8 ± 1.4 hungrybirds 34.7 ± 10.8 44.0 ± 11.2 35.0 ± 3.5 39.5 ± 2.8 iceandfire 0.0 ± 0.0 2.7 ± 3.6 53.5 ± 0.6 42.9 ± 2.0 infection 97.3 ± 3.6 100.0 ± 0.0 32.8 ± 0.7 40.8 ± 1.1 intersection 100.0 ± 0.0 100.0 ± 0.0 36.9 ± 1.3 44.4 ± 1.5 jaws 77.3 ± 9.5 20.0 ± 9.1 50.1 ± 5.7 47.7 ± 5.3 labyrinth 9.3 ± 6.6 10.7 ± 7.0 54.0 ± 0.9 55.2 ± 1.1 lasers 0.0 ± 0.0 0.0 ± 0.0 3.1 ± 0.9 4.7 ± 1.0 lasers2 0.0 ± 0.0 0.0 ± 0.0 2.5 ± 1.2 2.4 ± 1.0 lemmings 0.0 ± 0.0 0.0 ± 0.0 53.6 ± 0.9 60.9 ± 0.9 missilecommand 69.3 ± 10.4 78.7 ± 9.3 139.1 ± 3.8 158.8 ± 6.3 modality 25.3 ± 9.8 25.3 ± 9.8 37.9 ± 17.5 37.4 ± 17.7 overload 17.3 ± 8.6 41.3 ± 11.1 13.5 ± 3.9 22.0 ± 4.8 pacman 0.0 ± 0.0 0.0 ± 0.0 8.5 ± 0.5 11.1 ± 0.6 painter 100.0 ± 0.0 100.0 ± 0.0 51.0 ± 8.1 80.0 ± 8.7 plants 4.0 ± 4.4 5.3 ± 5.1 29.5 ± 0.9 32.5 ± 1.0 plaqueattack 94.7 ± 5.1 92.0 ± 6.1 26.3 ± 1.3 29.2 ± 1.7 portals 13.3 ± 7.7 44.0 ± 11.2 32.5 ± 0.8 16.2 ± 1.8 racebet2 68.0 ± 10.6 54.7 ± 11.3 20.7 ± 0.6 22.7 ± 0.6 realportals 0.0 ± 0.0 0.0 ± 0.0 7.1 ± 2.5 2.5 ± 1.5 realsokoban 0.0 ± 0.0 0.0 ± 0.0 23.6 ± 2.2 32.5 ± 3.0 roguelike 0.0 ± 0.0 0.0 ± 0.0 22.5 ± 1.0 26.1 ± 1.0 seaquest 70.7 ± 10.3 61.3 ± 11.0 28.0 ± 2.3 32.9 ± 2.7 sheriff 100.0 ± 0.0 97.3 ± 3.6 31.9 ± 1.3 38.5 ± 1.6 sokoban 20.0 ± 9.1 26.7 ± 10.0 29.5 ± 2.8 31.2 ± 3.3 solarfox 1.3 ± 2.6 2.7 ± 3.6 74.2 ± 3.7 79.3 ± 4.2 superman 0.0 ± 0.0 0.0 ± 0.0 41.4 ± 1.1 45.1 ± 1.5 surround 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 survivezombies 41.3 ± 11.1 49.3 ± 11.3 26.4 ± 0.9 21.3 ± 1.9 tercio 0.0 ± 0.0 0.0 ± 0.0 4.3 ± 2.0 5.3 ± 2.5 thecitadel 5.3 ± 5.1 18.7 ± 8.8 7.0 ± 1.8 14.7 ± 1.9 waitforbreakfast 20.0 ± 9.1 70.7 ± 10.3 35.8 ± 2.8 58.2 ± 5.8 whackamole 100.0 ± 0.0 100.0 ± 0.0 98.0 ± 2.7 103.9 ± 3.9 zelda 32.0 ± 10.6 38.7 ± 11.0 25.7 ± 1.9 30.3 ± 2.4 zenpuzzle 56.0 ± 11.2 56.0 ± 11.2 83.8 ± 8.7 120.6 ± 11.6 Total 30.0 ± 1.3 33.3 ± 1.4 31.0 ± 6.4 33.7 ± 7.6

5.2.4 Novelty-Based Pruning

Table 5.5 shows the results of adding Novelty-Based Pruning (NBP) to the BFTI agent, using a threshold of 1 for the novelty tests. It increases the overall win percentage by a small, but statistically significant amount, from 30.0% to 33.0%. Some of the games where it is particularly beneficial are Bait, HungryBirds, Labyrinth, Sokoban and Zelda. Except for Zelda, these are all “puzzle” games without any NPCs or nondeterministic effects. These games (including Zelda) also tend to have many walls that block movement. NBP can frequently prune actions where the avatar moves into such walls and essentially does nothing. As shown by Table 5.2, the IW(1) and YBCriber agents also perform noticeably better than the MCTS-based benchmark agents in these games (except for IW(1) in Zelda).

NBP also significantly increases the average number of simulations per tick in some games (most noticeably Bait, with an increase from 20.0 to 120.3 simulations per tick on average). This is likely because NBP enables MCTS to find a winning sequence of actions more quickly (sometimes even in the 1 second available for initialization at the start of a game). Once such a sequence has been found, the selection step of MCTS will frequently choose to repeat it. Such a sequence is also expected to be shorter than a (semi-)random sequence that contains “useless” movements into obstacles, and therefore to require less processing time.

The most noticeable game where NBP is detrimental is ChipsChallenge, where it reduces the win percentage from 17.3% to 1.3%. The first level of this game is depicted in Figure 5.3. The avatar needs to collect the coins to gain increases in score, and ultimately win the game. To pass the large green squares (“doors”), it is necessary to first pick up the smaller green squares (“keys”) near the bottom of the map. However, after picking up the keys, the avatar needs to re-visit some previously visited cells to move back to the doors. Such movements back to the same cells are pruned by NBP, which is why NBP is detrimental in this game.

Figure 5.3: The first level of ChipsChallenge. The avatar (in the middle) needs to pick up the small green squares near the bottom of the map before he can unlock the larger green squares near the top of the map, and collect the coins behind these green “doors”.

Table 5.5: Adding Novelty-Based Pruning to the BFTI agent (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games BFTI NBP BFTI NBP aliens 100.0 ± 0.0 100.0 ± 0.0 45.5 ± 1.4 42.7 ± 2.0 bait 5.3 ± 5.1 37.3 ± 10.9 20.0 ± 3.2 120.3 ± 27.3 blacksmoke 0.0 ± 0.0 1.3 ± 2.6 9.5 ± 0.5 10.2 ± 0.5 boloadventures 0.0 ± 0.0 0.0 ± 0.0 2.6 ± 0.9 4.7 ± 1.3 boulderchase 13.3 ± 7.7 16.0 ± 8.3 12.2 ± 0.8 14.5 ± 0.8 boulderdash 9.3 ± 6.6 12.0 ± 7.4 8.7 ± 1.1 8.8 ± 1.3 brainman 2.7 ± 3.6 0.0 ± 0.0 22.4 ± 4.1 61.3 ± 3.4 butterflies 98.7 ± 2.6 98.7 ± 2.6 60.2 ± 1.1 65.8 ± 1.2 cakybaky 0.0 ± 0.0 0.0 ± 0.0 19.1 ± 1.9 22.8 ± 3.0 camelRace 17.3 ± 8.6 22.7 ± 9.5 43.7 ± 2.7 61.8 ± 1.5 catapults 4.0 ± 4.4 4.0 ± 4.4 45.4 ± 4.6 45.9 ± 5.2 chase 5.3 ± 5.1 10.7 ± 7.0 30.7 ± 1.1 31.4 ± 1.9 chipschallenge 17.3 ± 8.6 1.3 ± 2.6 19.3 ± 2.5 39.9 ± 2.6 chopper 36.0 ± 10.9 34.7 ± 10.8 5.2 ± 1.5 5.6 ± 1.5 cookmepasta 0.0 ± 0.0 1.3 ± 2.6 26.4 ± 3.8 51.6 ± 4.2 crossfire 1.3 ± 2.6 2.7 ± 3.6 26.0 ± 0.7 26.4 ± 1.2 defender 70.7 ± 10.3 74.7 ± 9.8 30.1 ± 2.0 32.9 ± 2.5 digdug 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.6 10.1 ± 1.4 eggomania 6.7 ± 5.6 10.7 ± 7.0 57.9 ± 1.3 64.6 ± 1.9 enemycitadel 1.3 ± 2.6 1.3 ± 2.6 4.6 ± 1.1 11.6 ± 1.0 escape 0.0 ± 0.0 0.0 ± 0.0 34.5 ± 1.3 29.9 ± 1.9 factorymanager 88.0 ± 7.4 89.3 ± 7.0 12.1 ± 1.4 11.8 ± 1.5 firecaster 0.0 ± 0.0 0.0 ± 0.0 26.9 ± 3.0 58.5 ± 2.7 firestorms 21.3 ± 9.3 17.3 ± 8.6 25.7 ± 0.5 24.1 ± 0.7 frogs 44.0 ± 11.2 37.3 ± 10.9 17.5 ± 2.7 13.1 ± 2.4 gymkhana 1.3 ± 2.6 1.3 ± 2.6 26.9 ± 1.2 27.5 ± 1.9 hungrybirds 34.7 ± 10.8 60.0 ± 11.1 35.0 ± 3.5 84.8 ± 5.9 iceandfire 0.0 ± 0.0 9.3 ± 6.6 53.5 ± 0.6 42.0 ± 1.7 infection 97.3 ± 3.6 96.0 ± 4.4 32.8 ± 0.7 29.9 ± 0.8 intersection 100.0 ± 0.0 100.0 ± 0.0 36.9 ± 1.3 48.7 ± 2.0 jaws 77.3 ± 9.5 74.7 ± 9.8 50.1 ± 5.7 41.5 ± 5.5 labyrinth 9.3 ± 6.6 22.7 ± 9.5 54.0 ± 0.9 43.8 ± 0.8 lasers 0.0 ± 0.0 0.0 ± 0.0 3.1 ± 0.9 4.4 ± 0.9 lasers2 0.0 ± 0.0 0.0 ± 0.0 2.5 ± 1.2 2.6 ± 1.2 lemmings 0.0 ± 0.0 0.0 ± 0.0 53.6 ± 0.9 55.3 ± 0.7 missilecommand 69.3 ± 10.4 84.0 ± 8.3 139.1 ± 3.8 139.5 ± 3.9 modality 25.3 ± 9.8 22.7 ± 9.5 37.9 ± 17.5 73.2 ± 30.8 overload 17.3 ± 8.6 16.0 ± 8.3 13.5 ± 3.9 29.9 ± 3.5 pacman 0.0 ± 0.0 0.0 ± 0.0 8.5 ± 0.5 10.0 ± 0.6 painter 100.0 ± 0.0 100.0 ± 0.0 51.0 ± 8.1 80.3 ± 13.2 plants 4.0 ± 4.4 8.0 ± 6.1 29.5 ± 0.9 27.8 ± 0.7 plaqueattack 94.7 ± 5.1 94.7 ± 5.1 26.3 ± 1.3 26.1 ± 1.0 portals 13.3 ± 7.7 17.3 ± 8.6 32.5 ± 0.8 30.1 ± 1.3 racebet2 68.0 ± 10.6 85.3 ± 8.0 20.7 ± 0.6 21.3 ± 0.6 realportals 0.0 ± 0.0 0.0 ± 0.0 7.1 ± 2.5 8.5 ± 2.8 realsokoban 0.0 ± 0.0 0.0 ± 0.0 23.6 ± 2.2 108.6 ± 7.5 roguelike 0.0 ± 0.0 0.0 ± 0.0 22.5 ± 1.0 27.5 ± 1.0 seaquest 70.7 ± 10.3 82.7 ± 8.6 28.0 ± 2.3 30.8 ± 2.9 sheriff 100.0 ± 0.0 100.0 ± 0.0 31.9 ± 1.3 32.4 ± 1.6 sokoban 20.0 ± 9.1 40.0 ± 11.1 29.5 ± 2.8 81.1 ± 4.9 solarfox 1.3 ± 2.6 2.7 ± 3.6 74.2 ± 3.7 78.4 ± 3.9 superman 0.0 ± 0.0 0.0 ± 0.0 41.4 ± 1.1 53.7 ± 1.2 surround 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 96.0 ± 12.8 survivezombies 41.3 ± 11.1 40.0 ± 11.1 26.4 ± 0.9 22.5 ± 1.3 tercio 0.0 ± 0.0 0.0 ± 0.0 4.3 ± 2.0 7.8 ± 2.0 thecitadel 5.3 ± 5.1 16.0 ± 8.3 7.0 ± 1.8 37.4 ± 3.3 waitforbreakfast 20.0 ± 9.1 22.7 ± 9.5 35.8 ± 2.8 39.7 ± 3.6 whackamole 100.0 ± 0.0 100.0 ± 0.0 98.0 ± 2.7 110.7 ± 4.1 zelda 32.0 ± 10.6 56.0 ± 11.2 25.7 ± 1.9 45.4 ± 0.9 zenpuzzle 56.0 ± 11.2 57.3 ± 11.2 83.8 ± 8.7 200.5 ± 6.3 Total 30.0 ± 1.3 33.0 ± 1.4 31.0 ± 6.4 44.5 ± 9.6

5.2.5 Progressive History and N-Gram Selection Technique

Table 5.6 shows the results obtained by adding Progressive History (PH) and/or the N-Gram Selection Technique (NST) to the BFTI agent. These two enhancements are discussed in the same experiment because they are similar (both introduce action-based biases into different steps of MCTS), and can (partially) share the action-based statistics that they collect. PH was tested with W = 1, and NST with k = 7, ε = 0.5, N = 3. These settings are based on (Schuster, 2015).

Each of the two enhancements appears to increase the average win percentage by a small (but statistically insignificant) amount, and the combination of the two results in a slightly larger (statistically significant) increase. Both enhancements appear to be particularly beneficial in Eggomania. This is likely because, in this game, the agent should frequently move in the same direction for multiple ticks in a row. The enhancements appear to be particularly detrimental in Frogs. This is likely because there are many losing game states in the game tree of Frogs, which means that many actions will quickly be associated with negative scores by PH and NST. Other than these two cases, it is difficult to characterize in what kinds of games PH and NST are beneficial or detrimental. In most games, they are simply expected to slightly increase the quality and reliability of simulations by making the action selection more informed, and therefore to have a small, positive effect on playing strength.

5.2.6 Tree Reuse

Figure 5.4 depicts the 95% confidence intervals for the average win percentages obtained by adding Tree Reuse (TR) to the BFTI agent, for six different values of the decay factor γ. The area shaded in grey is the confidence interval of the win percentage for BFTI without TR. All variants appear to perform better than BFTI, with γ ∈ {0.4, 0.6, 1.0} leading to statistically significant increases in win percentage. Note that TR with γ = 0.0 is not equivalent to the BFTI agent. Even though it entirely resets any statistics previously collected in nodes, it still keeps the nodes, and therefore the structure of the search tree found in previous ticks.
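A hypothetical sketch of how such a decay can be applied when the relevant subtree is reused at the start of a new tick (assuming visit counts are stored as floating-point numbers so that they can be decayed; the exact statistics that are decayed in the actual implementation may differ):

    // Scales down the statistics gathered in previous ticks: gamma = 1 keeps them at full weight,
    // gamma = 0 discards them entirely while still keeping the nodes (and thus the tree structure).
    static void decaySubtree(Node node, double gamma) {
        node.visits *= gamma;
        node.totalScore *= gamma;
        for (Node child : node.children) decaySubtree(child, gamma);
    }

    static class Node {
        double visits;                              // stored as a double so it can be decayed
        double totalScore;
        java.util.List<Node> children = new java.util.ArrayList<>();
    }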


Figure 5.4: 95% confidence intervals for average win percentages of adding Tree Reuse to the BFTI agent with different values for the decay factor γ. The area shaded in grey is the confidence interval for the win percentage of BFTI without TR.

Table 5.6: Adding PH and NST to the BFTI agent (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games BFTI PH NST NST+PH BFTI PH NST NST+PH aliens 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 45.5 ± 1.4 53.8 ± 1.8 52.0 ± 1.8 51.9 ± 1.7 bait 5.3 ± 5.1 10.7 ± 7.0 14.7 ± 8.0 18.7 ± 8.8 20.0 ± 3.2 26.8 ± 4.6 28.7 ± 4.1 56.2 ± 17.7 blacksmoke 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 9.5 ± 0.5 10.7 ± 0.8 11.3 ± 0.6 11.0 ± 0.6 boloadventures 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.6 ± 0.9 5.1 ± 1.3 4.2 ± 1.0 3.2 ± 1.0 boulderchase 13.3 ± 7.7 16.0 ± 8.3 16.0 ± 8.3 22.7 ± 9.5 12.2 ± 0.8 15.0 ± 1.1 12.0 ± 0.9 13.8 ± 0.7 boulderdash 9.3 ± 6.6 8.0 ± 6.1 12.0 ± 7.4 17.3 ± 8.6 8.7 ± 1.1 9.5 ± 1.3 9.5 ± 1.1 9.2 ± 1.0 brainman 2.7 ± 3.6 5.3 ± 5.1 6.7 ± 5.6 6.7 ± 5.6 22.4 ± 4.1 28.7 ± 4.1 29.8 ± 4.2 22.4 ± 3.6 butterflies 98.7 ± 2.6 100.0 ± 0.0 98.7 ± 2.6 97.3 ± 3.6 60.2 ± 1.1 74.1 ± 1.6 69.2 ± 1.4 67.8 ± 1.3 cakybaky 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 19.1 ± 1.9 23.3 ± 2.2 24.3 ± 1.7 23.1 ± 1.8 camelRace 17.3 ± 8.6 16.0 ± 8.3 17.3 ± 8.6 16.0 ± 8.3 43.7 ± 2.7 48.8 ± 3.0 46.8 ± 3.0 46.7 ± 2.9 catapults 4.0 ± 4.4 5.3 ± 5.1 8.0 ± 6.1 4.0 ± 4.4 45.4 ± 4.6 52.5 ± 5.2 48.4 ± 5.0 47.6 ± 4.9 chase 5.3 ± 5.1 9.3 ± 6.6 6.7 ± 5.6 12.0 ± 7.4 30.7 ± 1.1 37.2 ± 1.5 34.4 ± 1.4 34.0 ± 1.3 chipschallenge 17.3 ± 8.6 17.3 ± 8.6 18.7 ± 8.8 16.0 ± 8.3 19.3 ± 2.5 25.9 ± 2.8 23.6 ± 3.0 24.3 ± 2.3 chopper 36.0 ± 10.9 48.0 ± 11.3 46.7 ± 11.3 49.3 ± 11.3 5.2 ± 1.5 8.3 ± 1.8 7.0 ± 1.7 6.9 ± 1.5 cookmepasta 0.0 ± 0.0 1.3 ± 2.6 1.3 ± 2.6 2.7 ± 3.6 26.4 ± 3.8 29.3 ± 4.3 28.0 ± 3.9 26.1 ± 3.8 crossfire 1.3 ± 2.6 1.3 ± 2.6 2.7 ± 3.6 0.0 ± 0.0 26.0 ± 0.7 31.4 ± 0.7 29.1 ± 0.8 29.9 ± 0.7 defender 70.7 ± 10.3 74.7 ± 9.8 82.7 ± 8.6 66.7 ± 10.7 30.1 ± 2.0 36.6 ± 2.5 30.4 ± 2.0 29.2 ± 1.8 digdug 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.6 6.0 ± 1.7 6.6 ± 1.4 6.2 ± 1.4 eggomania 6.7 ± 5.6 16.0 ± 8.3 16.0 ± 8.3 28.0 ± 10.2 57.9 ± 1.3 69.5 ± 2.1 69.3 ± 1.9 67.3 ± 1.6 enemycitadel 1.3 ± 2.6 1.3 ± 2.6 1.3 ± 2.6 2.7 ± 3.6 4.6 ± 1.1 6.7 ± 1.3 7.1 ± 1.4 6.8 ± 1.3 escape 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 3.6 0.0 ± 0.0 34.5 ± 1.3 36.8 ± 1.5 36.4 ± 1.5 36.9 ± 1.5 factorymanager 88.0 ± 7.4 92.0 ± 6.1 86.7 ± 7.7 90.7 ± 6.6 12.1 ± 1.4 15.6 ± 1.4 12.8 ± 1.2 12.3 ± 1.2 firecaster 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 26.9 ± 3.0 32.4 ± 3.3 29.5 ± 3.1 29.5 ± 3.0 firestorms 21.3 ± 9.3 12.0 ± 7.4 16.0 ± 8.3 17.3 ± 8.6 25.7 ± 0.5 30.7 ± 0.6 29.2 ± 0.6 29.0 ± 0.6 frogs 44.0 ± 11.2 33.3 ± 10.7 22.7 ± 9.5 21.3 ± 9.3 17.5 ± 2.7 21.9 ± 3.0 21.2 ± 1.9 20.5 ± 1.9 gymkhana 1.3 ± 2.6 1.3 ± 2.6 2.7 ± 3.6 5.3 ± 5.1 26.9 ± 1.2 32.4 ± 1.3 31.3 ± 1.0 30.4 ± 1.1 hungrybirds 34.7 ± 10.8 36.0 ± 10.9 32.0 ± 10.6 37.3 ± 10.9 35.0 ± 3.5 37.4 ± 3.4 33.8 ± 3.4 35.4 ± 3.7 iceandfire 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 53.5 ± 0.6 61.6 ± 0.8 55.8 ± 0.9 58.5 ± 0.9 infection 97.3 ± 3.6 100.0 ± 0.0 97.3 ± 3.6 96.0 ± 4.4 32.8 ± 0.7 38.4 ± 0.9 35.4 ± 0.9 34.7 ± 0.8 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 36.9 ± 1.3 44.1 ± 1.2 38.0 ± 1.4 38.7 ± 1.4 jaws 77.3 ± 9.5 76.0 ± 9.7 64.0 ± 10.9 80.0 ± 9.1 50.1 ± 5.7 54.5 ± 7.1 44.3 ± 4.9 45.1 ± 5.2 labyrinth 9.3 ± 6.6 5.3 ± 5.1 5.3 ± 5.1 10.7 ± 7.0 54.0 ± 0.9 59.5 ± 1.1 57.3 ± 1.1 58.5 ± 0.9 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 3.1 ± 0.9 3.1 ± 0.8 5.5 ± 0.9 2.9 ± 0.9 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.5 ± 1.2 2.8 ± 1.2 2.6 ± 1.1 2.5 ± 1.1 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 53.6 ± 0.9 61.7 ± 1.0 57.4 ± 0.9 57.8 ± 0.9 missilecommand 69.3 ± 10.4 78.7 ± 9.3 82.7 ± 8.6 84.0 ± 8.3 139.1 ± 3.8 162.8 ± 5.5 138.6 ± 4.1 137.4 ± 4.4 modality 
25.3 ± 9.8 26.7 ± 10.0 25.3 ± 9.8 21.3 ± 9.3 37.9 ± 17.5 42.6 ± 21.2 39.4 ± 16.8 31.4 ± 15.6 overload 17.3 ± 8.6 14.7 ± 8.0 41.3 ± 11.1 26.7 ± 10.0 13.5 ± 3.9 17.3 ± 3.7 21.7 ± 3.3 21.4 ± 3.2 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 8.5 ± 0.5 11.9 ± 0.4 10.2 ± 0.5 9.8 ± 0.5 painter 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 51.0 ± 8.1 72.8 ± 8.1 61.8 ± 11.4 53.1 ± 10.1 plants 4.0 ± 4.4 8.0 ± 6.1 6.7 ± 5.6 9.3 ± 6.6 29.5 ± 0.9 32.9 ± 1.2 30.2 ± 1.1 30.4 ± 1.1 plaqueattack 94.7 ± 5.1 84.0 ± 8.3 94.7 ± 5.1 96.0 ± 4.4 26.3 ± 1.3 28.0 ± 1.6 25.8 ± 1.4 24.7 ± 1.5 portals 13.3 ± 7.7 2.7 ± 3.6 14.7 ± 8.0 4.0 ± 4.4 32.5 ± 0.8 39.3 ± 0.8 37.6 ± 0.9 37.0 ± 0.8 racebet2 68.0 ± 10.6 68.0 ± 10.6 88.0 ± 7.4 94.7 ± 5.1 20.7 ± 0.6 24.6 ± 0.7 23.4 ± 0.7 24.5 ± 0.8 realportals 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 7.1 ± 2.5 9.6 ± 2.9 8.0 ± 2.7 7.6 ± 2.7 realsokoban 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 23.6 ± 2.2 30.4 ± 2.3 28.3 ± 2.7 27.2 ± 2.3 roguelike 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 22.5 ± 1.0 26.8 ± 1.0 26.2 ± 1.1 26.9 ± 0.9 seaquest 70.7 ± 10.3 86.7 ± 7.7 78.7 ± 9.3 96.0 ± 4.4 28.0 ± 2.3 30.7 ± 2.3 28.9 ± 3.0 25.9 ± 1.9 sheriff 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 31.9 ± 1.3 38.7 ± 1.8 33.4 ± 1.3 33.5 ± 1.3 sokoban 20.0 ± 9.1 25.3 ± 9.8 30.7 ± 10.4 33.3 ± 10.7 29.5 ± 2.8 36.9 ± 3.7 34.7 ± 3.3 34.4 ± 3.2 solarfox 1.3 ± 2.6 8.0 ± 6.1 4.0 ± 4.4 13.3 ± 7.7 74.2 ± 3.7 92.4 ± 4.5 82.9 ± 4.1 83.3 ± 4.4 superman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 41.4 ± 1.1 48.9 ± 1.6 44.5 ± 1.4 45.7 ± 1.4 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 survivezombies 41.3 ± 11.1 44.0 ± 11.2 46.7 ± 11.3 40.0 ± 11.1 26.4 ± 0.9 30.9 ± 1.1 28.0 ± 1.1 28.4 ± 1.1 tercio 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 4.3 ± 2.0 5.0 ± 2.4 4.9 ± 2.3 4.7 ± 2.4 thecitadel 5.3 ± 5.1 16.0 ± 8.3 14.7 ± 8.0 20.0 ± 9.1 7.0 ± 1.8 15.1 ± 1.7 14.6 ± 1.6 14.4 ± 1.5 waitforbreakfast 20.0 ± 9.1 5.3 ± 5.1 62.7 ± 10.9 48.0 ± 11.3 35.8 ± 2.8 40.8 ± 3.1 61.4 ± 5.1 46.6 ± 3.7 whackamole 100.0 ± 0.0 100.0 ± 0.0 98.7 ± 2.6 100.0 ± 0.0 98.0 ± 2.7 122.9 ± 3.0 109.4 ± 2.7 112.2 ± 2.7 zelda 32.0 ± 10.6 30.7 ± 10.4 36.0 ± 10.9 25.3 ± 9.8 25.7 ± 1.9 28.4 ± 2.0 29.0 ± 1.9 26.3 ± 1.9 zenpuzzle 56.0 ± 11.2 57.3 ± 11.2 50.7 ± 11.3 58.7 ± 11.1 83.8 ± 8.7 108.2 ± 9.8 46.1 ± 8.4 38.1 ± 1.7 Total 30.0 ± 1.3 30.7 ± 1.3 32.6 ± 1.4 33.2 ± 1.4 31.0 ± 6.4 37.2 ± 7.7 33.9 ± 6.5 33.4 ± 6.5

5.2.7 Knowledge-Based Evaluations

Table 5.7 shows the results of adding Knowledge-Based Evaluations (KBE) to the BFTI agent. It significantly increases the average win percentage, from 30.0% to 36.1%. It appears to be beneficial in a wide variety of games. This is because almost all games in the GVG-AI framework are implemented in a way where, ultimately, games are won or lost upon collisions between objects. This is exactly what the heuristic evaluation function of KBE is based on.

The most notable game where KBE is detrimental is ChipsChallenge, which was previously discussed in the experiment with NBP. NBP was described to be detrimental in this game because it prevents the agent from picking up a key and moving towards a corresponding door within the same simulation. KBE is detrimental for a different reason, which is that there is no direct score gain associated with picking up the keys. Therefore, the weight learned for key objects by KBE converges to 0, meaning that the agent will not attempt to move towards the key. Because the (locked) doors are also not recognized as obstacles, KBE instead rewards the agent for moving as close as possible to the coins, but does not enable MCTS to find a way through the door.

5.2.8 Temporal-Difference Tree Search

Table 5.8 shows the win percentages obtained by adding a number of different variants of Temporal-Difference Tree Search to the BFTI agent. The following variants have been tested in this experiment:

• TDIR(λ): The Sarsa-UCT(λ) algorithm implemented exactly as given by the pseudocode in Subsection 4.2.6, taking intermediate rewards into account. Tested for λ = 0.5 and λ = 0.8.

• TDMS(λ): A variant that does not take intermediate rewards into account. In addition to any changes described in Subsection 4.2.6, this variant also uses the V values backed up through TD backups for the final move selection at the end of every tick. This is different from the other variants, which only use the V values from TD backups in the selection step, but still use the regular X values for the final move selection. Tested for λ = 0.5 and λ = 0.8.

• TD(λ): Does not take intermediate rewards into account and does not use the V values of TD backups for the final move selection. This variant is the one that is most similar to the implementation described by Vodopivec (2016). However, there are still a number of differences, because this variant is implemented on top of the BFTI agent, whereas Vodopivec (2016) implemented Sarsa-UCT(λ) in the Sample Open-Loop MCTS controller. Tested for λ = 0.5 and λ = 0.8 (Vodopivec (2016) used λ = 0.8).

Most of these variants appear to provide small, but statistically insignificant increases in overall performance. In these variants, λ = 0.5 appears to perform better than λ = 0.8. The most basic TD(λ) variant appears to outperform the others.

Finally, Table 5.9 shows the results of a TDKBE(λ) variant, which uses Knowledge-Based Evaluations (KBE) for intermediate rewards. The difference between this variant and TDIR is that TDIR frequently has intermediate rewards of 0, because it only takes real changes in game score into account as intermediate rewards. This variant is compared to the KBE agent, instead of the BFTI agent, because it also incorporates the changes of the KBE enhancement. This variant is detrimental with respect to the KBE agent, and this can easily be explained by the columns showing the average number of simulations per tick. The pathfinding necessary for KBE is simply too computationally expensive to use for every state generated in a simulation.

Table 5.7: Adding Knowledge-Based Evaluations to the BFTI agent (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games BFTI KBE BFTI KBE aliens 100.0 ± 0.0 100.0 ± 0.0 45.5 ± 1.4 53.4 ± 1.6 bait 5.3 ± 5.1 0.0 ± 0.0 20.0 ± 3.2 19.1 ± 3.0 blacksmoke 0.0 ± 0.0 2.7 ± 3.6 9.5 ± 0.5 8.0 ± 0.7 boloadventures 0.0 ± 0.0 0.0 ± 0.0 2.6 ± 0.9 2.2 ± 0.7 boulderchase 13.3 ± 7.7 22.7 ± 9.5 12.2 ± 0.8 12.2 ± 1.2 boulderdash 9.3 ± 6.6 17.3 ± 8.6 8.7 ± 1.1 9.1 ± 0.8 brainman 2.7 ± 3.6 0.0 ± 0.0 22.4 ± 4.1 17.7 ± 3.2 butterflies 98.7 ± 2.6 98.7 ± 2.6 60.2 ± 1.1 72.0 ± 2.4 cakybaky 0.0 ± 0.0 1.3 ± 2.6 19.1 ± 1.9 25.8 ± 2.2 camelRace 17.3 ± 8.6 96.0 ± 4.4 43.7 ± 2.7 50.9 ± 2.5 catapults 4.0 ± 4.4 5.3 ± 5.1 45.4 ± 4.6 52.9 ± 5.3 chase 5.3 ± 5.1 14.7 ± 8.0 30.7 ± 1.1 35.1 ± 1.7 chipschallenge 17.3 ± 8.6 1.3 ± 2.6 19.3 ± 2.5 17.2 ± 2.3 chopper 36.0 ± 10.9 58.7 ± 11.1 5.2 ± 1.5 8.5 ± 1.7 cookmepasta 0.0 ± 0.0 0.0 ± 0.0 26.4 ± 3.8 24.2 ± 4.2 crossfire 1.3 ± 2.6 13.3 ± 7.7 26.0 ± 0.7 29.9 ± 1.1 defender 70.7 ± 10.3 73.3 ± 10.0 30.1 ± 2.0 34.8 ± 2.5 digdug 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.6 5.7 ± 1.6 eggomania 6.7 ± 5.6 62.7 ± 10.9 57.9 ± 1.3 57.9 ± 1.7 enemycitadel 1.3 ± 2.6 1.3 ± 2.6 4.6 ± 1.1 6.1 ± 1.3 escape 0.0 ± 0.0 0.0 ± 0.0 34.5 ± 1.3 37.2 ± 1.4 factorymanager 88.0 ± 7.4 100.0 ± 0.0 12.1 ± 1.4 15.7 ± 1.2 firecaster 0.0 ± 0.0 0.0 ± 0.0 26.9 ± 3.0 26.8 ± 2.6 firestorms 21.3 ± 9.3 33.3 ± 10.7 25.7 ± 0.5 28.0 ± 0.5 frogs 44.0 ± 11.2 36.0 ± 10.9 17.5 ± 2.7 22.7 ± 2.5 gymkhana 1.3 ± 2.6 0.0 ± 0.0 26.9 ± 1.2 30.3 ± 1.5 hungrybirds 34.7 ± 10.8 64.0 ± 10.9 35.0 ± 3.5 32.3 ± 4.9 iceandfire 0.0 ± 0.0 0.0 ± 0.0 53.5 ± 0.6 54.7 ± 0.9 infection 97.3 ± 3.6 100.0 ± 0.0 32.8 ± 0.7 37.8 ± 1.1 intersection 100.0 ± 0.0 100.0 ± 0.0 36.9 ± 1.3 38.3 ± 1.3 jaws 77.3 ± 9.5 66.7 ± 10.7 50.1 ± 5.7 54.0 ± 7.0 labyrinth 9.3 ± 6.6 18.7 ± 8.8 54.0 ± 0.9 59.7 ± 2.2 lasers 0.0 ± 0.0 0.0 ± 0.0 3.1 ± 0.9 5.2 ± 1.4 lasers2 0.0 ± 0.0 0.0 ± 0.0 2.5 ± 1.2 4.5 ± 1.6 lemmings 0.0 ± 0.0 0.0 ± 0.0 53.6 ± 0.9 58.0 ± 0.9 missilecommand 69.3 ± 10.4 100.0 ± 0.0 139.1 ± 3.8 145.6 ± 7.9 modality 25.3 ± 9.8 26.7 ± 10.0 37.9 ± 17.5 41.8 ± 20.2 overload 17.3 ± 8.6 50.7 ± 11.3 13.5 ± 3.9 25.9 ± 3.1 pacman 0.0 ± 0.0 0.0 ± 0.0 8.5 ± 0.5 11.7 ± 0.5 painter 100.0 ± 0.0 100.0 ± 0.0 51.0 ± 8.1 70.6 ± 9.9 plants 4.0 ± 4.4 1.3 ± 2.6 29.5 ± 0.9 36.7 ± 1.1 plaqueattack 94.7 ± 5.1 86.7 ± 7.7 26.3 ± 1.3 27.9 ± 1.5 portals 13.3 ± 7.7 12.0 ± 7.4 32.5 ± 0.8 38.5 ± 1.2 racebet2 68.0 ± 10.6 65.3 ± 10.8 20.7 ± 0.6 20.6 ± 0.8 realportals 0.0 ± 0.0 0.0 ± 0.0 7.1 ± 2.5 19.4 ± 2.3 realsokoban 0.0 ± 0.0 0.0 ± 0.0 23.6 ± 2.2 25.1 ± 3.3 roguelike 0.0 ± 0.0 0.0 ± 0.0 22.5 ± 1.0 22.6 ± 1.3 seaquest 70.7 ± 10.3 80.0 ± 9.1 28.0 ± 2.3 33.0 ± 2.6 sheriff 100.0 ± 0.0 100.0 ± 0.0 31.9 ± 1.3 39.5 ± 1.5 sokoban 20.0 ± 9.1 17.3 ± 8.6 29.5 ± 2.8 30.1 ± 3.3 solarfox 1.3 ± 2.6 12.0 ± 7.4 74.2 ± 3.7 87.5 ± 4.3 superman 0.0 ± 0.0 1.3 ± 2.6 41.4 ± 1.1 43.5 ± 1.5 surround 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 survivezombies 41.3 ± 11.1 41.3 ± 11.1 26.4 ± 0.9 29.1 ± 1.1 tercio 0.0 ± 0.0 0.0 ± 0.0 4.3 ± 2.0 3.7 ± 2.1 thecitadel 5.3 ± 5.1 10.7 ± 7.0 7.0 ± 1.8 10.0 ± 1.9 waitforbreakfast 20.0 ± 9.1 52.0 ± 11.3 35.8 ± 2.8 54.4 ± 4.6 whackamole 100.0 ± 0.0 100.0 ± 0.0 98.0 ± 2.7 116.0 ± 3.0 zelda 32.0 ± 10.6 66.7 ± 10.7 25.7 ± 1.9 24.5 ± 2.0 zenpuzzle 56.0 ± 11.2 52.0 ± 11.3 83.8 ± 8.7 58.7 ± 4.9 Total 30.0 ± 1.3 36.1 ± 1.4 31.0 ± 6.4 34.4 ± 6.9

Table 5.8: Adding Temporal-Difference Tree Search to the BFTI agent (75 runs per game / 15 runs per level)

Win Percentage (%) Games BFTI TDIR(0.5) TDIR(0.8) TDMS(0.5) TDMS(0.8) TD(0.5) TD(0.8) aliens 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 bait 5.3 ± 5.1 5.3 ± 5.1 8.0 ± 6.1 5.3 ± 5.1 8.0 ± 6.1 4.0 ± 4.4 12.0 ± 7.4 blacksmoke 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 3.6 2.7 ± 3.6 1.3 ± 2.6 0.0 ± 0.0 boloadventures 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 boulderchase 13.3 ± 7.7 16.0 ± 8.3 10.7 ± 7.0 16.0 ± 8.3 14.7 ± 8.0 16.0 ± 8.3 18.7 ± 8.8 boulderdash 9.3 ± 6.6 8.0 ± 6.1 6.7 ± 5.6 6.7 ± 5.6 8.0 ± 6.1 10.7 ± 7.0 8.0 ± 6.1 brainman 2.7 ± 3.6 5.3 ± 5.1 6.7 ± 5.6 6.7 ± 5.6 6.7 ± 5.6 6.7 ± 5.6 4.0 ± 4.4 butterflies 98.7 ± 2.6 100.0 ± 0.0 98.7 ± 2.6 100.0 ± 0.0 100.0 ± 0.0 93.3 ± 5.6 98.7 ± 2.6 cakybaky 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 camelRace 17.3 ± 8.6 9.3 ± 6.6 14.7 ± 8.0 17.3 ± 8.6 16.0 ± 8.3 8.0 ± 6.1 13.3 ± 7.7 catapults 4.0 ± 4.4 9.3 ± 6.6 6.7 ± 5.6 2.7 ± 3.6 0.0 ± 0.0 14.7 ± 8.0 5.3 ± 5.1 chase 5.3 ± 5.1 8.0 ± 6.1 8.0 ± 6.1 6.7 ± 5.6 5.3 ± 5.1 8.0 ± 6.1 5.3 ± 5.1 chipschallenge 17.3 ± 8.6 18.7 ± 8.8 17.3 ± 8.6 16.0 ± 8.3 20.0 ± 9.1 18.7 ± 8.8 20.0 ± 9.1 chopper 36.0 ± 10.9 33.3 ± 10.7 29.3 ± 10.3 38.7 ± 11.0 40.0 ± 11.1 40.0 ± 11.1 36.0 ± 10.9 cookmepasta 0.0 ± 0.0 1.3 ± 2.6 4.0 ± 4.4 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 crossfire 1.3 ± 2.6 4.0 ± 4.4 4.0 ± 4.4 2.7 ± 3.6 2.7 ± 3.6 5.3 ± 5.1 2.7 ± 3.6 defender 70.7 ± 10.3 76.0 ± 9.7 64.0 ± 10.9 76.0 ± 9.7 74.7 ± 9.8 74.7 ± 9.8 80.0 ± 9.1 digdug 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 eggomania 6.7 ± 5.6 8.0 ± 6.1 9.3 ± 6.6 17.3 ± 8.6 16.0 ± 8.3 12.0 ± 7.4 14.7 ± 8.0 enemycitadel 1.3 ± 2.6 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 3.6 0.0 ± 0.0 escape 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 factorymanager 88.0 ± 7.4 86.7 ± 7.7 89.3 ± 7.0 85.3 ± 8.0 93.3 ± 5.6 92.0 ± 6.1 96.0 ± 4.4 firecaster 0.0 ± 0.0 2.7 ± 3.6 2.7 ± 3.6 2.7 ± 3.6 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 firestorms 21.3 ± 9.3 13.3 ± 7.7 18.7 ± 8.8 24.0 ± 9.7 12.0 ± 7.4 10.7 ± 7.0 12.0 ± 7.4 frogs 44.0 ± 11.2 34.7 ± 10.8 41.3 ± 11.1 21.3 ± 9.3 18.7 ± 8.8 40.0 ± 11.1 49.3 ± 11.3 gymkhana 1.3 ± 2.6 2.7 ± 3.6 0.0 ± 0.0 1.3 ± 2.6 2.7 ± 3.6 5.3 ± 5.1 0.0 ± 0.0 hungrybirds 34.7 ± 10.8 34.7 ± 10.8 34.7 ± 10.8 30.7 ± 10.4 34.7 ± 10.8 30.7 ± 10.4 40.0 ± 11.1 iceandfire 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 infection 97.3 ± 3.6 96.0 ± 4.4 94.7 ± 5.1 98.7 ± 2.6 100.0 ± 0.0 98.7 ± 2.6 98.7 ± 2.6 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 jaws 77.3 ± 9.5 72.0 ± 10.2 64.0 ± 10.9 58.7 ± 11.1 66.7 ± 10.7 62.7 ± 10.9 65.3 ± 10.8 labyrinth 9.3 ± 6.6 9.3 ± 6.6 6.7 ± 5.6 1.3 ± 2.6 9.3 ± 6.6 8.0 ± 6.1 8.0 ± 6.1 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 missilecommand 69.3 ± 10.4 72.0 ± 10.2 70.7 ± 10.3 72.0 ± 10.2 74.7 ± 9.8 78.7 ± 9.3 76.0 ± 9.7 modality 25.3 ± 9.8 24.0 ± 9.7 26.7 ± 10.0 26.7 ± 10.0 29.3 ± 10.3 28.0 ± 10.2 26.7 ± 10.0 overload 17.3 ± 8.6 18.7 ± 8.8 14.7 ± 8.0 21.3 ± 9.3 21.3 ± 9.3 22.7 ± 9.5 25.3 ± 9.8 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 painter 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 plants 4.0 ± 4.4 8.0 ± 6.1 4.0 ± 4.4 9.3 ± 6.6 5.3 ± 5.1 6.7 ± 5.6 5.3 ± 5.1 
plaqueattack 94.7 ± 5.1 92.0 ± 6.1 94.7 ± 5.1 97.3 ± 3.6 89.3 ± 7.0 92.0 ± 6.1 96.0 ± 4.4 portals 13.3 ± 7.7 16.0 ± 8.3 14.7 ± 8.0 17.3 ± 8.6 18.7 ± 8.8 18.7 ± 8.8 13.3 ± 7.7 racebet2 68.0 ± 10.6 61.3 ± 11.0 58.7 ± 11.1 60.0 ± 11.1 65.3 ± 10.8 62.7 ± 10.9 65.3 ± 10.8 realportals 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 realsokoban 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 roguelike 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 seaquest 70.7 ± 10.3 77.3 ± 9.5 81.3 ± 8.8 72.0 ± 10.2 81.3 ± 8.8 82.7 ± 8.6 77.3 ± 9.5 sheriff 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 sokoban 20.0 ± 9.1 24.0 ± 9.7 24.0 ± 9.7 22.7 ± 9.5 18.7 ± 8.8 26.7 ± 10.0 20.0 ± 9.1 solarfox 1.3 ± 2.6 1.3 ± 2.6 1.3 ± 2.6 0.0 ± 0.0 4.0 ± 4.4 4.0 ± 4.4 4.0 ± 4.4 superman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 survivezombies 41.3 ± 11.1 46.7 ± 11.3 44.0 ± 11.2 42.7 ± 11.2 44.0 ± 11.2 40.0 ± 11.1 44.0 ± 11.2 tercio 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 thecitadel 5.3 ± 5.1 12.0 ± 7.4 5.3 ± 5.1 10.7 ± 7.0 4.0 ± 4.4 18.7 ± 8.8 13.3 ± 7.7 waitforbreakfast 20.0 ± 9.1 36.0 ± 10.9 29.3 ± 10.3 65.3 ± 10.8 12.0 ± 7.4 29.3 ± 10.3 14.7 ± 8.0 whackamole 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 98.7 ± 2.6 zelda 32.0 ± 10.6 38.7 ± 11.0 36.0 ± 10.9 34.7 ± 10.8 30.7 ± 10.4 33.3 ± 10.7 22.7 ± 9.5 zenpuzzle 56.0 ± 11.2 56.0 ± 11.2 56.0 ± 11.2 60.0 ± 11.1 60.0 ± 11.1 64.0 ± 10.9 64.0 ± 10.9 Total 30.0 ± 1.3 30.7 ± 1.3 30.0 ± 1.3 30.8 ± 1.3 30.2 ± 1.3 31.2 ± 1.4 30.9 ± 1.4

Table 5.9: Knowledge-Based Intermediate Evaluations in TDTS (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games KBE TDKBE(0.5) TDKBE(0.8) KBE TDKBE(0.5) TDKBE(0.8) aliens 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 53.4 ± 1.6 41.7 ± 1.2 41.2 ± 1.3 bait 0.0 ± 0.0 1.3 ± 2.6 2.7 ± 3.6 19.1 ± 3.0 16.4 ± 2.9 17.2 ± 2.5 blacksmoke 2.7 ± 3.6 0.0 ± 0.0 0.0 ± 0.0 8.0 ± 0.7 4.3 ± 0.4 4.2 ± 0.5 boloadventures 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.2 ± 0.7 0.8 ± 0.3 1.3 ± 0.5 boulderchase 22.7 ± 9.5 34.7 ± 10.8 33.3 ± 10.7 12.2 ± 1.2 10.1 ± 0.7 9.4 ± 0.9 boulderdash 17.3 ± 8.6 2.7 ± 3.6 8.0 ± 6.1 9.1 ± 0.8 5.8 ± 0.7 4.7 ± 0.7 brainman 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 3.6 17.7 ± 3.2 6.1 ± 2.2 7.2 ± 2.2 butterflies 98.7 ± 2.6 93.3 ± 5.6 80.0 ± 9.1 72.0 ± 2.4 34.9 ± 1.4 26.5 ± 1.1 cakybaky 1.3 ± 2.6 2.7 ± 3.6 0.0 ± 0.0 25.8 ± 2.2 20.9 ± 2.1 21.2 ± 2.0 camelRace 96.0 ± 4.4 25.3 ± 9.8 13.3 ± 7.7 50.9 ± 2.5 31.9 ± 3.1 30.0 ± 3.0 catapults 5.3 ± 5.1 18.7 ± 8.8 13.3 ± 7.7 52.9 ± 5.3 32.6 ± 3.4 33.8 ± 3.0 chase 14.7 ± 8.0 4.0 ± 4.4 5.3 ± 5.1 35.1 ± 1.7 27.9 ± 0.9 25.8 ± 0.8 chipschallenge 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 17.2 ± 2.3 14.8 ± 1.6 14.0 ± 1.7 chopper 58.7 ± 11.1 25.3 ± 9.8 28.0 ± 10.2 8.5 ± 1.7 3.8 ± 1.0 3.7 ± 1.0 cookmepasta 0.0 ± 0.0 2.7 ± 3.6 0.0 ± 0.0 24.2 ± 4.2 20.0 ± 3.1 23.5 ± 3.4 crossfire 13.3 ± 7.7 34.7 ± 10.8 29.3 ± 10.3 29.9 ± 1.1 19.7 ± 0.8 19.5 ± 0.9 defender 73.3 ± 10.0 69.3 ± 10.4 69.3 ± 10.4 34.8 ± 2.5 24.8 ± 1.6 25.1 ± 1.7 digdug 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 5.7 ± 1.6 3.9 ± 1.2 3.6 ± 1.2 eggomania 62.7 ± 10.9 56.0 ± 11.2 76.0 ± 9.7 57.9 ± 1.7 46.9 ± 1.8 46.1 ± 1.8 enemycitadel 1.3 ± 2.6 2.7 ± 3.6 1.3 ± 2.6 6.1 ± 1.3 3.6 ± 1.1 3.9 ± 1.2 escape 0.0 ± 0.0 1.3 ± 2.6 2.7 ± 3.6 37.2 ± 1.4 29.5 ± 1.0 30.6 ± 1.0 factorymanager 100.0 ± 0.0 98.7 ± 2.6 100.0 ± 0.0 15.7 ± 1.2 9.2 ± 1.1 9.2 ± 1.1 firecaster 0.0 ± 0.0 0.0 ± 0.0 4.0 ± 4.4 26.8 ± 2.6 20.6 ± 2.1 20.8 ± 2.2 firestorms 33.3 ± 10.7 18.7 ± 8.8 28.0 ± 10.2 28.0 ± 0.5 18.5 ± 0.5 19.1 ± 0.7 frogs 36.0 ± 10.9 46.7 ± 11.3 37.3 ± 10.9 22.7 ± 2.5 15.9 ± 1.9 13.2 ± 2.1 gymkhana 0.0 ± 0.0 1.3 ± 2.6 2.7 ± 3.6 30.3 ± 1.5 12.4 ± 0.5 12.9 ± 0.4 hungrybirds 64.0 ± 10.9 10.7 ± 7.0 5.3 ± 5.1 32.3 ± 4.9 32.7 ± 2.6 31.9 ± 2.1 iceandfire 0.0 ± 0.0 0.0 ± 0.0 1.3 ± 2.6 54.7 ± 0.9 28.8 ± 1.1 29.8 ± 1.2 infection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 37.8 ± 1.1 25.7 ± 1.2 24.9 ± 1.2 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 38.3 ± 1.3 29.4 ± 0.8 30.5 ± 0.8 jaws 66.7 ± 10.7 46.7 ± 11.3 56.0 ± 11.2 54.0 ± 7.0 43.5 ± 4.7 45.5 ± 4.5 labyrinth 18.7 ± 8.8 16.0 ± 8.3 10.7 ± 7.0 59.7 ± 2.2 44.2 ± 1.3 44.0 ± 1.2 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 5.2 ± 1.4 3.1 ± 0.9 3.3 ± 0.8 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 4.5 ± 1.6 2.7 ± 1.0 2.5 ± 1.0 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 58.0 ± 0.9 38.0 ± 1.2 37.3 ± 1.2 missilecommand 100.0 ± 0.0 88.0 ± 7.4 81.3 ± 8.8 145.6 ± 7.9 105.3 ± 3.9 104.3 ± 4.3 modality 26.7 ± 10.0 25.3 ± 9.8 24.0 ± 9.7 41.8 ± 20.2 27.1 ± 12.0 29.3 ± 11.8 overload 50.7 ± 11.3 54.7 ± 11.3 54.7 ± 11.3 25.9 ± 3.1 15.1 ± 1.8 14.1 ± 2.2 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 11.7 ± 0.5 7.1 ± 0.3 6.8 ± 0.3 painter 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 70.6 ± 9.9 54.8 ± 6.0 65.0 ± 7.6 plants 1.3 ± 2.6 4.0 ± 4.4 6.7 ± 5.6 36.7 ± 1.1 26.5 ± 0.8 25.2 ± 0.8 plaqueattack 86.7 ± 7.7 93.3 ± 5.6 92.0 ± 6.1 27.9 ± 1.5 21.2 ± 1.1 21.1 ± 1.0 portals 12.0 ± 7.4 18.7 ± 8.8 22.7 ± 9.5 38.5 ± 1.2 28.9 ± 0.9 27.6 ± 1.2 racebet2 65.3 ± 10.8 62.7 ± 10.9 52.0 ± 11.3 20.6 ± 0.8 16.2 ± 0.6 17.8 ± 0.6 realportals 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 19.4 ± 2.3 13.7 ± 2.4 12.4 ± 2.6 realsokoban 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 25.1 ± 3.3 33.2 ± 2.5 29.7 ± 2.1 
roguelike 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 22.6 ± 1.3 13.4 ± 1.0 12.4 ± 1.0 seaquest 80.0 ± 9.1 81.3 ± 8.8 73.3 ± 10.0 33.0 ± 2.6 27.4 ± 2.1 28.5 ± 2.1 sheriff 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 39.5 ± 1.5 27.0 ± 0.9 26.9 ± 1.0 sokoban 17.3 ± 8.6 22.7 ± 9.5 20.0 ± 9.1 30.1 ± 3.3 29.0 ± 2.1 32.8 ± 1.5 solarfox 12.0 ± 7.4 4.0 ± 4.4 4.0 ± 4.4 87.5 ± 4.3 69.5 ± 3.5 69.3 ± 3.7 superman 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 43.5 ± 1.5 37.2 ± 0.8 36.2 ± 0.9 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 survivezombies 41.3 ± 11.1 38.7 ± 11.0 41.3 ± 11.1 29.1 ± 1.1 19.9 ± 0.7 18.7 ± 0.8 tercio 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 3.7 ± 2.1 3.1 ± 1.8 3.4 ± 1.9 thecitadel 10.7 ± 7.0 8.0 ± 6.1 6.7 ± 5.6 10.0 ± 1.9 8.5 ± 2.0 8.7 ± 2.0 waitforbreakfast 52.0 ± 11.3 32.0 ± 10.6 54.7 ± 11.3 54.4 ± 4.6 34.8 ± 2.4 37.6 ± 2.3 whackamole 100.0 ± 0.0 98.7 ± 2.6 100.0 ± 0.0 116.0 ± 3.0 92.2 ± 1.8 95.6 ± 2.1 zelda 66.7 ± 10.7 93.3 ± 5.6 96.0 ± 4.4 24.5 ± 2.0 29.4 ± 1.3 30.1 ± 1.2 zenpuzzle 52.0 ± 11.3 61.3 ± 11.0 58.7 ± 11.1 58.7 ± 4.9 60.8 ± 1.3 56.1 ± 1.8 Total 36.1 ± 1.4 33.4 ± 1.4 33.3 ± 1.4 34.4 ± 6.9 25.4 ± 5.3 25.5 ± 5.4

5.2.9 Enhancements Combined

Combining all Enhancements

Table 5.10 shows the results of an experiment intended to evaluate the performance that can be obtained by combining all enhancements discussed in this thesis. The following three agents are included in this table:

• Enh. (No TDTS): An agent with all enhancements described in this thesis except for TDTS. This variant was included because TDTS was not found to provide a statistically significant increase in win percentage, and was not assumed to have any other important side-effects (unlike BFTI).

• Enh. TD(0.5): An agent with all enhancements described in this thesis, including backups from the Sarsa-UCT(λ) algorithm with λ = 0.5. Because of the results described in Subsection 5.2.8, this agent was implemented not to take intermediate rewards into account, and not to use the values from TD backups for the final move selection.

• Enh. TD(0.8): An agent similar to the one described above, but using λ = 0.8.

These agents include the Deterministic Game Detection (DGD) enhancement, which has not yet been evaluated individually. This is because some parts of DGD are only relevant if TR or NBP are included as well. The impact of DGD is evaluated at the end of this subsection by disabling it in the MCTS agent with all other enhancements enabled (an illustrative sketch of how a game could be classified as deterministic is given below).

Table 5.10 shows that the combination of enhancements described in this thesis leads to a significant increase in the average win percentage over sixty different games, from 30.0% for the BFTI agent to 47.2%–48.4%. This is not yet sufficient to beat the 52.4% of YBCriber (one of the winning agents of 2015), but it is relatively close. The inclusion of TDTS appears to be slightly detrimental (both for λ = 0.5 and λ = 0.8), but the difference in win percentage is not statistically significant.
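As an illustration only, one common way to classify a game as deterministic is to advance two copies of a state with the same action sequence and check whether the resulting states diverge. The listing below sketches this idea with a hypothetical forward-model interface; it is not the exact procedure used by the DGD enhancement in this thesis.

import java.util.List;
import java.util.function.BiFunction;

// Illustrative sketch of deterministic game detection (not the thesis's exact procedure).
// A game is treated as deterministic if advancing the same state twice with the same
// action sequence produces identical resulting states.
public class DeterminismCheckSketch {

    // advance.apply(state, action) returns the next state without mutating its input, and may
    // sample internal randomness; State.equals() is assumed to compare the full game state.
    public static <State, Action> boolean looksDeterministic(
            State initialState,
            List<Action> actions,
            BiFunction<State, Action, State> advance) {

        State first = initialState;
        State second = initialState;
        for (Action a : actions) {
            first = advance.apply(first, a);
            second = advance.apply(second, a);
            if (!first.equals(second)) {
                return false; // the same action sequence diverged, so the game is nondeterministic
            }
        }
        return true; // no divergence observed; the game is assumed to be deterministic
    }
}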

Variations in Implementation Details of KBE and TR+BFTI

Table 5.11 shows the results of an experiment where small implementation details are changed in the KBE enhancement, and in the interaction between TR and BFTI. These are tested in the Enh. TD(λ) agents described above, for both λ = 0.5 and λ = 0.8. Both of these agents are also included again for comparison. The following two variations are tested:

• Always KBE: Subsection 4.2.4 describes that a final state sT of a simulation is only evaluated using the knowledge-based heuristic evaluation if the regular evaluation X(sT) is equal to the evaluation of the root state s0; X(s0) = X(sT). In this variation, the knowledge-based heuristic evaluation is used even if X(s0) ≠ X(sT). A minimal sketch of this selection logic is given below.

• No TR-BFTI: Subsection 4.2.3 describes that, when the new root node at the start of a new tick is already fully expanded due to TR, the BFTI process is still executed because safety prepruning may potentially help to reduce the branching factor. In this variation, this is not done, meaning that the BFTI process is only executed if the new root at the start of a new tick is not yet fully expanded.

Table 5.11 shows that the "Always KBE" variant performs at least as well as the standard implementation, and possibly better. Not executing the BFTI process when a new root node is already fully expanded due to TR appears to be detrimental. Neither of these variations affects the win percentage by a statistically significant amount.
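The following listing is a minimal sketch of the difference between the standard implementation and the "Always KBE" variation. The method names and the way in which the heuristic bonus is combined with X(sT) are illustrative assumptions rather than the actual code of the agent.

// Illustrative sketch of when the knowledge-based heuristic evaluation is applied to the
// final state of a simulation (names and the additive combination are hypothetical).
public class KbeSelectionSketch {

    private final boolean alwaysKbe; // true for the "Always KBE" variation

    public KbeSelectionSketch(boolean alwaysKbe) {
        this.alwaysKbe = alwaysKbe;
    }

    // rootEval = X(s0), finalEval = X(sT), kbeBonus = knowledge-based heuristic for sT.
    public double evaluateFinalState(double rootEval, double finalEval, double kbeBonus) {
        if (alwaysKbe || rootEval == finalEval) {
            // Standard implementation: the heuristic is only used when the regular evaluation
            // did not change between s0 and sT; "Always KBE" applies it unconditionally.
            return finalEval + kbeBonus;
        }
        return finalEval;
    }
}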

Table 5.10: All Enhancements combined (with and without TDTS) (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games Enh. (No TDTS) Enh. TD(0.5) Enh. TD(0.8) Enh. (No TDTS) Enh. TD(0.5) Enh. TD(0.8) aliens 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 33.5 ± 1.4 33.3 ± 1.3 33.3 ± 1.3 bait 37.3 ± 10.9 32.0 ± 10.6 34.7 ± 10.8 75.4 ± 28.1 65.2 ± 24.6 69.7 ± 26.7 blacksmoke 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 7.2 ± 0.4 4.0 ± 0.4 4.3 ± 0.4 boloadventures 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 1.8 ± 0.4 2.3 ± 0.5 1.4 ± 0.3 boulderchase 41.3 ± 11.1 40.0 ± 11.1 42.7 ± 11.2 13.0 ± 0.3 10.8 ± 0.5 11.0 ± 0.5 boulderdash 20.0 ± 9.1 14.7 ± 8.0 13.3 ± 7.7 7.8 ± 0.6 6.9 ± 0.6 7.8 ± 0.7 brainman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 34.9 ± 3.4 34.7 ± 3.1 34.1 ± 3.1 butterflies 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 53.3 ± 1.0 52.9 ± 1.1 51.9 ± 0.9 cakybaky 10.7 ± 7.0 16.0 ± 8.3 10.7 ± 7.0 19.8 ± 1.9 15.6 ± 2.1 15.1 ± 2.1 camelRace 100.0 ± 0.0 98.7 ± 2.6 93.3 ± 5.6 46.2 ± 0.9 45.3 ± 1.0 46.4 ± 0.8 catapults 28.0 ± 10.2 26.7 ± 10.0 24.0 ± 9.7 19.6 ± 2.9 17.2 ± 2.2 15.4 ± 2.1 chase 41.3 ± 11.1 32.0 ± 10.6 29.3 ± 10.3 31.3 ± 1.4 32.7 ± 1.4 30.2 ± 1.5 chipschallenge 0.0 ± 0.0 4.0 ± 4.4 1.3 ± 2.6 16.2 ± 2.1 12.5 ± 2.2 12.4 ± 2.3 chopper 92.0 ± 6.1 90.7 ± 6.6 89.3 ± 7.0 7.1 ± 1.0 5.5 ± 0.8 5.5 ± 0.7 cookmepasta 16.0 ± 8.3 16.0 ± 8.3 14.7 ± 8.0 28.4 ± 3.5 25.9 ± 3.0 25.6 ± 3.4 crossfire 78.7 ± 9.3 85.3 ± 8.0 89.3 ± 7.0 25.5 ± 0.9 21.7 ± 0.6 22.5 ± 0.7 defender 70.7 ± 10.3 74.7 ± 9.8 62.7 ± 10.9 21.1 ± 1.3 17.9 ± 1.5 18.1 ± 1.7 digdug 1.3 ± 2.6 0.0 ± 0.0 1.3 ± 2.6 7.5 ± 0.5 8.5 ± 0.5 7.7 ± 0.5 eggomania 69.3 ± 10.4 54.7 ± 11.3 57.3 ± 11.2 39.0 ± 0.8 34.2 ± 0.9 34.3 ± 1.0 enemycitadel 2.7 ± 3.6 1.3 ± 2.6 4.0 ± 4.4 9.9 ± 0.6 5.1 ± 1.1 4.4 ± 1.0 escape 92.0 ± 6.1 81.3 ± 8.8 85.3 ± 8.0 51.1 ± 5.6 19.9 ± 1.6 21.9 ± 3.3 factorymanager 100.0 ± 0.0 98.7 ± 2.6 100.0 ± 0.0 10.5 ± 1.1 9.8 ± 0.9 10.1 ± 0.9 firecaster 0.0 ± 0.0 5.3 ± 5.1 2.7 ± 3.6 18.6 ± 1.9 19.1 ± 1.7 17.7 ± 1.5 firestorms 78.7 ± 9.3 61.3 ± 11.0 65.3 ± 10.8 14.4 ± 1.1 10.8 ± 1.0 10.7 ± 1.0 frogs 48.0 ± 11.3 48.0 ± 11.3 45.3 ± 11.3 6.0 ± 1.1 6.5 ± 0.9 6.9 ± 1.0 gymkhana 10.7 ± 7.0 6.7 ± 5.6 2.7 ± 3.6 15.7 ± 0.9 8.8 ± 0.4 8.9 ± 0.4 hungrybirds 97.3 ± 3.6 94.7 ± 5.1 97.3 ± 3.6 70.4 ± 4.9 39.9 ± 4.0 61.9 ± 2.5 iceandfire 77.3 ± 9.5 80.0 ± 9.1 72.0 ± 10.2 32.7 ± 1.7 27.5 ± 1.6 26.8 ± 1.7 infection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 24.8 ± 0.6 21.8 ± 0.4 22.1 ± 0.4 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 28.7 ± 0.9 25.8 ± 1.4 25.1 ± 1.4 jaws 18.7 ± 8.8 16.0 ± 8.3 20.0 ± 9.1 30.4 ± 3.3 31.0 ± 3.9 28.5 ± 3.5 labyrinth 100.0 ± 0.0 98.7 ± 2.6 98.7 ± 2.6 69.2 ± 3.1 48.9 ± 3.0 49.6 ± 2.9 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.2 ± 0.4 3.6 ± 0.6 2.2 ± 0.4 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 0.8 1.9 ± 0.6 2.0 ± 0.6 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 39.4 ± 0.5 36.1 ± 0.5 36.6 ± 0.5 missilecommand 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 102.5 ± 3.7 89.3 ± 3.6 87.7 ± 4.2 modality 49.3 ± 11.3 52.0 ± 11.3 53.3 ± 11.3 50.5 ± 22.0 58.2 ± 22.4 54.4 ± 22.8 overload 88.0 ± 7.4 89.3 ± 7.0 77.3 ± 9.5 21.9 ± 1.5 17.4 ± 1.6 17.2 ± 1.3 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 7.0 ± 0.4 3.6 ± 0.4 4.6 ± 0.4 painter 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 45.4 ± 9.3 50.5 ± 7.0 63.8 ± 6.2 plants 1.3 ± 2.6 6.7 ± 5.6 5.3 ± 5.1 23.3 ± 0.5 21.7 ± 0.5 21.6 ± 0.5 plaqueattack 100.0 ± 0.0 92.0 ± 6.1 90.7 ± 6.6 20.5 ± 0.7 18.4 ± 0.8 18.0 ± 0.8 portals 77.3 ± 9.5 77.3 ± 9.5 70.7 ± 10.3 25.3 ± 1.3 18.8 ± 1.4 20.0 ± 1.8 racebet2 86.7 ± 7.7 93.3 ± 5.6 93.3 ± 5.6 16.9 ± 0.4 15.8 ± 0.3 15.7 ± 0.3 realportals 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 6.8 ± 1.6 6.7 ± 1.5 5.1 ± 1.4 realsokoban 0.0 ± 0.0 0.0 ± 
0.0 0.0 ± 0.0 17.1 ± 2.0 26.2 ± 4.5 32.7 ± 5.4 roguelike 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 18.4 ± 0.5 17.9 ± 0.4 17.1 ± 0.7 seaquest 56.0 ± 11.2 50.7 ± 11.3 64.0 ± 10.9 24.1 ± 2.4 22.7 ± 3.2 18.4 ± 2.2 sheriff 98.7 ± 2.6 97.3 ± 3.6 97.3 ± 3.6 23.4 ± 0.9 21.5 ± 0.9 21.4 ± 0.8 sokoban 44.0 ± 11.2 40.0 ± 11.1 37.3 ± 10.9 24.4 ± 1.9 17.9 ± 1.3 18.4 ± 1.3 solarfox 6.7 ± 5.6 1.3 ± 2.6 8.0 ± 6.1 58.1 ± 3.3 59.1 ± 3.5 53.9 ± 3.0 superman 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 31.0 ± 1.0 29.4 ± 1.0 31.1 ± 0.9 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 145.0 ± 5.5 141.8 ± 5.7 survivezombies 45.3 ± 11.3 34.7 ± 10.8 44.0 ± 11.2 15.4 ± 1.2 11.8 ± 1.2 12.4 ± 1.3 tercio 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.4 5.7 ± 1.3 5.1 ± 1.3 thecitadel 36.0 ± 10.9 25.3 ± 9.8 30.7 ± 10.4 21.0 ± 1.7 12.5 ± 1.7 14.6 ± 2.1 waitforbreakfast 60.0 ± 11.1 64.0 ± 10.9 58.7 ± 11.1 32.7 ± 3.6 43.7 ± 4.5 29.4 ± 2.2 whackamole 100.0 ± 0.0 97.3 ± 3.6 100.0 ± 0.0 61.2 ± 2.5 54.6 ± 2.1 55.2 ± 2.3 zelda 52.0 ± 11.3 90.7 ± 6.6 70.7 ± 10.3 32.0 ± 0.6 36.5 ± 0.6 35.9 ± 0.6 zenpuzzle 64.0 ± 10.9 61.3 ± 11.0 70.7 ± 10.3 43.8 ± 7.6 31.2 ± 2.6 28.2 ± 2.6 Total 48.4 ± 1.5 47.5 ± 1.5 47.2 ± 1.5 27.4 ± 5.3 26.7 ± 6.2 26.8 ± 6.2

Table 5.11: Small variations in the implementation of KBE and the combination of TR + BFTI

Win Percentage (%) Games Enh. TD(0.5) Always KBE (0.5) No TR-BFTI (0.5) Enh. TD(0.8) Always KBE (0.8) No TR-BFTI (0.8) aliens 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 bait 32.0 ± 10.6 34.7 ± 10.8 37.3 ± 10.9 34.7 ± 10.8 36.0 ± 10.9 30.7 ± 10.4 blacksmoke 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 boloadventures 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 boulderchase 40.0 ± 11.1 40.0 ± 11.1 34.7 ± 10.8 42.7 ± 11.2 41.3 ± 11.1 41.3 ± 11.1 boulderdash 14.7 ± 8.0 18.7 ± 8.8 16.0 ± 8.3 13.3 ± 7.7 16.0 ± 8.3 21.3 ± 9.3 brainman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 butterflies 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 cakybaky 16.0 ± 8.3 9.3 ± 6.6 12.0 ± 7.4 10.7 ± 7.0 13.3 ± 7.7 13.3 ± 7.7 camelRace 98.7 ± 2.6 96.0 ± 4.4 98.7 ± 2.6 93.3 ± 5.6 94.7 ± 5.1 98.7 ± 2.6 catapults 26.7 ± 10.0 24.0 ± 9.7 25.3 ± 9.8 24.0 ± 9.7 25.3 ± 9.8 24.0 ± 9.7 chase 32.0 ± 10.6 28.0 ± 10.2 34.7 ± 10.8 29.3 ± 10.3 30.7 ± 10.4 16.0 ± 8.3 chipschallenge 4.0 ± 4.4 5.3 ± 5.1 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 chopper 90.7 ± 6.6 92.0 ± 6.1 89.3 ± 7.0 89.3 ± 7.0 85.3 ± 8.0 86.5 ± 7.8 cookmepasta 16.0 ± 8.3 8.0 ± 6.1 9.3 ± 6.6 14.7 ± 8.0 13.3 ± 7.7 10.7 ± 7.0 crossfire 85.3 ± 8.0 85.3 ± 8.0 78.7 ± 9.3 89.3 ± 7.0 85.3 ± 8.0 77.3 ± 9.5 defender 74.7 ± 9.8 73.3 ± 10.0 73.3 ± 10.0 62.7 ± 10.9 76.0 ± 9.7 74.7 ± 9.8 digdug 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 1.3 ± 2.6 1.3 ± 2.6 0.0 ± 0.0 eggomania 54.7 ± 11.3 60.0 ± 11.1 60.0 ± 11.1 57.3 ± 11.2 77.3 ± 9.5 60.0 ± 11.1 enemycitadel 1.3 ± 2.6 1.3 ± 2.6 0.0 ± 0.0 4.0 ± 4.4 2.7 ± 3.6 4.0 ± 4.4 escape 81.3 ± 8.8 89.3 ± 7.0 88.0 ± 7.4 85.3 ± 8.0 82.7 ± 8.6 88.0 ± 7.4 factorymanager 98.7 ± 2.6 100.0 ± 0.0 98.7 ± 2.6 100.0 ± 0.0 97.3 ± 3.6 100.0 ± 0.0 firecaster 5.3 ± 5.1 2.7 ± 3.6 9.3 ± 6.6 2.7 ± 3.6 5.3 ± 5.1 4.0 ± 4.4 firestorms 61.3 ± 11.0 69.3 ± 10.4 58.7 ± 11.1 65.3 ± 10.8 70.7 ± 10.3 68.0 ± 10.6 frogs 48.0 ± 11.3 50.7 ± 11.3 42.7 ± 11.2 45.3 ± 11.3 54.7 ± 11.3 50.7 ± 11.3 gymkhana 6.7 ± 5.6 2.7 ± 3.6 6.7 ± 5.6 2.7 ± 3.6 6.7 ± 5.6 9.3 ± 6.6 hungrybirds 94.7 ± 5.1 97.3 ± 3.6 97.3 ± 3.6 97.3 ± 3.6 97.3 ± 3.6 97.3 ± 3.6 iceandfire 80.0 ± 9.1 78.7 ± 9.3 78.7 ± 9.3 72.0 ± 10.2 76.0 ± 9.7 77.3 ± 9.5 infection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 jaws 16.0 ± 8.3 24.0 ± 9.7 18.7 ± 8.8 20.0 ± 9.1 17.3 ± 8.6 17.3 ± 8.6 labyrinth 98.7 ± 2.6 98.7 ± 2.6 97.3 ± 3.6 98.7 ± 2.6 100.0 ± 0.0 96.0 ± 4.4 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 missilecommand 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 modality 52.0 ± 11.3 58.7 ± 11.1 53.3 ± 11.3 53.3 ± 11.3 58.7 ± 11.1 49.3 ± 11.3 overload 89.3 ± 7.0 78.7 ± 9.3 89.3 ± 7.0 77.3 ± 9.5 81.3 ± 8.8 84.0 ± 8.3 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 painter 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 plants 6.7 ± 5.6 2.7 ± 3.6 1.3 ± 2.6 5.3 ± 5.1 5.3 ± 5.1 0.0 ± 0.0 plaqueattack 92.0 ± 6.1 94.7 ± 5.1 94.7 ± 5.1 90.7 ± 6.6 96.0 ± 4.4 94.7 ± 5.1 portals 77.3 ± 9.5 74.7 ± 9.8 74.7 ± 9.8 70.7 ± 10.3 74.7 ± 9.8 74.7 ± 9.8 racebet2 93.3 ± 5.6 92.0 ± 6.1 97.3 ± 3.6 93.3 ± 5.6 89.3 ± 7.0 94.7 ± 5.1 realportals 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 realsokoban 0.0 ± 0.0 
0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 roguelike 0.0 ± 0.0 4.0 ± 4.4 2.7 ± 3.6 0.0 ± 0.0 2.7 ± 3.6 0.0 ± 0.0 seaquest 50.7 ± 11.3 73.3 ± 10.0 60.0 ± 11.1 64.0 ± 10.9 65.3 ± 10.8 56.0 ± 11.2 sheriff 97.3 ± 3.6 97.3 ± 3.6 100.0 ± 0.0 97.3 ± 3.6 97.3 ± 3.6 97.3 ± 3.6 sokoban 40.0 ± 11.1 41.3 ± 11.1 37.3 ± 10.9 37.3 ± 10.9 32.0 ± 10.6 37.3 ± 10.9 solarfox 1.3 ± 2.6 4.0 ± 4.4 2.7 ± 3.6 8.0 ± 6.1 6.7 ± 5.6 9.3 ± 6.6 superman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 survivezombies 34.7 ± 10.8 40.0 ± 11.1 41.3 ± 11.1 44.0 ± 11.2 41.3 ± 11.1 42.7 ± 11.2 tercio 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 thecitadel 25.3 ± 9.8 22.7 ± 9.5 18.7 ± 8.8 30.7 ± 10.4 24.0 ± 9.7 21.3 ± 9.3 waitforbreakfast 64.0 ± 10.9 62.7 ± 10.9 65.3 ± 10.8 58.7 ± 11.1 61.3 ± 11.0 62.7 ± 10.9 whackamole 97.3 ± 3.6 94.7 ± 5.1 96.0 ± 4.4 100.0 ± 0.0 94.7 ± 5.1 96.0 ± 4.4 zelda 90.7 ± 6.6 58.7 ± 11.1 33.3 ± 10.7 70.7 ± 10.3 96.0 ± 4.4 65.3 ± 10.8 zenpuzzle 61.3 ± 11.0 62.7 ± 10.9 62.7 ± 10.9 70.7 ± 10.3 64.0 ± 10.9 64.0 ± 10.9 Total 47.5 ± 1.5 47.5 ± 1.5 46.6 ± 1.5 47.2 ± 1.5 48.3 ± 1.5 46.9 ± 1.5

Deactivating BFTI or DGD

In the final experiment, BFTI and DGD are each individually disabled in an agent with all other enhancements enabled except for TDTS. This experiment was performed before TDTS was implemented. It was not repeated afterwards, because TDTS appeared to be slightly detrimental when combined with all other enhancements in the previous experiments.

The purpose of testing an agent with BFTI disabled is to test the assumption made in Subsection 5.2.2 that BFTI may enable other enhancements to find more wins, even though BFTI itself appeared to be slightly detrimental in that experiment. The purpose of testing an agent with DGD deactivated is to investigate the impact that DGD has on the agent's performance. This was not tested in an agent without any other enhancements because some of the features of DGD are only relevant when other enhancements are enabled.

The results of this experiment are shown in Table 5.12. Deactivating BFTI appears to slightly decrease the average win percentage over all sixty games. This difference is not statistically significant, but it is still interesting considering that the inclusion of BFTI appeared to decrease the average win percentage (again by a statistically insignificant amount) when other enhancements were absent. This provides some evidence for the assumption that the positive effect that BFTI has on survival duration allows other enhancements to find more winning lines of play. BFTI appears to be particularly beneficial in Boulderchase, Chase, Firestorms, and Sheriff. The first two of these have enemies that actively chase the avatar, and the last two have many projectiles that the avatar should dodge. In these games, the safety prepruning of BFTI can frequently prune some actions.

Deactivating DGD leads to a significant decrease in the average win percentage. This indicates that DGD has a significant positive effect on the win percentage, increasing it from 41.6% to 48.4%. The largest effects are observed in Catapults, CookMePasta, Escape, HungryBirds, IceAndFire, Labyrinth and Modality. These have all been manually inspected and confirmed to be deterministic games, which are the types of games where DGD is expected to be beneficial.

In the games of Boulderchase and Chopper, DGD also appears to improve the win percentage according to these results. These are nondeterministic games, and have also correctly been classified by DGD as nondeterministic. Therefore, DGD would be expected to have no impact in these games at best, or a slightly detrimental effect due to the computational overhead of classifying the games at worst. In Boulderchase, the confidence intervals still overlap slightly. In Chopper, the difference is more difficult to explain. The table shows that the average number of simulations per tick was significantly lower for the "No DGD" agent in this experiment, which can explain the observed difference in win percentage. However, this difference in the average number of simulations is itself difficult to explain. On different hardware with a faster processor (2.67 GHz instead of 2.20 GHz), the difference in win percentage was not reproducible, and the agents with and without DGD performed similarly in Chopper. A possible side-effect of DGD is that it could reduce the size of the tree created by MCTS in the 1 second of initialization time at the start of every game, due to the computational overhead of game classification. It is therefore possible that the tree created by the "No DGD" agent is too large.
A large tree causes subsequent MCTS simulations to be slower, because MCTS has been implemented to only use a depth limit for the play-out step (and not for the selection step, which takes longer in a larger tree). On slower hardware, this could cause MCTS to be unable to complete any simulations within the subsequent 40-millisecond ticks, forcing it to play the Nil action due to running out of processing time (this failure mode is sketched below). In Chopper, this can easily cause losses by getting hit by projectiles.

In Table 5.13 and Table 5.14, the results of the "Enh. (No TDTS)" and "No DGD" agents are compared once more, with the games split up into a set of deterministic games and a set of nondeterministic games, respectively. Table 5.13 shows that DGD significantly increases the win percentage in deterministic games, from 27.5% to 40.5%. Table 5.14 only shows a small, statistically insignificant increase in the average win percentage in nondeterministic games. This indicates that any side-effects causing the observed difference in performance in Chopper do not significantly affect all nondeterministic games.
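To illustrate the failure mode described above, the following listing sketches a time-budgeted decision tick that falls back to a default (Nil) action when not a single simulation completes within the budget. The 40-millisecond budget follows the GVGP setting; the method names and the simulation interface are illustrative assumptions.

import java.util.function.LongFunction;
import java.util.function.Supplier;

// Illustrative sketch of a time-budgeted MCTS tick (names and interfaces are hypothetical).
// If no simulation completes within the budget, the agent has no fresh statistics to act on
// and falls back to a default "Nil" action, which can be fatal in games such as Chopper.
public class TimeBudgetedTickSketch {

    public static <Action> Action decide(
            long budgetMillis,
            LongFunction<Boolean> tryOneSimulation, // runs one simulation, aborting at the given deadline;
                                                    // returns true only if the full simulation completed
            Supplier<Action> bestAction,            // best action according to the statistics in the tree
            Action nilAction) {

        long deadline = System.currentTimeMillis() + budgetMillis;
        int completedSimulations = 0;
        while (System.currentTimeMillis() < deadline) {
            if (tryOneSimulation.apply(deadline)) {
                completedSimulations++;
            }
        }
        // With a very large tree and slow hardware, completedSimulations can remain 0,
        // in which case the agent falls back to the Nil action.
        return completedSimulations > 0 ? bestAction.get() : nilAction;
    }
}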

Table 5.12: Deactivating BFTI or DGD (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games Enh. (No TDTS) No BFTI No DGD Enh. (No TDTS) No BFTI No DGD aliens 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 33.5 ± 1.4 36.2 ± 1.2 31.0 ± 1.5 bait 37.3 ± 10.9 32.0 ± 10.6 25.3 ± 9.8 75.4 ± 28.1 88.8 ± 32.9 79.5 ± 27.0 blacksmoke 1.3 ± 2.6 1.3 ± 2.6 1.3 ± 2.6 7.2 ± 0.4 8.0 ± 0.4 4.7 ± 0.4 boloadventures 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 1.8 ± 0.4 4.4 ± 0.5 0.4 ± 0.2 boulderchase 41.3 ± 11.1 26.7 ± 10.0 21.3 ± 9.3 13.0 ± 0.3 12.8 ± 0.4 9.4 ± 0.5 boulderdash 20.0 ± 9.1 13.3 ± 7.7 14.7 ± 8.0 7.8 ± 0.6 12.1 ± 0.3 8.1 ± 0.6 brainman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 34.9 ± 3.4 34.6 ± 3.3 38.3 ± 2.3 butterflies 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 53.3 ± 1.0 56.4 ± 1.0 53.8 ± 1.0 cakybaky 10.7 ± 7.0 6.7 ± 5.6 9.3 ± 6.6 19.8 ± 1.9 20.4 ± 2.0 15.1 ± 2.4 camelRace 100.0 ± 0.0 97.3 ± 3.6 96.0 ± 4.4 46.2 ± 0.9 47.9 ± 1.0 43.9 ± 1.3 catapults 28.0 ± 10.2 34.7 ± 10.8 14.7 ± 8.0 19.6 ± 2.9 21.7 ± 3.4 17.7 ± 3.0 chase 41.3 ± 11.1 26.7 ± 10.0 22.7 ± 9.5 31.3 ± 1.4 34.9 ± 1.3 31.3 ± 1.5 chipschallenge 0.0 ± 0.0 1.3 ± 2.6 0.0 ± 0.0 16.2 ± 2.1 17.6 ± 1.9 28.2 ± 1.7 chopper 92.0 ± 6.1 80.0 ± 9.1 30.7 ± 10.4 7.1 ± 1.0 9.9 ± 0.6 2.8 ± 1.0 cookmepasta 16.0 ± 8.3 17.3 ± 8.6 2.7 ± 3.6 28.4 ± 3.5 28.2 ± 3.2 40.3 ± 2.3 crossfire 78.7 ± 9.3 82.7 ± 8.6 86.7 ± 7.7 25.5 ± 0.9 26.1 ± 0.8 23.9 ± 0.7 defender 70.7 ± 10.3 68.0 ± 10.6 74.7 ± 9.8 21.1 ± 1.3 21.9 ± 1.5 19.3 ± 1.6 digdug 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 7.5 ± 0.5 9.2 ± 0.5 2.5 ± 1.0 eggomania 69.3 ± 10.4 60.0 ± 11.1 61.3 ± 11.0 39.0 ± 0.8 42.2 ± 1.1 38.3 ± 0.7 enemycitadel 2.7 ± 3.6 1.3 ± 2.6 2.7 ± 3.6 9.9 ± 0.6 10.0 ± 0.5 5.3 ± 1.2 escape 92.0 ± 6.1 89.3 ± 7.0 12.0 ± 7.4 51.1 ± 5.6 61.7 ± 7.2 29.6 ± 1.5 factorymanager 100.0 ± 0.0 100.0 ± 0.0 98.7 ± 2.6 10.5 ± 1.1 11.2 ± 1.2 4.2 ± 1.1 firecaster 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 18.6 ± 1.9 19.5 ± 2.0 19.0 ± 1.7 firestorms 78.7 ± 9.3 58.7 ± 11.1 69.3 ± 10.4 14.4 ± 1.1 14.2 ± 1.1 11.5 ± 1.1 frogs 48.0 ± 11.3 42.7 ± 11.2 49.3 ± 11.3 6.0 ± 1.1 9.4 ± 1.1 7.3 ± 1.2 gymkhana 10.7 ± 7.0 9.3 ± 6.6 6.7 ± 5.6 15.7 ± 0.9 16.2 ± 0.8 13.7 ± 0.7 hungrybirds 97.3 ± 3.6 97.3 ± 3.6 76.0 ± 9.7 70.4 ± 4.9 76.3 ± 5.6 9.2 ± 4.3 iceandfire 77.3 ± 9.5 78.7 ± 9.3 6.7 ± 5.6 32.7 ± 1.7 34.0 ± 1.7 26.6 ± 1.2 infection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 24.8 ± 0.6 25.2 ± 0.7 23.7 ± 0.5 intersection 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 28.7 ± 0.9 30.0 ± 1.0 26.8 ± 1.2 jaws 18.7 ± 8.8 24.0 ± 9.7 21.3 ± 9.3 30.4 ± 3.3 31.8 ± 3.9 28.9 ± 4.1 labyrinth 100.0 ± 0.0 98.7 ± 2.6 58.7 ± 11.1 69.2 ± 3.1 48.9 ± 3.3 54.9 ± 3.2 lasers 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.2 ± 0.4 5.9 ± 0.6 0.8 ± 0.4 lasers2 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 0.8 5.2 ± 0.6 1.3 ± 0.7 lemmings 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 39.4 ± 0.5 39.4 ± 0.5 38.3 ± 0.5 missilecommand 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 102.5 ± 3.7 102.1 ± 3.6 99.1 ± 4.1 modality 49.3 ± 11.3 52.0 ± 11.3 20.0 ± 9.1 50.5 ± 22.0 71.6 ± 26.8 79.3 ± 28.7 overload 88.0 ± 7.4 82.7 ± 8.6 81.3 ± 8.8 21.9 ± 1.5 20.2 ± 1.2 17.5 ± 1.3 pacman 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 7.0 ± 0.4 8.5 ± 0.4 3.1 ± 0.5 painter 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 45.4 ± 9.3 60.3 ± 12.3 51.3 ± 16.7 plants 1.3 ± 2.6 1.3 ± 2.6 2.7 ± 3.6 23.3 ± 0.5 23.9 ± 0.5 22.6 ± 0.5 plaqueattack 100.0 ± 0.0 96.0 ± 4.4 93.3 ± 5.6 20.5 ± 0.7 20.5 ± 0.7 18.5 ± 0.8 portals 77.3 ± 9.5 74.7 ± 9.8 74.7 ± 9.8 25.3 ± 1.3 21.8 ± 1.2 23.9 ± 1.2 racebet2 86.7 ± 7.7 88.0 ± 7.4 90.7 ± 6.6 16.9 ± 0.4 17.8 ± 0.4 15.1 ± 0.4 realportals 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 6.8 ± 1.6 7.3 ± 1.6 6.0 ± 1.8 realsokoban 0.0 ± 0.0 2.7 ± 3.6 0.0 ± 0.0 17.1 ± 2.0 
17.5 ± 1.5 58.7 ± 5.4 roguelike 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 3.6 18.4 ± 0.5 18.8 ± 0.8 18.4 ± 0.4 seaquest 56.0 ± 11.2 60.0 ± 11.1 57.3 ± 11.2 24.1 ± 2.4 22.6 ± 1.8 19.2 ± 2.3 sheriff 98.7 ± 2.6 88.0 ± 7.4 100.0 ± 0.0 23.4 ± 0.9 24.5 ± 1.0 21.9 ± 0.8 sokoban 44.0 ± 11.2 40.0 ± 11.1 50.7 ± 11.3 24.4 ± 1.9 23.6 ± 2.5 77.5 ± 4.7 solarfox 6.7 ± 5.6 4.0 ± 4.4 6.7 ± 5.6 58.1 ± 3.3 58.9 ± 2.7 55.6 ± 3.3 superman 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 31.0 ± 1.0 30.9 ± 1.2 30.8 ± 1.0 surround 100.0 ± 0.0 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 132.3 ± 7.6 1.0 ± 0.0 survivezombies 45.3 ± 11.3 36.0 ± 10.9 37.3 ± 10.9 15.4 ± 1.2 14.8 ± 1.3 11.4 ± 1.3 tercio 1.3 ± 2.6 0.0 ± 0.0 0.0 ± 0.0 4.8 ± 1.4 7.0 ± 1.2 4.6 ± 1.4 thecitadel 36.0 ± 10.9 22.7 ± 9.5 21.3 ± 9.3 21.0 ± 1.7 20.7 ± 1.3 31.4 ± 3.5 waitforbreakfast 60.0 ± 11.1 65.3 ± 10.8 61.3 ± 11.0 32.7 ± 3.6 27.8 ± 1.9 27.4 ± 2.1 whackamole 100.0 ± 0.0 98.7 ± 2.6 98.7 ± 2.6 61.2 ± 2.5 62.5 ± 2.5 57.9 ± 2.3 zelda 52.0 ± 11.3 93.3 ± 5.6 77.3 ± 9.5 32.0 ± 0.6 33.0 ± 0.5 31.6 ± 0.5 zenpuzzle 64.0 ± 10.9 74.7 ± 9.8 54.7 ± 11.3 43.8 ± 7.6 173.7 ± 4.3 27.3 ± 6.3 Total 48.4 ± 1.5 47.1 ± 1.5 41.6 ± 1.4 27.4 ± 5.3 33.2 ± 8.0 26.7 ± 5.7

Table 5.13: DGD in Deterministic Games (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games Enh. (No TDTS) No DGD Enh. (No TDTS) No DGD bait 37.3 ± 10.9 25.3 ± 9.8 75.4 ± 28.1 79.5 ± 27.0 boloadventures 0.0 ± 0.0 0.0 ± 0.0 1.8 ± 0.4 0.4 ± 0.2 brainman 0.0 ± 0.0 0.0 ± 0.0 34.9 ± 3.4 38.3 ± 2.3 catapults 28.0 ± 10.2 14.7 ± 8.0 19.6 ± 2.9 17.7 ± 3.0 chipschallenge 0.0 ± 0.0 0.0 ± 0.0 16.2 ± 2.1 28.2 ± 1.7 cookmepasta 16.0 ± 8.3 2.7 ± 3.6 28.4 ± 3.5 40.3 ± 2.3 digdug 1.3 ± 2.6 0.0 ± 0.0 7.5 ± 0.5 2.5 ± 1.0 escape 92.0 ± 6.1 12.0 ± 7.4 51.1 ± 5.6 29.6 ± 1.5 factorymanager 100.0 ± 0.0 98.7 ± 2.6 10.5 ± 1.1 4.2 ± 1.1 hungrybirds 97.3 ± 3.6 76.0 ± 9.7 70.4 ± 4.9 9.2 ± 4.3 iceandfire 77.3 ± 9.5 6.7 ± 5.6 32.7 ± 1.7 26.6 ± 1.2 labyrinth 100.0 ± 0.0 58.7 ± 11.1 69.2 ± 3.1 54.9 ± 3.2 lasers 0.0 ± 0.0 0.0 ± 0.0 2.2 ± 0.4 0.8 ± 0.4 lasers2 0.0 ± 0.0 0.0 ± 0.0 2.7 ± 0.8 1.3 ± 0.7 modality 49.3 ± 11.3 20.0 ± 9.1 50.5 ± 22.0 79.3 ± 28.7 painter 100.0 ± 0.0 100.0 ± 0.0 45.4 ± 9.3 51.3 ± 16.7 racebet2 86.7 ± 7.7 90.7 ± 6.6 16.9 ± 0.4 15.1 ± 0.4 realportals 0.0 ± 0.0 0.0 ± 0.0 6.8 ± 1.6 6.0 ± 1.8 realsokoban 0.0 ± 0.0 0.0 ± 0.0 17.1 ± 2.0 58.7 ± 5.4 sokoban 44.0 ± 11.2 50.7 ± 11.3 24.4 ± 1.9 77.5 ± 4.7 tercio 1.3 ± 2.6 0.0 ± 0.0 4.8 ± 1.4 4.6 ± 1.4 thecitadel 36.0 ± 10.9 21.3 ± 9.3 21.0 ± 1.7 31.4 ± 3.5 zenpuzzle 64.0 ± 10.9 54.7 ± 11.3 43.8 ± 7.6 27.3 ± 6.3 Total 40.5 ± 2.3 27.5 ± 2.1 28.4 ± 9.6 29.8 ± 11.0

Table 5.14: DGD in Nondeterministic Games (75 runs per game / 15 runs per level)

Win Percentage (%) Avg. Simulations per Tick Games Enh. (No TDTS) No DGD Enh. (No TDTS) No DGD aliens 100.0 ± 0.0 100.0 ± 0.0 33.5 ± 1.4 31.0 ± 1.5 blacksmoke 1.3 ± 2.6 1.3 ± 2.6 7.2 ± 0.4 4.7 ± 0.4 boulderchase 41.3 ± 11.1 21.3 ± 9.3 13.0 ± 0.3 9.4 ± 0.5 boulderdash 20.0 ± 9.1 14.7 ± 8.0 7.8 ± 0.6 8.1 ± 0.6 butterflies 100.0 ± 0.0 100.0 ± 0.0 53.3 ± 1.0 53.8 ± 1.0 cakybaky 10.7 ± 7.0 9.3 ± 6.6 19.8 ± 1.9 15.1 ± 2.4 camelRace 100.0 ± 0.0 96.0 ± 4.4 46.2 ± 0.9 43.9 ± 1.3 chase 41.3 ± 11.1 22.7 ± 9.5 31.3 ± 1.4 31.3 ± 1.5 chopper 92.0 ± 6.1 30.7 ± 10.4 7.1 ± 1.0 2.8 ± 1.0 crossfire 78.7 ± 9.3 86.7 ± 7.7 25.5 ± 0.9 23.9 ± 0.7 defender 70.7 ± 10.3 74.7 ± 9.8 21.1 ± 1.3 19.3 ± 1.6 eggomania 69.3 ± 10.4 61.3 ± 11.0 39.0 ± 0.8 38.3 ± 0.7 enemycitadel 2.7 ± 3.6 2.7 ± 3.6 9.9 ± 0.6 5.3 ± 1.2 firecaster 0.0 ± 0.0 0.0 ± 0.0 18.6 ± 1.9 19.0 ± 1.7 firestorms 78.7 ± 9.3 69.3 ± 10.4 14.4 ± 1.1 11.5 ± 1.1 frogs 48.0 ± 11.3 49.3 ± 11.3 6.0 ± 1.1 7.3 ± 1.2 gymkhana 10.7 ± 7.0 6.7 ± 5.6 15.7 ± 0.9 13.7 ± 0.7 infection 100.0 ± 0.0 100.0 ± 0.0 24.8 ± 0.6 23.7 ± 0.5 intersection 100.0 ± 0.0 100.0 ± 0.0 28.7 ± 0.9 26.8 ± 1.2 jaws 18.7 ± 8.8 21.3 ± 9.3 30.4 ± 3.3 28.9 ± 4.1 lemmings 0.0 ± 0.0 0.0 ± 0.0 39.4 ± 0.5 38.3 ± 0.5 missilecommand 100.0 ± 0.0 100.0 ± 0.0 102.5 ± 3.7 99.1 ± 4.1 overload 88.0 ± 7.4 81.3 ± 8.8 21.9 ± 1.5 17.5 ± 1.3 pacman 0.0 ± 0.0 0.0 ± 0.0 7.0 ± 0.4 3.1 ± 0.5 plants 1.3 ± 2.6 2.7 ± 3.6 23.3 ± 0.5 22.6 ± 0.5 plaqueattack 100.0 ± 0.0 93.3 ± 5.6 20.5 ± 0.7 18.5 ± 0.8 portals 77.3 ± 9.5 74.7 ± 9.8 25.3 ± 1.3 23.9 ± 1.2 roguelike 0.0 ± 0.0 2.7 ± 3.6 18.4 ± 0.5 18.4 ± 0.4 seaquest 56.0 ± 11.2 57.3 ± 11.2 24.1 ± 2.4 19.2 ± 2.3 sheriff 98.7 ± 2.6 100.0 ± 0.0 23.4 ± 0.9 21.9 ± 0.8 solarfox 6.7 ± 5.6 6.7 ± 5.6 58.1 ± 3.3 55.6 ± 3.3 superman 1.3 ± 2.6 0.0 ± 0.0 31.0 ± 1.0 30.8 ± 1.0 surround 100.0 ± 0.0 100.0 ± 0.0 1.0 ± 0.0 1.0 ± 0.0 survivezombies 45.3 ± 11.3 37.3 ± 10.9 15.4 ± 1.2 11.4 ± 1.3 waitforbreakfast 60.0 ± 11.1 61.3 ± 11.0 32.7 ± 3.6 27.4 ± 2.1 whackamole 100.0 ± 0.0 98.7 ± 2.6 61.2 ± 2.5 57.9 ± 2.3 zelda 52.0 ± 11.3 77.3 ± 9.5 32.0 ± 0.6 31.6 ± 0.5 Total 53.3 ± 1.9 50.3 ± 1.9 26.8 ± 6.3 24.8 ± 6.3

Chapter 6

Conclusion

This chapter concludes the thesis. The four research questions and the problem statement formulated in the first chapter are addressed based on the discussions in the previous chapters. Ideas for future research are described at the end of the chapter.

6.1 Research Questions

This section discusses the four research questions described in Section 1.3 using the findings described previously.

1. How can ideas from Iterated Width be integrated in Monte-Carlo Tree Search and enhance its performance in General Video Game Playing?

The main idea of Iterated Width (IW) (Lipovetzky and Geffner, 2012) is the use of novelty tests for pruning states. Other than that, it simply consists of Breadth-First Search (BrFS) processes. Subsection 4.1.3 describes a new enhancement, referred to as Novelty-Based Pruning (NBP), which uses novelty tests for pruning in MCTS. An experimental evaluation of this enhancement shows that it provides a statistically significant increase in the overall win percentage of an MCTS agent over sixty different General Video Game Playing (GVGP) games. It has been shown to work particularly well in "puzzle" games. Additionally, Breadth-First Tree Initialization and Safety Prepruning (BFTI) is partially inspired by the BrFS that IW uses, and partially by the safety prepruning technique that Geffner and Geffner (2015) used when applying IW to GVGP. BFTI has been shown to enable an MCTS-based agent to significantly delay losses, which may improve the win percentage when combined with other enhancements in some games.

2. How can enhancements known from other domains be used to improve the performance of Monte-Carlo Tree Search in General Video Game Playing?

Progressive History (PH) (Nijssen and Winands, 2011), N-Gram Selection Technique (NST) (Tak et al., 2012), Tree Reuse (TR) (Pepels et al., 2014), and Temporal-Difference Tree Search (TDTS) (Vodopivec, 2016) are enhancements known from domains other than GVGP that have been described in this thesis and evaluated in GVGP. TDTS has also previously been tested in GVGP. PH and NST have been shown to provide statistically significant increases in overall win percentage when combined. This requires taking into account the locations in which actions are played when gathering statistics for actions. When this is not done, PH and NST have previously been shown not to be beneficial in GVGP (Schuster, 2015). TR has been shown to provide a statistically significant increase in overall win percentage by itself. Individually, TDTS appears to be beneficial as well in the evaluation done in this thesis, but this cannot be said with 95% confidence.

3. How can enhancements previously proposed for General Video Game Playing be extended and used to improve the performance of Monte-Carlo Tree Search?

In this thesis, Knowledge-Based Evaluations (KBE) has been discussed as an enhancement that extends work previously done by Perez et al. (2014) and Van Eeden (2015). The basic approach is to learn which objects the avatar should or should not move towards, and to combine this knowledge with distances to such objects in a heuristic evaluation function. The heuristic evaluation function used in this thesis is different from the one used by Perez et al. (2014) and Van Eeden (2015). Additionally, the A* (Hart et al., 1968) pathfinding algorithm is used to compute distances (a minimal sketch of grid-based A* is given at the end of this section). This was also done by Van Eeden (2015), but additional optimizations have been used in this thesis, such as A* variant 1 (Sun et al., 2009). This has ultimately been shown to result in a significant increase in the overall win percentage.

4. How can the performance of Monte-Carlo Tree Search in General Video Game Playing be improved with novel enhancements?

The Loss Avoidance (LA) enhancement has been introduced in this thesis as a novel enhancement for MCTS, which has been shown to improve the overall win percentage in GVGP. This enhancement is intended to address a problem where MCTS initially obtains an overly pessimistic evaluation of certain states when they frequently lead to losses through (semi-)random play, and then no longer spends additional simulations on evaluating them more thoroughly.
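The following listing gives a minimal sketch of grid-based A* pathfinding with a Manhattan-distance heuristic, as referenced above for computing distances in KBE. It is illustrative only: the actual implementation operates on GVG-AI observations and includes the additional optimizations mentioned above.

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Minimal sketch of grid-based A* with a Manhattan-distance heuristic (illustrative only).
public class GridAStarSketch {

    // blocked[y][x] == true means the cell cannot be traversed. Returns the path length from
    // (sx, sy) to (gx, gy), or -1 if the goal is unreachable. Cells are encoded as y * width + x.
    public static int pathLength(boolean[][] blocked, int sx, int sy, int gx, int gy) {
        int height = blocked.length;
        int width = blocked[0].length;
        int start = sy * width + sx;
        Map<Integer, Integer> g = new HashMap<>(); // best known cost from the start to each cell
        g.put(start, 0);
        // Priority queue ordered by f = g + h, storing {cell, f}.
        PriorityQueue<int[]> open = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        open.add(new int[] {start, Math.abs(sx - gx) + Math.abs(sy - gy)});
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        while (!open.isEmpty()) {
            int cell = open.poll()[0];
            int x = cell % width;
            int y = cell / width;
            if (x == gx && y == gy) {
                return g.get(cell); // with a consistent heuristic, the first pop of the goal is optimal
            }
            for (int d = 0; d < 4; d++) {
                int nx = x + dx[d];
                int ny = y + dy[d];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height || blocked[ny][nx]) {
                    continue;
                }
                int next = ny * width + nx;
                int tentative = g.get(cell) + 1;
                if (tentative < g.getOrDefault(next, Integer.MAX_VALUE)) {
                    g.put(next, tentative);
                    int h = Math.abs(nx - gx) + Math.abs(ny - gy);
                    open.add(new int[] {next, tentative + h});
                }
            }
        }
        return -1; // goal unreachable
    }
}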

6.2 Problem Statement

The problem statement of the thesis was formulated as follows:

How can Monte-Carlo Tree Search be enhanced to perform competitively in General Video Game Playing?

An agent that combines all the enhancements discussed for the four research questions above has been shown to significantly increase the win percentage over sixty different GVGP games. The best agent found in the experimental evaluation of this thesis includes all enhancements except for Temporal-Difference Tree Search (TDTS). This agent has an average win percentage of 48.4%, which is significantly better than the 31.0% of the baseline MCTS agent developed for this thesis, and the 26.5% of the Sample Open-Loop MCTS controller included in the GVG-AI framework. It is also close to, but not yet better than, the 52.4% of YBCriber, which is the winning agent of the competition at the 2015 IEEE CEEC conference and a good representation of the state of the art. Still, the MCTS agent with all enhancements described in this thesis except for TDTS can be said to perform competitively with the top-tier agents in General Video Game Playing. Further ideas for improving it in future research are provided in the next section.

6.3 Future Research

One way to improve the performance of the best agent found in the experimental evaluation of this thesis is to continue investigating techniques used by other agents with a better performance. An important difference between the work described in this thesis and the best agents of 2015 is that many of those agents incorporate more domain knowledge. For instance, YBCriber gives the agent a small reward for picking up resources. The heuristic evaluation function of KBE could easily be extended in a similar way. This would likely improve the performance in games such as ChipsChallenge, which was found to be a detrimental case of NBP and KBE in Chapter 5. Many agents, such as YBCriber, also keep track of which locations of the map appear to be dangerous and correlated with losses. Such data could likely be included in the evaluation function to improve the performance in games like Jaws, which was described as a detrimental case for LA in Subsection 5.2.3. Finally, the YOLOBOT agent, which won the competition at GECCO 2015, attempts to learn which areas of the map can or cannot be traversed. KBE could likely be improved by using similar techniques to take objects that block movement, or objects that result in a game loss upon collision, into account in the pathfinding algorithm used to compute distances.

A different direction of future research is to evaluate the enhancements discussed in this thesis more extensively. There are many parameters (such as the C parameter of UCB1, the novelty threshold for NBP, W for PH, etc.) that have only been tuned slightly or chosen based on previous work (Schuster, 2015). A sketch of the UCB1 selection rule, in which C appears, is given at the end of this section. A more extensive evaluation of different values for these parameters would likely result in better performance. Even though Temporal-Difference Tree Search (TDTS) appeared to be slightly detrimental when combined with all other enhancements, it is still possible that it can be beneficial with more careful tuning of parameters. This may also involve a more extensive evaluation of space-local normalization of value estimates (Vodopivec, 2016).

Finally, it could be interesting to evaluate the new enhancements of this thesis in domains other than General Video Game Playing (GVGP), and in the Two-Player Planning Track of GVGP. Loss Avoidance is not expected to perform well in adversarial domains, where it would result in overly optimistic evaluations, but it may be beneficial in other single-player or even cooperative domains. Novelty-Based Pruning is not expected to work well in adversarial games either, because it prioritizes short paths over long paths to similar states. In adversarial games, where an opponent can also make choices along the path of actions to a state, this is unlikely to be a reasonable prioritization. However, NBP may be useful in other games that are not adversarial, or even in the application of MCTS to planning problems.
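For reference, the following listing sketches the UCB1 selection rule (Auer et al., 2002) in which the exploration constant C appears; the node representation is an illustrative assumption.

import java.util.List;

// Minimal sketch of UCB1 child selection; c is the exploration constant mentioned above.
public class Ucb1SelectionSketch {

    public static class Child {
        double totalValue; // sum of (normalized) values backed up through this child
        int visits;        // number of times this child has been selected
    }

    // Returns the index of the child maximizing X_i + c * sqrt(ln(N) / n_i), where N is the
    // visit count of the parent. Unvisited children are selected first.
    public static int select(List<Child> children, int parentVisits, double c) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < children.size(); i++) {
            Child child = children.get(i);
            double ucb = (child.visits == 0)
                    ? Double.POSITIVE_INFINITY
                    : child.totalValue / child.visits
                        + c * Math.sqrt(Math.log(parentVisits) / child.visits);
            if (ucb > bestValue) {
                bestValue = ucb;
                best = i;
            }
        }
        return best;
    }
}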

References

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, Vol. 47, Nos. 2–3, pp. 235–256. [12]
Baier, H. and Winands, M. H. M. (2015). MCTS-Minimax Hybrids. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 2, pp. 167–179. [19]
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. (2013). The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research, Vol. 47, pp. 253–279. [1]
Björnsson, Y. and Finnsson, H. (2009). CadiaPlayer: A Simulation-Based General Game Player. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 1, No. 1, pp. 4–15. [2]
Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 1, pp. 1–43. [1, 11, 21, 24]
Campbell, M., Hoane Jr, A. J., and Hsu, F. (2002). Deep Blue. Artificial Intelligence, Vol. 134, No. 1, pp. 57–83. [1]
Chaslot, G. M. J-B., Winands, M. H. M., Herik, H. J. van den, Uiterwijk, J. W. H. M., and Bouzy, B. (2008). Progressive Strategies for Monte-Carlo Tree Search. New Mathematics and Natural Computation, Vol. 4, No. 3, pp. 343–357. [11, 24]
Childs, B. E., Brodeur, J. H., and Kocsis, L. (2008). Transpositions and Move Groups in Monte Carlo Tree Search. 2008 IEEE Symposium On Computational Intelligence and Games, pp. 389–395, IEEE. [21]
Chu, C. Y., Hashizume, H., Guo, Z., Harada, T., and Thawonmas, R. (2015). Combining Pathfinding Algorithm with Knowledge-based Monte-Carlo Tree Search in General Video Game Playing. Proceedings of the IEEE Conference on Computational Intelligence and Games, pp. 523–529, IEEE. [28]
Coulom, R. (2007). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Computers and Games (eds. H. J. van den Herik, P. Ciancarini, and H. H. L. M. Donkers), Vol. 4630 of Lecture Notes in Computer Science, pp. 72–83, Springer Berlin Heidelberg. [1, 11]
Ebner, M., Levine, J., Lucas, S. M., Schaul, T., Thompson, T., and Togelius, J. (2013). Towards a Video Game Description Language. Artificial and Computational Intelligence in Games (eds. S. M. Lucas, M. Mateas, M. Preuss, P. Spronck, and J. Togelius), Vol. 6 of Dagstuhl Follow-Ups, pp. 85–100. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. [4]
Eeden, J. van (2015). Analysing And Improving The Knowledge-based Fast Evolutionary MCTS Algorithm. M.Sc. thesis, Utrecht University, Utrecht, the Netherlands. [28, 29, 30, 31, 32, 33, 34, 63]
Fédération Internationale de l'Automobile (2016). Points, classification and race distance. https://www.formula1.com/content/fom-website/en/championship/inside-f1/rules-regs/Classification Race distance and Points.html. Accessed: 2016-04-19. [4]

Finnsson, H. and Björnsson, Y. (2011). Game-Tree Properties and MCTS Performance. IJCAI'11 Workshop on General Intelligence in Game Playing Agents (GIGA'11) (eds. Y. Björnsson, N. Sturtevant, and M. Thielscher), pp. 23–30. [19]

Frydenberg, F., Andersen, K. R., Risi, S., and Togelius, J. (2015). Investigating MCTS Modifications in General Video Game Playing. Proceedings of the IEEE Conference on Computational Intelligence and Games, pp. 107–113, IEEE. [35]
Geffner, T. and Geffner, H. (2015). Width-Based Planning for General Video-Game Playing. Proceedings of the Eleventh Artificial Intelligence and Interactive Digital Entertainment International Conference (eds. A. Jhala and N. Sturtevant), pp. 23–29, AAAI Press. [15, 16, 17, 21, 23, 39, 41, 62]
Genesereth, M., Love, N., and Pell, B. (2005). General Game Playing: Overview of the AAAI Competition. AI Magazine, Vol. 26, No. 2, pp. 62–72. [1, 4]
Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, Vol. 4, No. 2, pp. 100–107. [30, 32, 33, 63]
Jacobsen, E. J., Greve, R., and Togelius, J. (2014). Monte Mario: Platforming with MCTS. Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pp. 293–300, ACM. [35]
Knuth, D. E. and Moore, R. W. (1975). An Analysis of Alpha-Beta Pruning. Artificial Intelligence, Vol. 6, No. 4, pp. 293–326. [1]
Kocsis, L. and Szepesvári, C. (2006). Bandit Based Monte-Carlo Planning. Machine Learning: ECML 2006 (eds. J. Fürnkranz, T. Scheffer, and M. Spiliopoulou), Vol. 4212 of Lecture Notes in Computer Science, pp. 282–293. Springer Berlin Heidelberg. [1, 11, 12]

Levine, J., Congdon, C. B., Ebner, M., Kendall, G., Lucas, S. M., Schaul, T., Miikkulainen, R., and Thompson, T. (2013). General Video Game Playing. Artificial and Computational Intelligence in Games (eds. S. M. Lucas, M. Mateas, M. Preuss, P. Spronck, and J. Togelius), Vol. 6 of Dagstuhl Follow-Ups, pp. 77–83. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. [1]
Lipovetzky, N. and Geffner, H. (2012). Width and Serialization of Classical Planning Problems. Proceedings of the Twentieth European Conference on Artificial Intelligence (ECAI 2012) (eds. L. De Raedt, C. Bessiere, D. Dubois, P. Doherty, P. Frasconi, F. Heintz, and P. Lucas), pp. 540–545, IOS Press. [2, 13, 21, 62]
Lipovetzky, N., Ramirez, M., and Geffner, H. (2015). Classical Planning with Simulators: Results on the Atari Video Games. Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1610–1616, AAAI Press. [14]

Millington, I. and Funge, J. (2009). Artificial Intelligence for Games. Morgan Kaufmann, 2nd edition. [44]
Nijssen, J. A. M. and Winands, M. H. M. (2011). Enhancements for Multi-Player Monte-Carlo Tree Search. Computers and Games (CG 2010) (eds. H. J. van den Herik, H. Iida, and A. Plaat), Vol. 6515 of Lecture Notes in Computer Science, pp. 238–249. Springer Berlin Heidelberg. [25, 62]

Pepels, T., Winands, M. H. M., and Lanctot, M. (2014). Real-Time Monte Carlo Tree Search in Ms Pac-Man. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 6, No. 3, pp. 245–257. [27, 28, 62]
Perez-Liebana, D., Samothrakis, S., Togelius, J., Lucas, S. M., and Schaul, T. (2016). General Video Game AI: Competition, Challenges and Opportunities. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 4335–4337, AAAI Press. [2]

Perez, D. (2016). The General Video Game AI Competition. http://www.gvgai.net/. Accessed: 2016-03-24. [2, 4]
Perez, D., Samothrakis, S., and Lucas, S. (2014). Knowledge-based Fast Evolutionary MCTS for General Video Game Playing. Proceedings of the IEEE Conference on Computational Intelligence and Games, pp. 68–75, IEEE. [28, 29, 30, 33, 34, 63]
Perez, D., Dieskau, J., Hünermund, M., Mostaghim, S., and Lucas, S. M. (2015). Open Loop Search for General Video Game Playing. Proceedings of the Genetic and Evolutionary Computation Conference, pp. 337–344, ACM. [8, 27]

Perez, D., Samothrakis, S., Togelius, J., Schaul, T., Lucas, S. M., Couëtoux, A., Lee, J., Lim, C.-U., and Thompson, T. (2016). The 2014 General Video Game Playing Competition. IEEE Transactions on Computational Intelligence and AI in Games. To appear. [1, 7, 28, 38, 39]
Ramanujan, R., Sabharwal, A., and Selman, B. (2010). On Adversarial Search Spaces and Sampling-Based Planning. 20th International Conference on Automated Planning and Scheduling, ICAPS 2010 (eds. R. I. Brafman, H. Geffner, J. Hoffmann, and H. A. Kautz), pp. 242–245, AAAI. [19]
Rummery, G. A. and Niranjan, M. (1994). On-Line Q-Learning Using Connectionist Systems. Technical report, Cambridge University Engineering Department, England. [35]
Schaul, T. (2013). A Video Game Description Language for Model-based or Interactive Learning. Proceedings of the IEEE Conference on Computational Intelligence in Games, pp. 193–200, IEEE Press, Niagara Falls. [4]
Schuster, T. (2015). MCTS Based Agent for General Video Games. M.Sc. thesis, Maastricht University, Maastricht, the Netherlands. [26, 27, 38, 39, 49, 62, 64]

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. van den, Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529, No. 7587, pp. 484–489. [1]
Sun, X., Yeoh, W., Chen, P., and Koenig, S. (2009). Simple Optimization Techniques for A*-Based Search. Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, Vol. 2, pp. 931–936, International Foundation for Autonomous Agents and Multiagent Systems. [33, 63]
Sutton, R. S. (1988). Learning to Predict by the Methods of Temporal Differences. Machine Learning, Vol. 3, No. 1, pp. 9–44. [35]
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, Massachusetts. [26, 35, 36]
Tak, M. J. W., Winands, M. H. M., and Björnsson, Y. (2012). N-Grams and the Last-Good-Reply Policy Applied in General Game Playing. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 2, pp. 73–83. [26, 62]
Tak, M. J. W., Winands, M. H. M., and Björnsson, Y. (2014). Decaying Simulation Strategies. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 6, No. 4, pp. 395–406. [26]
Vodopivec, T. (2016). Monte Carlo tree search strategies. Ph.D. thesis, University of Ljubljana, Slovenia. To appear. [35, 36, 37, 51, 62, 64]
