A Comparison Between the Usage of Flat and Structured Game Trees for Move Evaluation in Hearthstone
Master’s Thesis
Markus Zopf
Knowledge Engineering Group
Technische Universität Darmstadt

Thesis Statement pursuant to § 22 paragraph 7 of APB TU Darmstadt

I herewith formally declare that I have written the submitted thesis independently. I did not use any outside support except for the quoted literature and other sources mentioned in the paper. I clearly marked and separately listed all of the literature and all of the other sources which I employed when producing this academic work, either literally or in content. This thesis has not been handed in or published before in the same or similar form. In the submitted thesis the written copies and the electronic version are identical in content.

Darmstadt, April 7, 2015
Markus Zopf

Abstract

Since the beginning of research in the field of Artificial Intelligence, games have provided challenging problems for intelligent systems. Chess, a fully observable zero-sum game without randomness, was one of the first games investigated in detail. In 1997, nearly 50 years after the first paper about computers playing chess, the Deep Blue system won a six-game match against the chess Grandmaster Garry Kasparov. Many other games with even harder setups have been investigated since then. For example, the multiplayer card game poker, with its hidden information, and the game Go, with its enormous number of possible game states, remain challenging. Recently developed Monte Carlo algorithms try to handle the complexity of such games and achieve significantly better results than earlier approaches. Monte Carlo algorithms use random sampling to estimate the value of game states. They are particularly successful in games where it is hard to define a utility function that calculates the value of game states directly and where random simulations are easy to execute.
In this thesis, we investigate two different Monte Carlo approaches in the recently released card game Hearthstone: Heroes of Warcraft. This game combines various difficulties in game playing, such as hidden information, randomness, and big game trees. Upper Confidence Bound (UCB) approaches use the concept of bandits to represent the moves in a game as a flat structure, whereas Upper Confidence Bound Applied to Trees (UCT) algorithms build structured trees to find the best move. We find that both algorithms perform well against different random players. Both algorithms achieve win rates of about 0.90 with low simulation counts. Higher simulation counts lead to even higher win rates of about 0.98. The direct comparison of the two algorithms yields a mixed result: UCB surpasses UCT when only a few simulations are used, because the move trees in the UCT algorithm are not yet expanded widely enough. If more simulations are available, UCT improves steadily and surpasses UCB. In experiments with the highest simulation counts, UCB beats UCT again. We additionally find that both approaches have different weaknesses when they are applied to Hearthstone. These weaknesses result from the enormously high branching factor in Hearthstone and the way moves can be decomposed into atomic actions. To investigate the playing strength of both approaches further, we suggest evaluating the performance of other approaches based on rules or on heuristics learned with reinforcement learning, and experimenting with enhancements for the UCB and UCT algorithms.

Contents

1 Artificial Intelligence and Games
  1.1 Thesis Structure
  1.2 Games as Measurement of Intelligence
    1.2.1 Intelligence
    1.2.2 Games
  1.3 Games and Optimal Play
    1.3.1 Simple Games
    1.3.2 Multiplayer Games
    1.3.3 Zero-Sum Games
    1.3.4 Solving Games
    1.3.5 Solving Two-Player Zero-Sum Games
    1.3.6 Search Space Size
    1.3.7 Randomness
    1.3.8 Hidden Information
    1.3.9 Pure and Mixed Strategies
  1.4 Research Question
  1.5 Similarities to Real World Problems
2 Hearthstone: Heroes of Warcraft
  2.1 Introduction
  2.2 The Game
  2.3 Cards
    2.3.1 Minions
    2.3.2 Spells
    2.3.3 Weapons
  2.4 Abilities
  2.5 Composing a Move of Atomic Actions
3 Finding Solutions with Monte Carlo Methods
  3.1 The Need for Simulations
  3.2 Game Trees in Hearthstone
  3.3 Move Tree Complexity
  3.4 Exploration/Exploitation Tradeoff
  3.5 Bandit Approaches
    3.5.1 Upper Confidence Bound
    3.5.2 Final Move Selection
    3.5.3 Anytime Property
    3.5.4 Aheuristic Property
  3.6 Monte Carlo Tree Search Approaches
    3.6.1 Selection
    3.6.2 Expansion
    3.6.3 Simulation
    3.6.4 Backpropagation
    3.6.5 Upper Confidence Bound Applied to Trees
    3.6.6 Asymmetric Property
  3.7 Post Monte Carlo Strategies
    3.7.1 Do Nothing
    3.7.2 Additional Random Atomic Action
    3.7.3 Additional Random Move
    3.7.4 Additional Random Move with Maximum Length