Evaluation Functions in General Game Playing

Dissertation submitted for the academic degree of Doktor rerum naturalium (Dr. rer. nat.)

submitted to the Technische Universität Dresden, Fakultät Informatik

submitted by Daniel Michulke, born on 13 April 1982 in Stollberg, [email protected]

Reviewers: Prof. Michael Thielscher, Prof. Stefan Edelkamp

Date of submission: 27 April 2012. Date of defense: 22 June 2012

Abstract

While in traditional computer game playing agents were designed solely for the purpose of playing one single game, General Game Playing is concerned with agents capable of playing classes of games. Given the game’s rules and a few minutes’ time, the agent is supposed to play any game of the class and eventually win it. Since the game is unknown beforehand, previously optimized data structures or human-provided features are not applicable. Instead, the agent must derive a strategy on its own. One approach to obtain such a strategy is to analyze the game rules and create a state evaluation function that can be subsequently used to direct the agent to promising states in the match. In this thesis we will discuss existing methods and present a general approach on how to construct such an evaluation function. Each topic is discussed in a modular fashion and evaluated along the lines of quality and efficiency, resulting in a strong agent.

Acknowledgements

It is my pleasure to thank all the people who helped make this thesis possible. First and foremost, I would like to thank my advisor Michael Thielscher for his guidance through all the years. He introduced me to the topic that today affects my life in areas I did not expect. His advice and feedback were at all times invaluable and his words always marked by precision and patience. I am also indebted to Stephan Schiffel. Besides our collaboration on a few articles, his support in reviewing papers and this thesis cannot be matched. Our numerous discussions were thought-provoking and often a fruitful source for new ideas, as well as a sink for those that would have led to a dead end. I also want to thank all my other colleagues at the Dresden University for our sometimes helpful but always joyous discussions. Finally, my gratitude goes to my friends and family who were incredibly supportive during all these years. Especially, I want to thank my partner Cíntia for her love and trust and her family who helped me to get through the last months of this thesis. This thesis was supported by the German National Academic Foundation and I herewith would like to express my gratitude for financing and supporting this work.

Contents

1. Introduction
   1.1. Evaluation Functions
   1.2. Contributions
   1.3. Outline

2. Game Playing
   2.1. General Game Playing
   2.2. Basic Notions of Games
   2.3. Move Selection and State Evaluation
   2.4. Search Algorithms
   2.5. Criteria of Evaluation Functions
   2.6. A Word on Experimental Evaluation

3. Evaluation Functions I - Aggregation
   3.1. Choice of Evaluation Function
   3.2. An Aggregation Function Based on Neural Networks
   3.3. High-Resolution State Evaluation using Neural Networks
   3.4. Summary

4. Evaluation Functions II - Features
   4.1. Categorization of Features
   4.2. A New View on Features
   4.3. Detection of Rule-Derived Features
   4.4. Integration of Rule-Derived Features
   4.5. Acquisition of Admissible Distance Features
   4.6. Summary

5. General Evaluation
   5.1. Final Version of Nexplayer
   5.2. Past Competitions
   5.3. Experiments
   5.4. Summary

6. Related Work
   6.1. Probabilistic Agents
   6.2. Deterministic Agents
   6.3. Summary of GGP Agents
   6.4. Non-GGP Approaches

7. Discussion
   7.1. Summary
   7.2. Future Work
   7.3. Publications

A. Appendix
   A.1. Evaluation Setup
   A.2. Other Improvements

B. Bibliography

1. Introduction

Much like the ultimate goal of the drivers of industrialization was to relieve humanity of some of the hardships imposed by practical labor, research in the area of Artificial Intelligence aims to facilitate and support mental processes performed by humans. Following a behaviorist perspective, intelligence is a trait assigned to an agent by an observer based on the evidence of the agent’s (externally observable) behavior [Ski53]. In the case of non-living agents, such as agent programs, behavior is limited to actions and generally preceded by decisions. The goal of creating Artificial Intelligence can consequently be reduced to making intelligent decisions. An integral part of an intelligent decision is to consider the consequences, that is, derive the consequences of each possible action and compare them against each other. However, the consequences are often incomparable and thus need to be mapped to a common domain. Such mappings are typically represented by evaluation functions. Thus, by evaluating the consequences of actions and comparing the results, we can arrive at informed decisions. A well-defined and observable yet complex domain for evaluation functions is games. Here, agents have the goal of “winning the game” by selecting in each state which move to make.

1.1. Evaluation Functions

In order to decide for or against a move, agents need to evaluate the consequences of their moves. For this purpose they employ evaluation functions that return information as to how favorably a move should be regarded. Evaluation functions thus determine to a large part the behavior of an agent and will therefore be the focus of this work. There are, however, other reasons to study evaluation functions in games. Most importantly, evaluation functions are much more general than their use in game playing agents suggests: Given that they estimate the value of abstract entities based on currently available evidence, they can be seen as predictors. As we will argue throughout this work, evaluation functions for game playing agents are evaluations of predictions for terminal states based on the current state. As such, they represent state value predictors based on the current state, comparable to stock price predictors given the current stock market data or a meteorological forecast based on today’s weather. The chaotic dynamics in game playing stem from other players’ moves and the problem of controlling a variable (the agent’s long-term reward) based on weakly related short-term evidence (the current state).

Still, game playing is easier than the two aforementioned prediction problems for three reasons. First, there is an explicit domain theory available that can be used, thereby avoiding an imprecise description of the problem. Second, the domain theory is known to be complete, theoretically eliminating the necessity for empirical evidence. And third, states in the games we investigate are Markovian, meaning that states prior to the current state can be disregarded for predicting future states. We argue that perspectives, ideas and approaches for addressing state value prediction will also provide insights for the general class of prediction problems.

1.2. Contributions

As domain of application we will use General Game Playing (GGP), which is an abstraction of traditional game playing where the game rules are only known at run time. Thus, a GGP agent cannot be adapted to the game by its programmer but must derive a strategy for each game on its own. There are two types of evaluation functions employed by GGP agents, probabilistic and deterministic evaluation functions. We will argue in favor of deterministic evaluation functions and analyze their construction along their two components:

Features are the basic elements of an evaluation function and evaluate specific aspects of a state.

The Aggregation Function then works on top of these features and aggregates the feature values to produce a single value.

The composition of these functions is the evaluation function. The goal of this work is to provide an extensive discussion on how to construct an evaluation function. We aim to achieve this goal by discussing each component of the evaluation function with the following plan in mind:

State of the Art We analyze what previous work has established.

Categorization We categorize existing work to allow for a brief but comprehensive discussion.

Theoretical Evaluation We evaluate the categories along specific guidelines to see what approach is best for constructing the component.

Construction We construct the component and evaluate it.

Improvement We discuss important improvements to our construction approach and evaluate it again.

Our intention is to add value to the scientific discussion by motivating each step based on the findings in the discussion of its predecessor. A consequence of this structured approach is that we will be able to discover interesting relationships already while categorizing and evaluating a subject theoretically.

These relationships are quite different from the conclusions drawn in GGP research dominated by practical evaluations, such as:

• a set of theoretical criteria that an evaluation function should fulfill in order to promote the playing strength of its agent,

• theoretical conclusions on what types and components of evaluation functions are best-suited, motivated by the application of the above criteria,

• a view on features that answers the simple yet unanswered question as to why they are good and what they represent,

• an interpretation that relates probabilistic (Monte Carlo-based) evaluation functions to deterministic features and describes how both represent a slightly different solution to the same problem and how they can be combined.

Nevertheless, the focus of this work is on practical matters. Based on the above findings, our contributions are:

Construction of an Aggregation Function We propose a way to construct an aggregation function based on neural networks. The networks fuzzy-evaluate the game’s goal condition on states and provide guidance as to how good states are for winning a game. Moreover, the resulting evaluation function needs no training, exhibits the properties of a high-quality evaluation function and allows for learning. As a side effect, an interesting solution for the integration of logic and learning systems is put into practice. The approach will be discussed in section 3.2 and is based on our publication “Neural Networks for State Evaluation in General Game Playing” at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2009 [MT09].

Improved Aggregation Function based on Neural Networks The initialization algorithm (C-IL2P) for the aggregation function is in some cases not suited for fuzzy-evaluating complex domains due to limited machine precision. We reduce the scope of this problem, thereby increasing the quality of the evaluation function. The algorithm will be discussed in section 3.3, based on our workshop contribution “Neural Networks for High-Resolution State Evaluation in General Game Playing” to the International Workshop on General Intelligence in Game-Playing (GIGA), 2011 [Mic11b].

Rule-Derived Feature Acquisition We propose a general way to acquire features from formulas occurring in the game rules. The approach avoids expensive simulations for feature weighting or selection as used in other agents. The resulting feature acquisition provides useful features and is much faster than simulation-based approaches.

It will be discussed in sections 4.3 and 4.4 and is based on our workshop contribution “Heuristic Interpretation of Predicate Logic Expressions in General Game Playing” to the International Workshop on General Intelligence in Game-Playing (GIGA), 2011 [Mic11a].

Acquisition of Admissible Distance Features The distance features used to date are based on syntactic structures in the rules and cannot distinguish between move patterns of different objects. We address these problems for a class of distance features in section 4.5. The underlying idea was first discussed in the workshop contribution “Distance Features for General Game Playing” to the International Workshop on General Intelligence in Game-Playing (GIGA), 2011 [Mic11a] and later published as “Distance Features for General Game Playing” at the International Conference on Agents and Artificial Intelligence (ICAART), 2012 [MS12]. Both articles are joint work with my former colleague and fellow PhD student, Stephan Schiffel.

The practical results will be summarized in the GGP agent Nexplayer.

1.3. Outline

In chapter 2 we will introduce traditional Game Playing and General Game Playing and concepts related to both areas. This will provide the necessary framework for the discussion and construction of an aggregation function in chapter 3. In chapter 4 we will proceed by discussing features and how they can be integrated into our aggregation function. We will combine both, aggregation function and features, in our agent Nexplayer and evaluate it in chapter 5. Finally, we will discuss related approaches in chapter 6 and conclude with a summary and future work in chapter 7.

2. Game Playing

Game Playing is interesting for research in Artificial Intelligence because games have various qualities:

• Games are problem domains of small size

• Games can be easily explained and communicated to other people

• Games have a recreational connotation

• Games encourage competition and, thereby, evaluation

• Games are complex systems, i.e., seemingly simple decisions may have huge consequences

Specifically the latter point also explains the economic motivation behind research in Game Playing: Agents that master complex domains should also be able to perform other tasks that require intelligence. The ultimate goal is to substitute human labor by cheap agent labor. Producing an agent that matches the playing strength of a human is a first step on this road. One of the earliest works on the topic of game playing was released in 1950 by Claude Shannon [Sha50]; however, it took another 40 years before algorithms and hardware were capable of matching the strength of a human. The following is a list of milestones where after a long period of struggle between humans and intelligent agents the latter emerged victorious.

Backgammon TD-Gammon [Tes95], which in addition to its strength used tactics that were later on widely adopted by human players

Chess Kasparov’s defeat against Deep Blue in 1997 [CJhH02]

Checkers the complete solution of the game Checkers by Schaeffer [SBB+05], which since then is known to end in a draw if both sides play optimally

The aforementioned successful agent programs have in common that the vast majority of the work is done by their developers: They define the agent characteristics, such as search strategy, state evaluation functions and data representation, such that they fit the specific game the agent is designed for. Chess agents are designed with the help of grandmasters that select and weight features for the evaluation function, TD-Gammon has a state representation especially designed for backgammon, and the Checkers engine Chinook plays optimally because an endgame database was grown over years such that today it covers the initial states.

So instead of showing flexible and intelligent behavior, the agents reproduce the preprogrammed behavior of a presumably intelligent developer. To overcome this problem, research in General Game Playing was proposed [GLP05] and eventually emerged over the last few years. The idea is that by decoupling agent programming and the game, programmers can no longer adapt their agents to a specific game but have to endow their agents with autonomous planning, learning and reasoning facilities to handle a game. From the perspective of an agent, a typical GGP match thus starts with the reception of the game rules and a predefined amount of time until the first move has to be played, followed by a regular submission of its moves until a terminal state is reached.
This chapter serves as an introduction to Game Playing and General Game Playing. We start by defining general games in the next section. We then discuss the most important notions needed for understanding Game Playing in section 2.2. We proceed in section 2.3 by discussing the move selection problem that an agent faces and how evaluation functions can be used to address the problem to some extent. Search algorithms complement this solution and are presented in section 2.4. Finally, we discuss how to theoretically (section 2.5) and practically (section 2.6) assess evaluation functions.

2.1. General Game Playing

General Game Playing is the next logical step on the way from traditional Game Playing towards producing intelligence in agents. It effectively prevents the highly specialized approaches of traditional game playing by distributing the game rules at run time. Thus, data structures cannot be optimized for a specific game and humans cannot provide game-specific heuristics to guide search. The insights provided by GGP are as applicable as in traditional Game Playing but much more general. Any problem description representable as a game can be handed to a GGP agent for analysis. The agent will play and thereby represent a strategy on how to act in a situation, effectively offering decision support. Moreover, if the problem description changes, the agent does not need to be modified at all. In addition to these agent-centric applications, GGP may be used to model complete environments. The basic rules of the environment and a set of agents are all that is needed. Practically all reward-based real world systems are instances of such an environment model, with the best known being markets and evolution. However, practical applications such as modeling political decision processes can be derived straightforwardly. Finally, in this thesis we argue that strong (i.e., well playing) agents need to exhibit properties of a prediction system for terminal states. The games used in GGP can thus be seen as an instance of other prediction systems, with the basic difference being an accessible formal description of the goal and environment dynamics. In order to describe general games, the Game Description Language (GDL) was defined [LHH+08]. An additionally defined communication protocol based on HTTP ensured the compatibility of all agents via a common interface. Each year a competition

is held where agents demonstrate the state of the art. In this section we will first informally describe what a game is. We proceed by showing how this is expressed in GDL, followed by a formal semantics of a game description in GDL. Finally, we briefly explain the communication protocol. Note that there is an extension to GDL, called GDL-II, that enables the encoding of games with incomplete and/or imperfect information and non-determinism [Thi10]. However, this work will be based solely on the original version of GDL.

2.1.1. GDL Described Informally
A GDL game is played by players that occupy roles one-to-one. Such a role could be “white” in Chess or “attacker” in a war game. These players start a match in the initial state and each choose a legal action (also called move) for their role. After submitting the action, the state changes and each player again chooses and submits an action. The process is repeated until a terminal state is reached. Each role is then assigned points in accordance with its goal. A GDL game is thus defined by

• roles

• initial state

• legal moves (depending on role and current state)

• state update function (depending on current state and moves selected)

• terminal function (returns true/false depending on the current state)

• goal function (returns a numeric value depending on the role and current state)

2.1.2. Game Syntax: Game Description Language The rules of a game are represented as a logic program [Llo87], including negation. We assume familiarity with the basic notions of logic programming.

Definition 2.1 (Logic Program).

Term A term is a variable or a function symbol applied to terms as arguments. A constant is a function symbol with no argument.

Atom An atom is a predicate symbol applied to terms as arguments.

Literal A literal is an atom or its negation (indicated by ¬).

Clause A clause is an implication of the form h ⇐ b1 ∧ ... ∧ bn where h is an atom and the body b1 ∧ ... ∧ bn is a conjunction of literals bi with 0 ≤ i ≤ n.

Logic Program A logic program is a set of clauses.

We will sometimes refer to a function or predicate symbol p with arity n as p/n. Here, n is the arity of the corresponding predicate or function and indicates the number of arguments it takes, i.e., p/n refers to a predicate p of the form p(x1, . . . , xn). Also, we define a formula as either a literal or, recursively, as a conjunction or disjunction of two formulas.
The Game Description Language (GDL) encodes a game (as informally described before) as a logic program and has become the standard language for encoding game rules of general games. While GDL allows for deviations from the syntax of a logic program (specifically, formulas in the body of a clause), we will assume adherence to the logic program syntax. This promotes simplicity and does not restrict the expressiveness of GDL since the clauses in question can be easily transformed to logic program clauses. GDL uses specific predicates called keywords that represent game-specific attributes. We will give GDL notations in an easy-to-read Prolog-style where variables are denoted by uppercase letters and predicates and functions by lowercase letters. The symbols ⇐ and ∧ are replaced by the symbols :- and , (a comma), and each clause is terminated with the symbol . (a dot). A clause with empty body is represented as a fact, e.g., h ⇐ is represented as h.
The initial state is defined by the clauses with the keyword init/1. Roles are defined using the keyword role/1 and given as ground assertions. Both keywords describe essential elements of the game and are independent from the current state.

Keywords           Explanation
role(R).           R is a role in the game
init(P) :- ...     P is a fact holding in the initial state

Table 2.1.: State-Independent GDL Keywords

For each state, it can be determined whether the state is terminal using the terminal/0 keyword, and in this case a goal value for each role can be obtained using the goal/2 keyword. If the current state is non-terminal, then the legal/2 relation defines for each role the set of legal moves. Each agent submits a legal move and the new state can be determined using the next/1 keyword. The current state is then discarded and the new current state is defined as the facts entailed by next/1.

Keywords              Explanation
legal(R, A) :- ...    A is a legal action for role R if ...
next(P) :- ...        P holds in the next state if ...
terminal :- ...       the current state is terminal if ...
goal(R, V) :- ...     role R achieves a goal value of V if ...

Table 2.2.: GDL Keywords allowed only as rule heads

Each of these predicates may depend on the current state and the players’ actions. Such conditions are formulated using the following keywords:

Keywords           Explanation
does(R, A)         role R does action A
true(P)            P holds in the current state
distinct(X, Y)     X does not unify with Y

Table 2.3.: GDL Keywords allowed only in rule bodies

Besides, there are other restrictions for valid GDL [LHH+08], however, these are mostly intuitive and will be mentioned when necessary.

Figure 2.1.: Data Flow Within an Agent

Figure 2.1 shows the data flow in a GGP environment. All colored boxes represent GDL relations and all white ellipses represent variable sets of facts that are defined only implicitly. An arrow indicates that the information encoded in the arrow source is required for the deduction of the arrow target. The variable sets representing the current state and the selected moves are queried using true/1 and does/2 statements, respectively.

2.1.3. GDL rules of Tic-Tac-Toe
Here and in the remainder of this work we will refer to a number of games in order to provide examples. Each of these games is accessible on the GGP Server [Sch12]. The following is an encoding of a Tic-Tac-Toe game in GDL that illustrates how a game is described in GDL. Keywords are printed in bold, comments start with a percentage symbol % and are printed in italics.

role(xplayer).                        % defines the roles xplayer ...
role(oplayer).                        % and oplayer

% sets all cells to b (blank)
init(cell(1,1,b)). init(cell(1,2,b)). init(cell(1,3,b)).
init(cell(2,1,b)). init(cell(2,2,b)). init(cell(2,3,b)).
init(cell(3,1,b)). init(cell(3,2,b)). init(cell(3,3,b)).
% sets the control toggle to xplayer
init(control(xplayer)).

% P has the legal move ...
legal(P, mark(X, Y)) :-               % ... mark(X, Y)
    true(control(P)),                 % if P has control
    true(cell(X, Y, b)).              % and cell (X,Y) is blank

legal(P, noop) :-                     % ... noop
    role(P),                          % if it does not have control
    not true(control(P)).

% switches the control every move
next(control(xplayer)) :- true(control(oplayer)).
next(control(oplayer)) :- true(control(xplayer)).

% in the next state cell(X,Y) contains ...
next(cell(X,Y,x)) :-                  % ... an x
    does(xplayer, mark(X,Y)).         % if xplayer moves 'mark(X,Y)'

next(cell(X,Y,o)) :-                  % ... an o
    does(oplayer, mark(X,Y)).         % if oplayer moves 'mark(X,Y)'

next(cell(X,Y,C)) :-                  % ... content C
    true(cell(X,Y,C)),                % if it already contains C ...
    distinct(C, b).                   % and C is not blank

next(cell(X,Y,b)) :-                  % ... blank
    true(cell(X,Y,b)),                % if it is blank
    does(P, mark(M, N)),              % and P marks a different cell
    (distinct(X,M) ; distinct(Y,N)).

% xplayer wins ...
goal(xplayer, 100) :-                 % ... 100 points
    line(x).                          % if there is a line of x

goal(xplayer, 0) :-                   % ... 0 points
    line(o).                          % if oplayer achieves a line of o

goal(xplayer, 50) :-                  % ... 50 points
    not line(x),                      % if there is no line of x
    not line(o),                      % no line of o
    not open.                         % and the board is not open

% same for oplayer
goal(oplayer, 100) :- line(o).
goal(oplayer, 50)  :- not line(x), not line(o), not open.
goal(oplayer, 0)   :- line(x).

% the game ends if ...
terminal :- line(x).                  % ... there is a line of xs
terminal :- line(o).                  % ... there is a line of os
terminal :- not open.                 % ... or the board is not open

% a line of C consists of ...
line(C) :- row(X, C), index(X).       % ... a row of C
line(C) :- column(Y, C), index(Y).    % ... a column of C
line(C) :- diagonal(C).               % ... or a diagonal of C

row(X, C) :-                          % a row of C is ...
    true(cell(X, 1, C)),              % cells (X,1)
    true(cell(X, 2, C)),              % (X,2) and
    true(cell(X, 3, C)).              % (X,3) having the content C

column(Y, C) :-                       % similar for columns
    true(cell(1, Y, C)),
    true(cell(2, Y, C)),
    true(cell(3, Y, C)).

diagonal(C) :-                        % similar for upward diagonal
    true(cell(1, 1, C)),
    true(cell(2, 2, C)),
    true(cell(3, 3, C)).

diagonal(C) :-                        % similar for downward diagonal
    true(cell(3, 1, C)),
    true(cell(2, 2, C)),
    true(cell(1, 3, C)).

% the board is open if there is a blank cell.
open :- true(cell(X, Y, b)).

2.1.4. Game Semantics
A formal semantics that interprets a game description D in GDL as a state transition system was first provided by [ST10]. It contains all elements as given in the prior informal description and is obtained by treating the game description D as a logic program. In order to represent the current state and the actions selected by the players, we additionally define s^true and A^does as ground logic programs.

s^true  def=  { true(p1), ..., true(pn) }

A^does  def=  { does(r1, A(r1)), ..., does(rn, A(rn)) }

s^true here is a representation of a state. Each element p1, . . . , pn is a fluent, that is, a term that occurs as argument of the keywords next, init or true. A^does is a representation of a joint action or joint move, that is, the set of moves selected by each player. The game Γ is then defined in terms of the game description D and the descriptions of s^true and A^does:

Definition 2.2 (Game Semantics). Let D be a valid GDL specification whose signature determines the set of ground terms Σ, and let 2^Σ denote the set of finite subsets of Σ. The semantics of D is a game Γ = (R, s0, T, l, u, g) where

• R = {r ∈ Σ : D |= role(r)};
• s0 = {p ∈ Σ : D |= init(p)};
• T = {s ∈ 2^Σ : D ∪ s^true |= terminal};
• l = {(r, a, s) : D ∪ s^true |= legal(r, a)} where r ∈ Σ, a ∈ Σ, s ∈ 2^Σ;
• u(A, s) = {p ∈ Σ : D ∪ s^true ∪ A^does |= next(p)} for all A : (R ↦ Σ) and s ∈ 2^Σ;
• g = {(r, n, s) : D ∪ s^true |= goal(r, n)} where r ∈ R, n ∈ ℕ, s ∈ 2^Σ.

During the remainder of this work we will stick to the notation as outlined in this definition. Most important among these are the symbols

• s ∈ S to refer to a state,
• t ∈ T to refer to a terminal state,
• r ∈ R to refer to a role, and
• g(r, n, s) to refer to the goal relation.

Given a game Γ with goal relation g(r, n, s) we additionally define:

Definition 2.3 (Goal Function).

goal(s, r) def= n such that g(r, n, s)

In addition, we will use the macro holds that evaluates a GDL formula φ in a state and is true if and only if the formula holds in the state.

holds(φ, s) def= D ∪ s^true |= φ
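To relate Definition 2.2 to the algorithmic sketches used later in this chapter, the following Python interface names the components of a game Γ = (R, s0, T, l, u, g). It is an assumption of ours, not part of GDL or of the agent described in this thesis; a concrete implementation would answer each query with a logic programming reasoner over D.

class Game:
    """Minimal interface for a game Gamma = (R, s0, T, l, u, g) as in Definition 2.2.
    A concrete implementation would evaluate the game description D (a logic
    program) together with the facts of s^true (and A^does) for each query."""

    def roles(self):
        """R: the roles r with D |= role(r)."""
        raise NotImplementedError

    def initial_state(self):
        """s0: the fluents p with D |= init(p)."""
        raise NotImplementedError

    def is_terminal(self, state):
        """True iff state is in T, i.e. D together with s^true entails terminal."""
        raise NotImplementedError

    def legal_moves(self, role, state):
        """l: the actions a with D together with s^true entailing legal(role, a)."""
        raise NotImplementedError

    def update(self, joint_move, state):
        """u(A, s): the fluents p entailed as next(p) given s^true and A^does."""
        raise NotImplementedError

    def goal(self, role, state):
        """goal(state, role) from Definition 2.3: the n with goal(role, n) entailed."""
        raise NotImplementedError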

2.1.5. Game Manager
The game manager is the central entity that coordinates the agent programs. Initialized with the game rules, it sends a game start message plus the game rules to all participating agents. This message additionally includes

• the role the agent occupies in the game (e.g., whether the agent plays white or black)

• the start clock parameter indicating the number of seconds until the match starts and

• the play clock parameter indicating the time window (number of seconds) for submitting moves for each new state.

Each play message includes the moves submitted by every agent in the last step and every agent has to submit an action allowed by the game rules within the given time. If for some reason an action is submitted too late, a legal action will be assigned to the player. After the expiration of the play clock the game manager informs each agent of the moves selected. This process continues until a terminal state is reached. For terminal states, a game stop message is sent.

Figure 2.2.: Control Flow between Agent and Game Manager

Figure 2.2 shows the basic control flow between an agent and the game manager and illustrates the effect of the start clock and play clock parameters. For a more detailed description see [LHH+08].
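Schematically, the agent side of this control flow can be written as the following Python loop. The receive/send calls and message fields abstract from the actual HTTP-based message format (see [LHH+08]); they are illustrative assumptions, not the protocol itself.

def run_match(agent, connection):
    """Schematic agent loop: one start message, then play messages until stop."""
    start = connection.receive()            # game rules, assigned role, start clock, play clock
    agent.prepare(start.rules, start.role, deadline=start.start_clock)

    while True:
        msg = connection.receive()          # carries the joint move of the previous step
        if msg.is_stop():                   # a terminal state was reached
            break
        if msg.moves:                       # empty before the first move of the match
            agent.apply_joint_move(msg.moves)
        move = agent.select_move(deadline=msg.play_clock)
        connection.send(move)               # must arrive within the play clock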

2.1.6. GGP Competitions
There is an active research community that evaluates new approaches in regular tournaments. The main event is an annual competition organized by the Stanford University (games.stanford.edu), usually held together with the IJCAI/AAAI conferences. Also, in 2011 the “German Open in General Game Playing” (http://www.tzi.de/~kissmann/ggp/go-ggp/), another tournament with international participation, was first held, organized by the Bremen University. To our knowledge there are also several minor tournaments among students of the universities of Dresden, Potsdam and Reykjavik. The setup of a typical tournament consists of matches in a number of newly defined games with a start clock of up to a few minutes and a play clock of up to one minute.

2.2. Basic Notions of Games

Before delving into the matter of evaluation functions, we briefly review some basic notions that we deem indispensable in the discussion of Game Playing. These are presented next.

2.2.1. Dimensions of Games
General Game Playing deals with deterministic synchronous games with complete information. This class of games is considered interesting for research because it represents the simplest class of problems still hard to solve.

Deterministic A game is deterministic if in any state for any legal joint action there is exactly one successor state. Games that include random elements such as rolling a die are not deterministic.

Complete Information A game is a complete information game if the rules of the game are known to every player and at every point of time during a match the complete state information is fully and correctly perceivable by all players. The opposite are incomplete information games where state (or rule) information is hidden, e.g., by cards visible only to some players.

Synchronous A game is synchronous if each player may submit only a single action and the actions of each player become effective only at a predefined point of time. Consequently, while no move is made, the game remains in a state that persists until the next joint action enters into effect. The opposite of synchronous games are real-time games where players are free to act however often and whenever they want. Consequently, the state of a real-time game can change at any given time.

For handling games, we will additionally make use of the following properties that divide the class of GDL games into further subclasses.

Sequential A subclass of synchronous games are sequential games. A game is sequential if in each state exactly one player is allowed to act. The opposite is a simultaneous game where in at least one state players act at the same point of time. The two cases are also referred to as alternating move vs. simultaneous move games. Sequential games are often modeled as simultaneous games by assigning dummy (or noop) moves to all players that in the sequential version would not be allowed to move. These noop moves have no effect on the game state.

Zero-Sum A game is a zero-sum game if for each terminal state the sum of the rewards of all roles is zero. Any game that produces a constant sum of rewards can be formulated as a zero-sum game, therefore we will use the two terms interchangeably. The opposite of a zero-sum game is a non-zero-sum game. Here players can achieve outcomes that are independent of each other.

Figure 2.3.: A Game Tree for Tic-Tac-Toe

Besides, it may be worthwhile to mention that GDL games are Markovian, meaning that successor states depend only on the current state and the joint actions taken. Past states have no influence whatsoever, thus simplifying the problem.

2.2.2. Game Tree
We can depict the state space of a GDL game as a directed tree. Each node in the game tree is a state. The root is the initial state s0 and all leaves are terminal states. There is a connection from state s1 to s2 if there is a legal joint move A such that u(A, s1) = s2. Technically, this definition implies that the game tree is a rooted directed acyclic graph since in some games the same state can be reached via different paths. However, we stick to the common term game tree. A match is a path through the game tree from the root to a leaf. The aim of each player is to influence this path via its contribution to each joint action A such that its associated role achieves a maximum goal value. Figure 2.3 shows a game tree for Tic-Tac-Toe. The root node is the initial state s0 and is connected to all states reachable from the initial state. For the sake of simplicity we omitted all symmetric states.

2.3. Move Selection and State Evaluation

When an agent participates in a match as role r, it can be formally seen as implementing a function Π_r(s) that maps a state s to a legal action Π_r(s) ∈ L(r, s), where L(r, s) def= {a : l(r, a, s)} is the set of legal actions of role r in state s. This function is called policy and fully determines the behavior of the agent.

Π_r(s) = a  with  a ∈ L(r, s)

The policy can be thought of as a lookup table. Two agents with the same policy are indistinguishable from the outside since their behavior is equal. Although the agent is allowed to submit only one action in a state, it may identify several actions as suitable. In this case the policy must be described non-deterministically and represented as a mapping of a state-action pair to a probability Π_r(s, a) = p. Naturally, for each state the probabilities of all legal actions must sum to 1 and each single probability for any state-action pair must be non-negative.

Π_r(s, a) ≥ 0

∑_{a′ ∈ L(r,s)} Π_r(s, a′) = 1

The aim of agent programming is to find a policy that obtains the maximum possible reward for the role of the agent. This policy is called optimal policy. Correspondingly, for games with unforeseeable game elements, such as multiplayer games, nondeterministic games and games with only partially observable states, the expected (a priori) or average (a posteriori) reward must be maximized. To obtain such an optimal policy the agent needs to decide in a specific state which of all legal actions is best. Hence a measure of preference over actions is needed. An analysis of the plain action is only possible if a context (i.e., the game) provides some sort of utility for an action. However, in most games, including GDL games, there is no such context as actions have no meaning other than being the vehicle to induce a state transition. Therefore, action consequences, i.e., successor states, are used as the primary means for comparing actions. Two actions can hence be compared by comparing the states they lead to. This process is called a single-ply look-ahead. Unfortunately there is no easy way to compare states. Ideally, an agent would have a state value oracle that maps states to values and indicates the best state by assigning the highest value to it. However, the value of a state is determined by the value of the terminal state reached given optimal play. Thus, a practical implementation of a state value oracle requires a move selection oracle, bringing us back to our first problem. Still, all is not lost, since by reducing move selection to state evaluation and back to move selection, we effectively advance one level in the game tree. By applying this step repeatedly, we have the foundation for search algorithms that can be used to address the problem (see section 2.4). At the same time, we can try to approximate the state value oracle using an evaluation function of our own. In practice, search and evaluation functions are combined to address the move selection problem. In the remainder of this work, we will refer to the state value oracle as perfect evaluation function and mean a function that evaluates the nth best move to the nth highest state value.
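As an illustration of this reduction, the following Python sketch performs a single-ply look-ahead: every successor state is evaluated with some evaluation function and the move leading to the best one is returned greedily. The game interface is the one sketched in section 2.1.4, and joint_move_model is a hypothetical helper that fills in the moves of the other roles (e.g., noop in alternating-move games); neither is prescribed by GDL.

def select_move(game, evaluate, state, role, joint_move_model):
    """Single-ply look-ahead: pick the move whose successor state evaluates best.

    evaluate(state, role) may be any (deterministic or probabilistic)
    evaluation function."""
    best_move, best_value = None, float("-inf")
    for move in game.legal_moves(role, state):
        joint_move = joint_move_model(state, role, move)   # complete the joint action A
        successor = game.update(joint_move, state)         # u(A, s)
        value = evaluate(successor, role)
        if value > best_value:
            best_move, best_value = move, value
    return best_move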

2.3.1. Deterministic Evaluation
Deterministic evaluation functions derive their name from the fact that they map the same state to the same value, like the goal function of a game. Since the goal function is often not sufficient to distinguish non-terminal states from each other, other functions are used that indicate the value of a state. Deterministic evaluation functions employ for this purpose subfunctions, called features, that evaluate the degree of presence or absence of concepts associated with winning the game. The evaluations of several features are then aggregated and transformed to a single value, the state value. An example for such a function is a Chess evaluation function: There are many concepts such as king safety, pawn structure and piece value. Each of these may be represented as a feature. For example, the piece value feature counts the number of pieces times their corresponding piece value and returns an evaluation that indicates the strength of the pieces in a state. The feature reflects the chess rule of thumb that the more pieces you have and the more powerful they are, the higher are the chances of winning. An evaluation function based on the features piece value and king safety would determine the corresponding feature evaluations in a state and then aggregate them to a single state value. Using this function for evaluating moves would thus guide an agent towards maintaining and increasing the value of its pieces while keeping the king safe, both of which are good heuristics for Chess. A closely related dimension of evaluation functions is the class of knowledge-based evaluation functions, since deterministic functions typically involve the discovery and use of features based on domain knowledge. In fact, it makes most sense to deterministically evaluate a state if you have knowledge about what the state value could be. However, both classes are not the same, since deterministic functions can be knowledge-free and knowledge-based functions are not required to be deterministic.

Aggregation Function and Features
It is common to construct a deterministic evaluation function as a function with nested functions as arguments. The nested functions with no functions as arguments are called features. They operate directly on the state and return a numerical value. All other functions are called aggregation functions since they aggregate the numerical values of other functions to a single value. In figure 2.4 the hierarchy of functions is shown as a tree: features are leaves and aggregation functions are inner nodes of the tree. The most popular example of a deterministic evaluation function is a linear combination of features, as given in figure 2.5. Here, the feature output values for a state are aggregated by calculating their weighted sum. The weight of each feature is set to correspond to its quantified importance.

Figure 2.4.: Example for an Evaluation Function consisting of Features (F) and Aggregation Functions (A)

Figure 2.5.: A Linear Combination of Features
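As a minimal illustration of such a linear aggregation, the following Python sketch builds an evaluation function as a weighted sum of feature functions. The feature names in the usage comment are hypothetical and merely echo the Chess example above.

def linear_evaluation(features, weights):
    """Return an evaluation function that aggregates the feature values
    by their weighted sum, as in figure 2.5."""
    def evaluate(state, role):
        return sum(w * f(state, role) for f, w in zip(features, weights))
    return evaluate

# Hypothetical usage with Chess-like features:
# evaluate = linear_evaluation([piece_value, king_safety], [1.0, 0.5])
# value = evaluate(state, role)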

2.3.2. Probabilistic Evaluation
A probabilistic evaluation function determines the value of a state as the value of a terminal state given random play from that state. Technically, this means that a probabilistic evaluation is not a function; however, for our purposes it is convenient to treat it as such. Given a state s, a probabilistic evaluation is realized as a simulation, where a random joint action is performed in s yielding a successor state s′. In this state again a random action is performed and the process is repeated until a terminal state t is reached. A sequence of states generated by random actions starting in s and ending in a terminal state t is called a (random) playout. The value defined by the goal function on t is the corresponding playout value and the return value of the probabilistic evaluation function applied on s. Since the result of a probabilistic evaluation is random, the search algorithm UCT (Upper Confidence Bounds applied to Trees), discussed in section 2.4.3, is typically used in combination with probabilistic evaluation because it repeats playouts and uses the average playout value for guidance.
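For concreteness, such a random playout can be sketched in Python as follows; it reuses the generic game interface assumed in the earlier sketches and is not tied to any particular agent.

import random

def random_playout(game, state, role):
    """Play uniformly random joint moves from 'state' until a terminal
    state t is reached and return goal(t, role) as the playout value."""
    while not game.is_terminal(state):
        joint_move = {r: random.choice(game.legal_moves(r, state))
                      for r in game.roles()}
        state = game.update(joint_move, state)
    return game.goal(role, state)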

2.3.3. Comparison: Probabilistic vs. Deterministic
There is a great deal of uncertainty about whether probabilistic or deterministic evaluation functions are the better choice for state evaluation. While deterministic evaluation functions work well in many games such as Chess, all GGP competitions since 2009 were won by agents with probabilistic evaluation functions. Before giving our own opinion on the subject, we present some of the most important arguments for and against each. We opt to discuss only the properties of probabilistic evaluation. For any of their

advantages/disadvantages discussed, it should be easy to identify the corresponding disadvantage/advantage of a deterministic evaluation. We begin with the advantages. First, probabilistic functions only need a generative model of the environment and a goal or reward function. Based on this, they can randomly generate actions, execute them and eventually obtain a reward. Therefore there is no need to create an explicit evaluation function; the simple model-based approach works as an implicit evaluation and always provides results. For the same reason, the evaluation function can be developed easily.
Second, probabilistic evaluation functions are very flexible with respect to the requirements on the environment. The generative model need not be explicit. Instead, black-box functions for action generation, action execution, state update and goal assignment are sufficient. This implies that the rules or the game state could be only partially observable while the system would still work.
Third, from an empirical point of view there are games where no adequate evaluation function could be found up to now. The outstanding example is Go [GWMT06, GS07], where states are believed to be too unstable to represent a stationary situation useful for evaluation, since the presence or absence of a single stone can completely change the evaluation. A probabilistic evaluation seems to be the only viable function for games with unstable states.
On the other hand, probabilistic evaluation also brings a number of disadvantages: The major disadvantage is the probabilism itself, since it assumes that all players act randomly. This assumption renders the evaluation poor against experts.
Besides, there are game-specific factors that may put a probabilistic evaluation function at a disadvantage, namely a big game tree to traverse and non-descriptive terminal states. Particularly, Chess exhibits characteristics that illustrate the shortcomings of a probabilistic evaluation:

High Branching Factor For most of the game the average number of moves is around 30 [Sha50].

High Depth A Chess match can potentially run for a long time, i.e., there is no predefined number of steps after which the match stops.

Low Probability for Descriptive Terminal States When, eventually, a terminal state is found, it is not necessarily descriptive, i.e., the terminal state is most likely a draw due to repetition or the 50-move rule.

Game-specific disadvantages are mitigated by the UCT search algorithm (see next section), typically applied together with probabilistic evaluation, once the (possibly erroneously) evaluated state enters the in-memory tree. However, for this to happen, the probabilistic evaluation has to be descriptive to some extent, and the in-memory tree should preferably contain many successor states of the erroneously evaluated state. The latter, again, is difficult, given that the overhead of UCT grows with the depth of the in-memory tree, as, unlike in iterative deepening, the complete path from the root to a leaf has to be traversed again for each single state evaluation.

Finally, the main advantage of an implicit evaluation is also its greatest disadvantage because rule-inherent abstract knowledge is ignored.

2.4. Search Algorithms

Since we cannot assume an evaluation function of an agent to be optimal, we must consider it an approximation of an optimal evaluation function. Therefore we look for techniques to minimize the approximation error, and the most straightforward way to do this is search. Recall that move selection can be reduced to evaluating the successor states of each move. We can, for example, apply this technique to find the best move in states of single-player games that are direct predecessors of terminal states, since in this case the goal function returns optimal state values.
This approach can be generalized to hold for any predecessor state of a terminal state by viewing the problem as a game-theoretic one: Each player has a finite amount of actions and for each possible joint action the payoff is defined by the goal function applied on the resulting successor state. Hence we are able to depict the move selection problem as a simple one-step normal-form game which can be represented by a tensor of dimension |R| (the number of roles) with each role’s number of legal actions |L(r, s)| defining the number of entries of the respective dimension. If there is a single (possibly mixed-strategy) Nash equilibrium we can calculate the average outcome per player in that state and hence know the state value in that state. Applying this algorithm backwards starting from the predecessors of terminal states towards the initial state, we can determine the exact value of all states and therefore also find the optimal evaluation function and policy. This means generally that we can derive the exact value of a state if we know the exact values of its successors. However, there are a number of problems related to the approach.
First, the problem is recursive since we try to obtain a state’s exact value by aggregating its successor states’ exact values. However, only in very few cases do we have the exact state values. The problem is typically addressed by substituting the exact values by an approximation of the exact values, i.e., their values as determined by our evaluation function. Though this may seem counterintuitive, the aggregated state evaluations of successor states can be expected to be better approximations than the direct evaluation of the state because the successor states inspected are closer to the terminal state that will eventually be reached.
The second problem is that there may be more than one Nash equilibrium. In this case, there is no optimal solution and one is forced to choose one of the equilibria, be it randomly or based on other criteria such as opponent information or average points achieved per equilibrium.
The last problem is that the time needed for determining the Nash equilibria is exponential in the number of players. This may make finding the optimal state value practically infeasible. The problem is addressed in many ways. For example, the UCT

search algorithm sacrifices accuracy and relies on an approximation instead. Other search algorithms are based on simplifications that occur frequently in games such as:

1. If an action for a player returns a higher payoff than another action, regardless of the opponent actions, then the second action is said to be dominated and can be ignored.

2. If games are zero-sum, then the dimension of the payoff is reduced by one.

3. If a player in a game has only one legal move, such as in alternating move games, the dimension of the payoff tensor is reduced by one.

While each of these simplifications allows for general optimizations, specializations of these cases lead to well-known search algorithms: The search algorithm MaxN is used in games where only one player moves at a time and the search algorithm MiniMax is the two-player zero-sum specialization of MaxN. In the following we will review the search algorithms MaxN, MiniMax and UCT.

2.4.1. MaxN
MaxN search is used in games with alternating moves. It assumes that each player plays for maximizing his own payoff. Since in each state only one player is allowed to move, the Nash equilibrium trivially leads to the state with the maximum reward for the acting player. MaxN relies on the assumption that each agent is egoistic and plays for the maximization of his reward. This assumption, however, is not necessarily true, as another rational strategy for an agent would be to avoid moves where other players are awarded more points than the agent itself.
Due to potentially high resource requirements, a simple Breadth-First search is typically replaced by Iterative Deepening. Here, the tree is traversed in Depth-First order, but with an additional depth parameter that indicates the maximum depth of the search. If the state under inspection is at the maximum depth, then the state value is determined by the evaluation function, else the evaluation of the current state is defined as the (approximated) payoff of the payoff matrix implied by the values of the successor states.
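The following Python sketch shows such a depth-limited MaxN search for alternating-move games; it returns one value per role and falls back to the evaluation function at the depth limit. The helpers role_to_move and noop_joint_move (completing the joint action with noop moves) are hypothetical stand-ins for game-specific details and not part of GDL.

def maxn(game, evaluate, state, depth):
    """Depth-limited MaxN: return a dict mapping each role to its (approximated)
    value; the role to move picks the value vector maximizing its own entry."""
    if game.is_terminal(state):
        return {r: game.goal(r, state) for r in game.roles()}
    if depth == 0:
        return {r: evaluate(state, r) for r in game.roles()}

    mover = role_to_move(game, state)                 # the single role allowed to act
    best = None
    for move in game.legal_moves(mover, state):
        joint = noop_joint_move(game, state, mover, move)
        values = maxn(game, evaluate, game.update(joint, state), depth - 1)
        if best is None or values[mover] > best[mover]:
            best = values
    return best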

2.4.2. MiniMax
MiniMax is a specialization of MaxN for two-player zero-sum games. Specifically, MiniMax does not need to distinguish between opponents maximizing their own score or minimizing the other player’s score, as in zero-sum games this amounts to the same. The name MiniMax is derived from the fact that in two-player games the aggregation of successor state values is a mere minimization if the opponent moves or a maximization if we move. Since the rewards are tied to each other, various possible opponent strategies (such as maximizing the difference or ratio of values) all amount to the same. Hence there is only one opponent strategy and the value determined for the opponent is its true value. In other words, the obtained state value corresponds to the state value in the Nash equilibrium. Again, MiniMax is typically coupled with Iterative Deepening.
MiniMax can be generalized to simultaneous move games by following a paranoid approach: Though all players move at the same point of time, our player assumes that all opponents know his action before choosing theirs. In this way, MiniMax is worst-case optimal; however, this is not necessarily the best option and may lead to an underestimation of a state if players play for other objectives.

Figure 2.6.: States s_b2 and s_b3 can be pruned if v(s_b1) < min_x v(s_ax)

Alpha-Beta Pruning
It is common to extend plain MiniMax with alpha-beta pruning [Sch89]. Consider a state s to be evaluated, with the role r choosing between the legal actions L(r, s) = {a, b, c}. Each move leads to the successor states s_a, s_b, s_c, respectively, and role r is interested in maximizing the state value. In each of these successor states, the opponent has three moves 1, 2, 3 that lead to the states s_a1, s_a2, . . . , s_c3. Figure 2.6 illustrates the game tree. Now assume that after the evaluation of all successors of s_a it was found that α = min_x v(s_ax), where v is some deterministic evaluation function. If the player now finds another node s_b1 with v(s_b1) < α, the nodes s_b2 and s_b3 need not be examined, as the player will not choose action b because the opponent would then choose action 1 in state s_b. Hence action b leads in any case to a value lower than action a and can be ignored. The nodes s_b2 and s_b3 can thus be pruned, speeding up the search. Similarly, the opponent would not select the action 3 in s_c if he knew that the first action in s_c3 would return a very high state value. Thus examining the remaining successors of s_c3 can be skipped. In summary, the window [α, β] represents the minimum and maximum value that the two players concede to each other. If a value outside of this window appears, the siblings of the corresponding state can be pruned. In order to evaluate the minimum number of states possible, actions are typically ordered according to their perceived probability of allowing for pruning of other states.

The order is imposed by favoring high αs, low βs, or other heuristics (e.g., history or killer heuristics [Sch89] that promote moves that produced cut-offs earlier).
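A depth-limited MiniMax with alpha-beta pruning can be sketched in Python as follows. Values are taken from the perspective of the maximizing role; opponent_role, ordered_moves and noop_joint_move are hypothetical helpers for the two-player alternating-move setting, not prescribed by GDL.

def alphabeta(game, evaluate, state, role, depth, alpha, beta, maximizing):
    """Depth-limited MiniMax with alpha-beta pruning for two-player zero-sum,
    alternating-move games; values are from the view of 'role'."""
    if game.is_terminal(state):
        return game.goal(role, state)
    if depth == 0:
        return evaluate(state, role)

    mover = role if maximizing else opponent_role(game, role)
    value = float("-inf") if maximizing else float("inf")
    for move in ordered_moves(game, mover, state):      # move ordering aids pruning
        joint = noop_joint_move(game, state, mover, move)
        child = alphabeta(game, evaluate, game.update(joint, state), role,
                          depth - 1, alpha, beta, not maximizing)
        if maximizing:
            value = max(value, child)
            alpha = max(alpha, value)
        else:
            value = min(value, child)
            beta = min(beta, value)
        if beta <= alpha:                               # window closed: prune remaining siblings
            break
    return value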

2.4.3. UCT
UCT (Upper Confidence Bounds applied to Trees [KS06]) represents an optimal solution to the exploration-exploitation dilemma in Markov Decision Processes and recently became popular due to its striking simplicity and ease of parallelization. Consider a single-player game. A UCT exploration episode takes a state s, updates an in-memory representation of the state space explored from s and returns a value for s. It consists of four steps:

Selection Step Starting from s, the existing in-memory game tree (UCT tree) is traversed and in each state an action is selected according to equation (2.4.1).

Expansion Step Once an action with no information attached is found, the action is selected and the resulting state added to the in-memory tree.

Evaluation Step The state is evaluated using the evaluation function.

Backpropagation Step The state value determined is backpropagated to all predecessor states that were passed during the episode.

The key in the algorithm is the selection step that directs search through the in-memory part of the game tree. Note that this is different from the action selection by an agent in a match since the actions used for exploration are only executed in memory and have no effect on the environment. The final move of an agent is still selected greedily among all actions. For each state s that is in the UCT tree, an action a is selected the following way:

Π_UCT(s) = argmax_{a ∈ A} ( Q(s, a) + C · √( ln N(s) / N(s, a) ) )        (2.4.1)

The action selected depends on two terms: The first term Q(s, a) is the average value associated with the state-action pair (s, a) as used in the Monte-Carlo estimation, that is, the average of all episode goal values whose episodes went through s and selected a. The other term, C · √( ln N(s) / N(s, a) ), is the UCT bonus with N(s) being the number of visits to state s and N(s, a) the number of times a was selected as action when an episode traversed s. C is a parameter that determines the influence of the UCT bonus. The UCT bonus for any pair (s, a) increases if the state s is visited without having chosen a as action. On the other hand it decreases when (s, a) was selected. Thus with each episode the game tree grows asymmetrically into the direction of the most promising states while remaining shallow in suboptimal ones. Nevertheless every action is explored once and even the worst action will be explored further at some point of time.
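The selection step of equation (2.4.1) can be sketched in Python as follows; the per-node attributes (visit counts and average values) are an illustrative layout, not a prescribed data structure.

import math

def uct_select(node, C):
    """Select the action maximizing Q(s, a) + C * sqrt(ln N(s) / N(s, a));
    actions never tried before (N(s, a) = 0) are selected first."""
    def uct_value(action):
        n_sa = node.action_visits[action]
        if n_sa == 0:
            return float("inf")              # force one exploration of every action
        bonus = C * math.sqrt(math.log(node.visits) / n_sa)
        return node.q[action] + bonus
    return max(node.actions, key=uct_value)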

In n-player games for n > 1, UCT can be applied by modeling the game as a set of n single-player games in a non-deterministic environment. Each role is unaware of the presence of other roles and the corresponding agent simply chooses one of its legal actions. The choices of other agents are hidden and are modeled as behavior of the non-deterministic environment. The only connection between the agents are environmental responses and the eventual reward. Consequently, each role has its own tree storing the values Q(s, a), N(s), N(s, a) and chooses its action independently. Thus, a deterministic successor state cannot be determined and reasoning over states (instead of actions) is not possible.

2.5. Criteria of Evaluation Functions

In order to construct our own evaluation function, we must be aware of the properties that render an evaluation function good. Good here means a high expected probability that the moves implied by the evaluation function lead to terminal states with a high value, regardless of the opponent.
Since the values determined by evaluation functions cannot be considered accurate, they are usually combined with search to mitigate eventual approximation errors. Thus efficiency comes into play, as more efficient functions allow for the inspection of a higher number of states in a given amount of time. In practice, efficiency requires the evaluation function to adhere to time and space constraints. Specifically, both the space occupied by the function representation and the time required to perform an evaluation must be bounded. Simplistic evaluation functions such as exhaustive search or lookup tables with time or space requirements linear in the size of the state space can thus be quickly dismissed. In addition to these criteria occurring in classic game playing, GGP requires a fast construction of the evaluation function. Consequently, plain learning approaches cannot be found among the GGP agents that participate in the competitions due to their time-consuming nature.
Quality requirements are more difficult to analyze since they are hard to quantify. In the following we assume a perfect evaluation function v∗(s) as a function that evaluates all states to the value they eventually yield given optimal play. This definition coincides with the state value oracle assumed in section 2.3. The perfect evaluation function is also optimal, i.e., it is a move selection oracle, since it always evaluates the moves that lead to the best successor states highest. However, in contrast to an optimal function, it also evaluates the nth-best move to the nth-highest value. This property is useful since our (approximated) evaluation function cannot be considered optimal, i.e., it does not necessarily assign the highest value to the best state, and therefore benefits from correctly identifying the second-best (and third-best, ...) state among the successor states.
We use the perfect evaluation function to define properties of our evaluation function relative to those of the perfect evaluation function. While the list of properties is not complete, it will be sufficient to illustrate the individual strengths and weaknesses of each GGP agent system. By requiring that our evaluation function v(s) be an approximation of v∗(s), we can conclude the following properties:

Efficiency                    Quality
Compactness                   Consistency
Bounded Computation Costs     Strict Monotonicity
Ad-Hoc Construction           Extrapolation

Table 2.4.: Summary of the Evaluation Function Criteria

Consistency v^(1)(s, r) = v^(2)(s, r)
The equation states that the evaluation of a state is independent of any circumstances represented by the superscripts (1) and (2). In other words, the value of a state depends only on the state and the role. Consistency is useful since inconsistent evaluations imply an estimation error. This error can be reduced by repeated evaluations to the detriment of computation costs. However, the negative impact of inconsistent evaluations cannot be completely removed.

Strict Monotonicity ∀s1, s2 ∈ S : v∗(s1, r) > v∗(s2, r) ⇔ v(s1, r) > v(s2, r)
Strict monotonicity ensures that we are able to rank states according to their perfect state value and that differences in the perfect state values v∗(s, r) are preserved by our evaluation function. This implies that symmetric states are assigned the same values. While strict monotonicity is hard to measure given that we do not know the perfect evaluation function, we can construct cases where we know that a given state should be evaluated higher than another. The evaluation function then should reflect this valuation.

Extrapolation ∀s ∈ T : goal(s, r) = v(s, r) Since v∗(s, r) coincides with the goal function in the terminal states, our evaluation function v(s, r) must approximate the goal function in the terminal states, that is, v(s, r) should be an extrapolation of the goal function (from T to S). Although this property can be synthetically derived by overriding any evaluation function in the terminal states with the value obtained by the goal function, the resulting function most likely exhibits a discontinuous behavior, misaligning the state values of terminal versus that of non-terminal states.

Note that each of the criteria is a heuristic, meaning that it should be possible to create an optimal evaluation function that exhibits none of the aforementioned properties. However, given the difficulty of finding an optimal evaluation function, we believe chances of success are much higher when accepting the properties as guidance. Table 2.4 summarizes the criteria.

2.6. A Word on Experimental Evaluation

In order to verify the ideas presented in this work, we will conclude each practical part with an experimental evaluation. Since an evaluation function itself is difficult to assess, we integrate the evaluation function in our agent Nexplayer. Consequently, we can assess the quality of an evaluation function by evaluating the quality of the agent Nexplayer. The typical approach will be to run a small tournament between Nexplayer and other agents that represent the baseline that we compare against. Unfortunately the results of such a tournament are influenced by a vast number of parameters. Most influential among these are

Game a high number of games should be played to demonstrate generality of the agent

Role Setup for each game, each agent-to-role assignment should be played an equal number of times to reduce role-specific effects

Time Control different time controls should be used to evaluate general playing strength as well as playing strength in a trade-off against time

Matches a number of matches should be played to obtain reliable results per game and role setup

Opponents an evaluation should be performed against different opponents

Assuming for each dimension just three discrete categories (i.e., three games, three roles, ...), we end up with 3^5 = 243 matches for evaluating a single matter. Still, critics could say that three games is far from general, that the time controls favor a specific opponent, that three matches are too few and that some opponent X should have been included because it would have radically changed the results. The fiercest among the critics would add that the search algorithm should be part of the test dimensions and that each individual optimization should be considered in a separate test setting. By now it should become clear that a completely satisfactory evaluation is practically impossible. We address this problem by adhering to the following guidelines within the evaluation of single topics.

Two-Player Games We will only play two-player games.³ The reason is that the results of single-player games are difficult to compare and those of multiplayer games are much less stable. Besides, for two-player games the selection of search algorithms is less disputed, which reduces the influence of search on the results. The list of games used can be found in the appendix A.1.2.

Preselected Games Single topics are evaluated using a set of games that match the topic. In this way, we avoid, e.g., evaluating distance features on games where distances are not relevant.

³ With the exception of Pac-Man, where the third player is defined as a random player and has, due to the game mechanics, virtually no influence.

Switching Roles As a general rule, we switch roles between the two agents after each match. Naturally, we play each game an even number of times so that each agent can occupy each role the same number of times.

Single Time Control A topic is evaluated using a time control that enables the appli- cation of the subject.

Fluxplayer as Opponent For evaluating a topic, we will only use Fluxplayer [ST07, Sch11]. This avoids a number of problems, such as differing search parameters, since Fluxplayer uses the same code base as our agent. Besides, Fluxplayer is well known and consistently ranks among the top agents of the annual competitions.

Gradual Improvement For each single approach proposed we enhance our agent by a module that implements the corresponding approach. Other evaluations in this work can thus automatically serve as a context for interpreting the results.

Full Optimization We apply all optimizations as soon as their preconditions allow it.

In short, the purpose of the evaluation of each topic is to demonstrate improvement. We make up for the lack of generality in the last chapter of this thesis where we drop most of the guidelines in favor of a comprehensive full-scale evaluation.

2.6.1. Comparisons with the Win Rate In order to demonstrate gradual improvement, we often need to compare the results of various setups of one agent against another. To keep these comparisons simple, we reduce the result of each setup to a single number which we refer to as “win rate”. The win rate of an agent is the percentage of points achieved by that agent relative to the total of points achieved by any agent. This means that the win rate for an agent is 0 if the agent did not receive any points and it is 1 if it achieved all points. In a two-player game, a win rate of 0.5 indicates that both agents perform equally well in a setup. Formally, given some setup and a set of agents A where each agent a ∈ A achieved a total of t(a) points, the win rate wr is

wr(a) := t(a) / Σ_{a′∈A} t(a′)

Naturally, if a setup includes more than one game or match, we first aggregate the individual agent points and only calculate the win rate as a last step. Since most of the games we evaluate on are zero-sum games, the win rate is equal to the average goal value achieved divided by 100. If no agent achieved any points, the win rate is undefined. We will explicitly address this case, should it occur.
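For clarity, the computation of the win rate can be summarized in a few lines of Python; the function and argument names are purely illustrative.

    def win_rate(points):
        """points maps each agent to its total points aggregated over all matches of a
        setup. Returns the win rate per agent, or None if no points were scored at all
        (the undefined case mentioned above)."""
        total = sum(points.values())
        if total == 0:
            return None
        return {agent: p / total for agent, p in points.items()}

    # Example: win_rate({"agent1": 150, "agent2": 50}) yields {"agent1": 0.75, "agent2": 0.25}.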

2.6.2. Comparisons with States per Second In order to demonstrate the effect of different approaches on run time, we often compare the states evaluated per second. The value is the number of different states evaluated during a match divided by the duration of the match. If a state is encountered more than once, its value will be obtained from a cache, thereby avoiding (the cost of) evaluation. The numbers were obtained on a 2.9GHz AMD A8-3850 with 4 CPUs and 8GB RAM. For a complete specification of the test environment, see the appendix A.1.

3. Evaluation Functions I - Aggregation

In this chapter we aim to determine what an evaluation function should look like. To separate concerns, we will focus on the general structure of evaluation functions, that is, types of aggregation functions and how these can be used to aggregate the output of features. We will treat features themselves as a separate subject and dedicate chapter 4 to feature detection, construction and integration. Building on the criteria of evaluation functions (section 2.5), we start by analyzing different approaches for building evaluation functions with a focus on aggregation functions. We proceed in section 3.2 by presenting an algorithm for constructing an evaluation function based on neural networks. The algorithm allows for the initialization of neural networks using logic programs and qualifies as optimal with respect to our desired function properties. In section 3.3 we address a major problem of the neural network approach, enabling us to fully benefit from its potential.

3.1. Choice of Evaluation Function

We start by briefly analyzing currently applied probabilistic and deterministic evaluation functions (see section 2.3) and discuss how they relate to the criteria established in section 2.5.

3.1.1. Reasons against Probabilistic Evaluation The construction of a probabilistic evaluation function is efficient as its requirements are similar to the requirements of playing a general game. These are specifically the determination of

• legal moves

• state updates

• terminal

• goal

Consequently, the evaluation function is constructed “instantly” and its representation is bounded by the size of the game description. For evaluating the function, we must determine the terminality of a state, a legal move and a state update for each state we

encounter. Thus the computational cost depends on the depth of the remaining game tree.
With respect to quality, single runs of a probabilistic evaluation function are unlikely to be consistent with each other. Therefore, probabilistic functions are usually employed together with UCT search that performs repeated evaluations. The repeated evaluations gradually obtain consistency, however, to the detriment of efficiency.
The state value approximated is not the expected value given optimal play as assumed by a perfect evaluation function v∗(s) but the expected value given random play, which we refer to as pv(s). Therefore strict monotonicity does not hold generally. Still, there are many cases where it may be assumed to hold due to pv(s) being approximately equal to v∗(s), that is, pv(s) ≈ v∗(s): One reason is that for advanced (deeper) states in the game the terminal state is closer and therefore the noisy effect of random moves gradually disappears. This means that state values vary less and evaluations are faster, thereby enabling more and more accurate evaluations in a constant amount of time. The second reason is that in practice, there are many cases in which a sufficient degree of uncertainty holds for pv(s) ≈ v∗(s) to hold. The uncertainty is rooted either in opponent behavior or ambiguous value backpropagation in the search algorithm. As a consequence, players cannot make informed decisions and therefore do not significantly deviate from the random-move assumption. Frequent instances of such situations are

• matches with uninformed players (opponent uncertainty)
• games with many players (opponent uncertainty)
• simultaneous move games (opponent + value uncertainty)
• games with many Nash equilibria per state (value uncertainty)

In summary, the compactness, ad-hoc construction and extrapolation properties generally hold. All other properties depend on the game and cannot be considered to hold generally: While strict monotonicity may hold, the major problem of probabilistic evaluation functions is their inefficiency. This inefficiency is a direct result of their run time depending on the size of the remaining game tree and the need for repeated evaluation to obtain consistency. While extensions may mitigate some of these disadvantages (see section 6.1.1 for an overview), the probabilism itself and the requirement for repeated evaluations is not eliminated.
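For illustration, a Monte-Carlo estimate of pv(s) can be sketched in a few lines of Python. The game interface assumed here (roles, terminal, legal_moves, next_state, goal) is introduced only for the example and does not correspond to any particular reasoner.

    import random

    def pv_estimate(state, game, role, playouts=100):
        """Monte-Carlo estimate of pv(s): the expected goal value for `role` under
        uniformly random play of all roles."""
        total = 0
        for _ in range(playouts):
            s = state
            while not game.terminal(s):
                joint_move = [random.choice(game.legal_moves(s, r)) for r in game.roles]
                s = game.next_state(s, joint_move)
            total += game.goal(s, role)
        return total / playouts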

3.1.2. Reasons for Deterministic Evaluation Deterministic evaluation functions are usually designed as functions operating on sub- functions (features) that in turn operate on the state. As such, they are compact to represent and the upper bound on computation costs is determined by the maximum upper bound of each of its features. Therefore, typically the most expensive features are discarded unless they prove of significant importance for state evaluation. Deterministic evaluation functions are trivially consistent.

31 Linear Combinations of Features Linear combinations of features are sums of weighted features and can be used directly as evaluation function (e.g., in Kuhlplayer [KDS06] or Ogre [Kai07c]) or constitute a part of it (as in Cluneplayer [Clu07]). Regardless of how features are determined, they must be assigned weights. If the same weights are assigned for each feature (as done in Ogre) or the weights are assigned according to a ranking (as in Kuhlplayer), relations between the importance of features are ignored and we can expect a weak quality of the evaluation function in general. On the other hand, if better weights are to be determined, empiric data obtained by simulating random matches is used (Cluneplayer). The process itself is time-consuming, and we can expect the evaluation function to improve for higher start clocks due to finding more features or weighting them better. Then, however, the ad-hoc construc- tion criterion is in a trade-off with the approximation quality of the agent’s evaluation function. However, even given infinite time, the quality criteria cannot be guaranteed. In fact, depending on the features and weights determined, it is possible that the v∗(s) cannot be sufficiently well approximated by the evaluation function. This is typically the case in non-trivial games.

Fluxplayer Fluxplayer [ST07, Sch11] is the only agent described in literature that does not rely on a linear combination of features but transforms the goal rules to an evaluation func- tion instead. The construction time for the aggregation function is thus approximately bounded by the size of the goal description. Feature detection is restricted to expressions occurring in the goal condition and the features can substitute the expression they are derived from, eliminating the need for weighting features. Consequently, the evaluation function of Fluxplayer fulfills the ad-hoc construction criterion. From a qualitative viewpoint, the evaluation function is a correct fuzzified version of the goal function and thus fulfills the extrapolation criterion. Since the fuzzy aggregation function itself is strictly monotonic and correct, it also imposes a strict order on non- terminal states that aligns with the order imposed by the perfect evaluation function if state representation and goal rules reflect this order. Under the same condition, symmetric states are identified and evaluated to equal values. We conclude that all qualitative properties hold if the goal condition reflects the valuation. Fluxplayer is the only agent that fulfills all efficiency criteria and can be expected to fulfill the monotonicity criteria in some games.

3.1.3. Shortcomings of Deterministic Evaluation Functions We discussed the properties of currently used evaluation functions and determined their strengths and weaknesses. Most criteria can be easily assessed with the exception of strict monotonicity which can be only evaluated given the perfect evaluation function that in turn depends on the game.

Since it is hard to determine whether an evaluation function is perfect (or optimal), we must assume that our evaluation function is imperfect. Consequently, we should apply measures to improve the evaluation function. We propose a continuous evolutionary process for this purpose. The rationale is that a repeated adjustment of the function parameters towards high-quality evaluations eventually lets the function converge towards (a better approximation of) the perfect evaluation function. This can be achieved using learning. In order to successfully apply learning techniques, our function must exhibit additional properties:

Generality The structure of our evaluation function must be general in order to capture the perfect evaluation function.

Adjustable Parameters The function must possess a sufficient number of free parameters to be able to capture the perfect evaluation function.

Initialization A good initialization increases chances for convergence and speed of convergence of our function to the perfect evaluation function. The ad-hoc construction criterion requires the initialization to be quick.

All three requirements pose separate problems. Generality is a structural question that requires a function to be flexible enough to represent the perfect evaluation function. This trivially rules out all evaluation func- tions that rely on a specific structure such as Cluneplayer’s lottery model. Also, linear combinations of features are only able to represent non-linear functions if their features are appropriately designed. Given a lack of general and good acquisition algorithms of important features, we may safely assume that linear combinations of features are generally not able to represent arbitrary functions. The subject of feature acquisition is discussed in detail in chapter 4. Adjustable parameters is a closely related problem that requires that the parameters in the evaluation function be useful and sufficient in number to actually model the perfect evaluation function. The problem is different from generality as a general function can have most of its parameters fixed. Specifically, this is a problem of Fluxplayer since, for example, there is no native way to change feature weights. Finally, we need an initialization of all parameters that is as close as possible to the perfect evaluation function. Randomly initialized weights do not adhere to quality principles while learned weights require too much time to be used in a GGP agent. None of the current agent systems employs a function expressive enough to approxi- mate arbitrary functions. Most use more or less adjustable parameters with the exception of Fluxplayer that in turn is the only agent with a convincing strategy for initialization.

3.1.4. Conclusions Obviously, no single aggregation function fulfills the criteria established throughout this section and in section 2.5. Each single approach has its strengths and weaknesses, but all approaches fail when trying to gradually approximate the perfect evaluation function.

An aggregation function that will be able to fulfill all the criteria and consequently perform better than the current agent systems will try to unite the strengths of each single approach. This entails decisions in favor of

• a deterministic evaluation function in order to obtain control over state evaluation

• an initialization based on the goal function comparable to that used in Fluxplayer (in contrast to a semi-learning approach)

• an evaluation function capable of learning

3.2. An Aggregation Function Based on Neural Networks

The aforementioned properties can be realized using neural networks. Neural networks are deterministic functions that can be easily trained, and a subset of these networks are able to approximate arbitrary functions to an arbitrary degree. Therefore they are a good representation for an evaluation function. A point against them is the initialization problem of neural networks: A good initial- ization defines the network with respect to its structure, quantity of neurons and their connection weights. However, given the vast number of parameters, finding good initial- ization parameters is difficult [KPKP90, TF97], resulting thereby in bad approximation, the need for lengthy training or non-convergence. To address this problem we will use the C − IL2P algorithm (Connectionist Inductive Learning and Logic Programming System, [dG02]). Originally developed for representing propositional logic programs as neural networks, we modify C − IL2P to represent a fuzzy evaluation of such a program. By initializing the neural network with the propo- sitionalized goal rules, we obtain thus a fuzzy evaluation of the goal condition. Still, we retain all the benefits of a neural network while solving at the same time the initialization problem. The remainder of this section is structured as follows: In the next section we will briefly review neural networks and C − IL2P that are the tools for our algorithm. In section 3.2.2 we proceed with a step-by-step description of the transformations needed, starting from a goal description and ending in a state evaluation function. In section 3.2.3 we discuss how to handle the practical limitations of our approach and evaluate it in section 3.2.4. We conclude with a small summary in section 3.2.5. The remainder of this section is a revised version of our paper “Neural Networks for State Evaluation in General Game Playing” [MT09], presented at ECML 2009.

3.2.1. Preliminaries Our aim is to initialize a neural network by applying the C − IL2P algorithm to the propositionalized goal condition of a game. For this purpose, we will briefly review neural networks, propositional domain theories and the C − IL2P algorithm.

Neural Networks Neural networks are a classic tool of AI research. They are inspired by the model of nervous processing in animals and were originally motivated by the belief that a reconstruction of a brain exhibits the same properties as a brain, i.e., intelligence. Due to the complex mechanism of how brains work, research was never able to reproduce “intelligence” using a neural network. The overly ambitious goals were thus abandoned in favor of a simple and practical model of a function approximator based on a simplistic model of the brain.
A neural network is a group of interconnected atomic units, called neurons. There are numerous types of neural networks and for a complete treatment of the topic we recommend [Bis95]. However, in this work we will only use plain old multilayer feed-forward networks and thus focus on these exclusively. Here, neurons are organized into several layers where each neuron in any layer has an outgoing connection to each neuron in the following layer. The network is thus a directed acyclic graph. The first layer of neurons is called the input layer and the corresponding neurons input neurons. The neurons in the last layer are called output neurons.
Any neuron y is an n-ary function with a single output o_y based on the outputs o_{x_i} of its predecessor neurons:

o_y = h( Σ_{i=1}^{n} w_{x_i y} · o_{x_i} )

The w_{x_i y} are the connection weights, and the function h(x) is the activation function. Typically, the activation function maps to the unit interval (unipolar network) or to the interval [−1, 1] (bipolar network). Network training is performed using error backpropagation, and in this process the weights w_{xy} from the neuron x to the neuron y are modified.
Given that three-layer feed-forward neural networks are universal function approximators, they are often the algorithm of choice for approximating and modeling an unknown function.
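The output of a single neuron as defined above can be computed directly. The following Python sketch is merely illustrative; the bias argument defaults to zero here and only becomes relevant once C − IL2P introduces bias connections further below.

    import math

    def neuron_output(weights, inputs, bias=0.0):
        """o_y = h(sum_i w_i * o_i + bias) with the bipolar activation
        h(x) = 2/(1+exp(-x)) - 1 mapping to the interval [-1, 1]."""
        activation = sum(w * o for w, o in zip(weights, inputs)) + bias
        return 2.0 / (1.0 + math.exp(-activation)) - 1.0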

Propositional Domain Theory In the remainder of this chapter we will use propositional domain theories as a simplified form of knowledge representation without first-order variables. While they are semantically equivalent to propositional logic programs, their tree-like representation is isomorphic to the networks we produce, making them an intuitive intermediate step.
We consider a propositional domain theory a set of rules of the form

q ⇐ ⊗_{1≤i≤k} p_i    with ⊗ ∈ {∧, ∨}

where q is an atom and all p_i are literals (an atom or a negated atom). Depending on the operator ⊗, we interpret each rule as a conjunction or disjunction of its k antecedents. Without loss of generality we assume that there are no two rules in the domain theory with the same atom occurring in the head.

Consequently, each propositional domain theory (short: domain theory) can be represented as a directed graph where there is a single node for each atom in the domain theory and there is a connection between two atoms p and q if there is a rule where q occurs in the head and p occurs in the body. The connection is labeled negative if p occurs as a negative literal in the body of the rule, else it is labeled as positive. Each node can be labeled a conjunction or disjunction if there is a rule with the corresponding atom in the head. If there is no such rule, we label the node as input node. The resulting graph depicts for each node q the subproofs p_i needed to prove q. By recursively proving each p_i we can construct the complete proof in terms of the atoms of input nodes.

Relation to Propositional Logic Programs Typically, information encoded in propo- sitional logic is represented as a propositional logic program, that is, a set of clauses of the form A ⇐ L1, .., Ln where A is an atom and Li with 1 ≤ i ≤ n are literals (an atom or a negated atom). A is then interpreted as a conjunction of all Li. The main difference is the representation of disjunctions: While in a domain theory we represent a disjunction explicitly as rule, propositional logic programs encode them implicitly by interpreting each atom A as true if there is a true clause with A as head. We choose the domain theory interpretation for two reasons: First, by referring to each rule as disjunction or conjunction explicitly, we can simplify the further discussion of domain theories due to symmetries of the rules, that is, we do not need to pay attention to the structure of the representation. And second, we can assume, without loss of generality, that all atoms occur at most once in the head of the rule. This will allow us to refer to each atom unambiguously as conjunction, disjunction or input neuron and allow for symmetries between the domain theory graph and the neural network representing that graph. Still, a domain theory is a slightly more general representation of a propositional logic program where disjunctions can be encoded in terms of negative literals. For our purposes, we treat both as equivalent.

Logic Program          Domain Theory           Domain Theory Graph

a ⇐ b, c, d            a ⇐ b ∧ c ∧ d           (graph not reproduced here)
b ⇐ e, ¬f              b ⇐ b1 ∨ b2
b ⇐ f, ¬g              b1 ⇐ e ∧ ¬f
                       b2 ⇐ f ∧ ¬g

Figure 3.1.: Three Equivalent Representations of a Propositional Logic Program. Nodes in the graph are colored depending on whether they are conjunctions, disjunctions or input neurons.

Figure 3.1 shows three equivalent representations of a propositional logic program. The graph uses three different colors to distinguish conjunction, disjunction and input nodes from each other. In the remainder of this work we will exclusively obtain domain theories by deriving them from the goal rules of a game. Therefore, unless explicitly stated otherwise, the domain theories under discussion are finite, acyclic and connected, and the corresponding graph possesses a unique root node. The graph is thus a rooted DAG, or, more conveniently, a tree.

The C − IL2P algorithm C − IL2P (Connectionist Inductive Learning and Logic Programming System, [dG02]) is an algorithm that takes as argument a propositional domain theory and returns a neural network. Each atom in the domain theory is represented by exactly one neuron. A neuron z with the output oz represents a propositional variable q via the following constraints:

q ⇔ o_z ∈ (A_min, 1]        ¬q ⇔ o_z ∈ [−1, A_max)

So if the atom q is true, the corresponding neuron gives a value of o_z ∈ (A_min, 1], while an output o_z ∈ [−1, A_max) is interpreted as falsity. A_min is thus the minimum output of a neuron to be considered true while A_max is the maximum neuron output to be considered false. Outputs in the non-empty interval (A_max, A_min) represent the “zone in between” and are guaranteed not to occur. Usually, −A_max = A_min is set, allowing propositional negation to be encoded as arithmetic negation. Since we will make no use of the recurrence represented by C − IL2P, we will discuss a slightly simpler algorithm that avoids recurrence.
C − IL2P uses as input a propositional domain theory, which is a set of rules of the form

q ⇐ ⊗_{1≤i≤k} p_i    with ⊗ ∈ {∧, ∨}

where q is an atom and the p_i are positive or negative literals. Each rule is represented by one neuron for the head and k neurons for the body. The weight between the body neurons and the head neuron is W for a positive antecedent p_i and −W for a negative antecedent. Additionally, the neuron for q is subject to a threshold, represented by a connection to the bias unit (with constant output 1) with the weight θ_∧ (θ_∨) if the rule is a conjunction (disjunction), as given in equation (3.2.1). The parameters W and θ are defined as follows:

θ_∨(k) = −θ_∧(k) = ((1 + A_min) · (k − 1) / 2) · W    (3.2.1)

Both W and Amin can be set freely but are subject to the following restrictions:

(k_max − 1) / (k_max + 1) < A_min < 1    (3.2.2)

W ≥ 2 · (ln(1 + A_min) − ln(1 − A_min)) / (k_max · (A_min − 1) + A_min + 1)    (3.2.3)

k_max is the maximum number of antecedents a rule has. The result is a neural network with output values in [−1, 1] where all input neurons represent propositional variables and all other neurons represent either a conjunction or a disjunction.
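The parameter computation can be made concrete with a small Python sketch. The slack of 0.2 above the minimum of A_min is an illustrative choice (it reproduces the setting A_min = 0.8 · A_min^* + 0.2 used later in section 3.2.2), and W is set to its admissible minimum.

    import math

    def cilp_parameters(k_max, slack=0.2):
        """A_min and W satisfying equations (3.2.2) and (3.2.3) for rules with at most
        k_max antecedents; W is taken at the lower bound of (3.2.3)."""
        a_min_star = (k_max - 1) / (k_max + 1)                  # lower bound, eq. (3.2.2)
        a_min = a_min_star + slack * (1 - a_min_star)
        w = 2 * (math.log(1 + a_min) - math.log(1 - a_min)) / \
            (k_max * (a_min - 1) + a_min + 1)                   # eq. (3.2.3) as equality
        return a_min, w

    def bias_weight(k, a_min, w, conjunction):
        """Weight theta of the bias connection for a rule neuron with k antecedents,
        equation (3.2.1): negative for conjunctions, positive for disjunctions."""
        theta = (1 + a_min) * (k - 1) / 2.0 * w
        return -theta if conjunction else theta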

Alternatives to C − IL2P There are various alternatives for representing logic in the form of neural networks. KBANN (Knowledge-Based Artificial Neural Network, [TSN90, TS94]) used unipolar neural networks (with output in [0, 1]) for representing truth. Its main problem was a restriction on the number of antecedents. A comparable approach with focus on representing propositional logic programs was proposed in [HK94]. It was shown that three-layer recurrent neural networks can rep- resent propositional logic programs without any restrictions regarding the number of rules or antecedents. Learning, however, was not supported due to the use of discrete threshold units. The problems of both were addressed by C − IL2P [dCLd96, dG02]. Recent approaches suggest to represent first-order logic directly in a network, avoid- ing propositional logic altogether (see [BH05, Bad09] for an overview and discussion). However, we do not use any of these approaches for several reasons. First and foremost, a state is represented in GDL as a ground and finite set of finitely nested terms. States are thus essentially propositional and would not benefit from a first-order representation of the network. In contrast, networks using first-order logic represent atoms in an interpretation as vectors, creating considerable overhead for state evaluation. Finally, for purely practical reasons a well-established and simple approach such as C − IL2P is sufficient for our purposes.

3.2.2. From the Game Description to an Evaluation Function

Given a role r with the associated goal values {gv_1, . . . , gv_n} we obtain a set of networks with the output neurons {o_{r,gv_1}, . . . , o_{r,gv_n}} by iterating over all goal values. For each goal value gv we

1. transform the goal rules for goal(r, gv) to a propositional domain theory via expansion and ground-instantiation

2. transform the propositional domain theory to a neural network with the output neuron o_{r,gv} using C − IL2P

Finally, we define the state evaluation of a state s for role r as the aggregated output over all networks. We proceed by explaining each step in detail.

Goal Clauses to a Propositional Domain Theory

Consider the set of goal clauses of the form goal(r, gv) :-conditions for a given role r and goal value gv as defined in the rules of some game. We expand and ground-instantiate the set of goal clauses with the following algorithm:

1. Let φ be the set containing the expression :- goal(r, gv)

2. Repeatedly add all rules whose head unifies with any literal in the body of any rule in φ until φ stabilizes.

3. While φ contains a rule r with a variable v, remove r from φ and add all instances of r where v is substituted by its possible ground-instances.

For the sake of simplicity, we assume that expansion and ground-instantiation are not limited by our hardware. However, this is oversimplified and we will treat technical problems that may arise in section 3.2.3.
The result of the algorithm is a set of ground rules without variables, i.e., a propositional logic program. We transform the program to a domain theory by explicitly introducing a disjunction for each atom that occurs in the body of at least one rule and in the head of at least two rules. An example is given in figure 3.1.
The resulting domain theory can be described as a tree where each node is an expression. The root of this tree is the expression goal(r, gv), all non-leaf nodes are either a conjunction or disjunction of their children, and all leaf nodes query the state with the GDL keyword true/1 or represent state-independent facts. Negation is treated as a label attached to the edge between two nodes.
Since we are interested in evaluating the state as fast as possible, we can simplify the propositional domain theory further by evaluating state-independent facts (such as successor(1,2) or distinct(a,b)) immediately to true or false. Naturally, other nodes of the domain theory may become generally true or false as a consequence and the domain theory can be further simplified. We assume in the following a simplified domain theory where only state-dependent expressions are leaf nodes of the domain tree graph.
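The introduction of explicit disjunctions can be illustrated with a short Python sketch operating on propositional clauses; the naming scheme for the auxiliary atoms (b1, b2, ...) follows figure 3.1 and is otherwise arbitrary.

    from collections import defaultdict

    def to_domain_theory(clauses):
        """Turn a propositional logic program (list of (head, [literals]) clauses, where a
        literal may be prefixed with 'not ') into a domain theory: every head keeps one
        conjunction rule per clause, and heads defined by several clauses get an explicit
        disjunction over fresh auxiliary atoms."""
        by_head = defaultdict(list)
        for head, body in clauses:
            by_head[head].append(body)
        theory = {}                                  # atom -> ('and'|'or', [literals])
        for head, bodies in by_head.items():
            if len(bodies) == 1:
                theory[head] = ('and', bodies[0])
            else:
                disjuncts = []
                for i, body in enumerate(bodies, 1):
                    aux = f"{head}{i}"               # e.g. b -> b1, b2 as in figure 3.1
                    theory[aux] = ('and', body)
                    disjuncts.append(aux)
                theory[head] = ('or', disjuncts)
        return theory

    # to_domain_theory([("a", ["b", "c", "d"]), ("b", ["e", "not f"]), ("b", ["f", "not g"])])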

A Domain Theory to Neural Networks We apply C − IL2P to the domain theory to obtain a network that is isomorphic to the domain theory graph. For this purpose we apply the following algorithm:

1. Obtain kmax as the maximum number of antecedents that a rule has.

2. Determine A_min as given in equation (3.2.2).

3. Determine W as given in equation (3.2.3).

4. Create a network and for each atom a in the propositional domain theory add a neuron na to the network.

5. Add a neuron bias to the network.

6. For each rule r in the domain theory do
• Let h be the atom in the head of the rule
• For each literal b in the body of the rule do
  – If b is a negative literal, add a connection from n_b to n_h with weight −W
  – If b is a positive literal, add a connection from n_b to n_h with weight W
• Depending on whether r is a conjunction/disjunction and the number of antecedents k of rule r, add a connection from bias to n_h with weight θ according to equation (3.2.1).

Since A_min and W are only given as inequalities, we recommend setting the weight W to its possible minimum. For a discussion of the reason we refer the impatient reader to section 3.3. We empirically determined the weight W to be minimal if we set A_min to 0.8 · A_min^* + 0.2, where A_min^* is the minimum as given by equation (3.2.2).
The application of the above algorithm produces a network that is similar to the underlying domain theory: For each node in the graph there is exactly one neuron in the network, and a neuron is connected to another if the corresponding atoms occur in the body and head of the same rule. Each connection between two neurons has a weight attached as defined by C − IL2P, and logical negation is encoded by arithmetic negation of this weight. If a neuron has no incoming connections, then it is an input neuron. The only structural difference between the domain theory graph and the neural network is that, in addition to the connections from a node to its children, each non-input neuron has a connection to a bias neuron.
For evaluation of a state, we set all input neurons to output 1 if the atom represented by the neuron holds in the state, and to −1 otherwise. The output value of all other neurons can be calculated based on these. Each neuron responds with a real number in the interval [−1, 1]. Specifically, the output value o of a neuron is in the interval o ∈ (A_min, 1] if the corresponding atom holds and o ∈ [−1, −A_min) otherwise. This implies that the output neuron of the network representing the root goal(r, gv) of the tree returns a positive value if the state satisfies the domain theory and a negative value otherwise. More importantly, due to the monotonicity of the network, we can assume that the better the domain theory describes the current state, the higher the network output is.
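Putting the pieces together, a minimal Python sketch of the construction and of the fuzzy state evaluation could look as follows. It assumes a domain theory in the dictionary form of the earlier sketch and parameters A_min and W chosen according to equations (3.2.1)-(3.2.3); the reuse of neurons and all optimizations discussed in section 3.2.3 are omitted.

    import math

    def build_network(theory, a_min, w):
        """Build a C-IL2P style network from a domain theory (atom -> ('and'|'or',
        [literals]), literals possibly prefixed with 'not ')."""
        network = {}
        for head, (kind, body) in theory.items():
            conns = [((lit[4:] if lit.startswith("not ") else lit),
                      -w if lit.startswith("not ") else w) for lit in body]
            theta = (1 + a_min) * (len(body) - 1) / 2.0 * w       # eq. (3.2.1)
            network[head] = (conns, -theta if kind == "and" else theta)
        return network

    def evaluate(network, atom, state):
        """Fuzzy evaluation: input neurons output +1 if the atom holds in `state`
        (a set of ground atoms), -1 otherwise; inner neurons apply the bipolar sigmoid."""
        if atom not in network:
            return 1.0 if atom in state else -1.0
        conns, bias = network[atom]
        a = bias + sum(weight * evaluate(network, child, state) for child, weight in conns)
        return 2.0 / (1.0 + math.exp(-a)) - 1.0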

Aggregating Neural Network Outputs

For any role r and goal value gv we obtain a network whose output or,gv(s) for a state s indicates the extent to which the state s fulfills the corresponding goal condition. We aggregate these values for role r to obtain a single state value vr(s).

v_r(s) = γ + Σ_gv (gv − γ) · (o_{r,gv}(s) + 1) / 2

To relate the network outputs to our state value, we use a standard goal value γ that represents the goal value we expect to achieve. We interpret each network output as a degree to which we should deviate from the standard goal value. Since each role has exactly one valid goal value, we can assume that we do not fall out of the goal value interval [0, 100]. However, to ensure this, we map all values v less than 5 to 5 + (gv − 5)/100 and similarly all values greater than 95 to 95 + (gv − 95)/100. Thus we sacrifice resolution in these cases in order to ensure that v_r(s) is always in [0, 100]. Though this may seem a restriction, the situation occurs very rarely and does not affect strict monotonicity.
An advantage of the above approach is that for gv = γ the difference in the above formula becomes zero and we do not need to evaluate the corresponding network. In addition, by setting γ we may bias our evaluation optimistically by setting a high γ. The evaluation function then returns high values unless one of the networks returns a high certainty that another goal value should hold.
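The aggregation itself is a short computation. In the following Python sketch the standard goal value gamma = 50 and the final compression step are illustrative stand-ins for the exact settings described above, not the precise implementation.

    def state_value(network_outputs, goal_values, gamma=50.0):
        """Aggregate the per-goal-value network outputs o_{r,gv}(s) in [-1, 1] into a
        single state value for the role."""
        v = gamma
        for gv, o in zip(goal_values, network_outputs):
            v += (gv - gamma) * (o + 1.0) / 2.0
        # illustrative compression of values outside [5, 95] so the result stays in [0, 100]
        if v < 5:
            v = 5 + (v - 5) / 100.0
        elif v > 95:
            v = 95 + (v - 95) / 100.0
        return v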

3.2.3. Issues and Optimizations While the algorithm will work in most of the cases, there are problems that need to be dealt with. Most importantly, during expansions and ground-instantiation of the goal clauses the size (i.e., the number of rules) of the domain theory may rise exponentially in the number of variables. This is undesirable as it potentially increases the time for both construction and evaluation of the function exponentially. There are several subcases which we treat differently:

Patterns with Exponentially Many Instantiations Some patterns in the definition of rules can be found that are known to produce an exponential number of instantiations. We avoid these patterns altogether by substituting the corresponding expression by a ground dummy predicate, thereby avoiding its ground-instantiation. We evaluate the dummy predicate by delegating the evaluation of the original expression to the reasoner. Construction is thus skipped, the network size constrained, and evaluation is still fast since we evaluate the expression without the overhead of a neural network. Specific examples of such patterns are recursive predicates and conjunctions with variables shared across conjuncts such as p(0, A) ∧ p(A, B) ∧ ... ∧ p(Y, Z).

General Case Although the above patterns represent a good heuristic to avoid combi- natorial explosion, they are not sufficient. Therefore we impose artificial size con- straints. Once violated, all expressions with variables are substituted by ground dummy predicates and delegated to the reasoner. We currently allow a maximum of 512 neurons for all networks.

We use other strategies that not only limit the scope of the above problem but also optimize the algorithm:

Reuse of Neurons Since atoms often occur in several rules, we reuse their corresponding neurons. We also reuse neurons across networks, that is, if multiple goals depend on the same atom, then their networks use the same neuron.

Lazy Expansion / Instantiation We create the neural network in breadth-first order, starting from the goal clauses, and only expand and instantiate expressions level by level. In this way we ensure that size constraints are only violated in less important subtrees far away from the root. Also, the costs for ground-instantiation are only incurred when necessary.

Heuristics for Instantiation We prefer instantiation of state-independent predicates to instantiating variables randomly. The reason is that variable instances are often heavily constrained by these predicates.

Figure 3.2.: Data Flow of Lazy GDL Formula Expansion

Figure 3.2 shows the data flow of expanding a GDL formula lazily. The graph is a decision graph where conditions are shown as ellipses with a question mark and actions are shown as rectangles. If a question is answered with “yes” the data flow is indicated by an arrow to the right, otherwise by an arrow to the bottom. Neurons are created in three cases:

• A conjunction neuron is created for each formula that can be represented as a conjunction of other formulas.

• A disjunction neuron is created for each formula that can be represented as a disjunction of other formulas.

• A formula is transformed to a leaf node that will later serve as input neuron.

There are some simplifications in the graph that deserve special attention. The condition tagged “conjuncts independent” checks whether the conjuncts of a conjunction do not share variables. If they share variables, they are not independent and we try to ground-instantiate them with a timeout. If that is not possible, we transform the expression to a leaf. The condition “fluent query” checks whether the predicate is of the form true(p) and transforms p to an input neuron whose value depends on whether p holds in a given state or not.

3.2.4. Evaluation The evaluation function based on neural networks created with C − IL2P can be con- sidered generally advantageous. It is compact and computation time is approximately linear in the time for evaluating the goal condition. Trivially, the function is consistent. More importantly, however, the evaluation function inherits the strict monotonicity of the underlying networks that in turn inherit it from the strict monotonicity of each neuron. Consequently, changes in the input produce changes in the output. Thus, if the structure of the goal condition reflects a useful evaluation, we can assume that the strict monotonicity criterion is fulfilled in these cases. Likewise, strict monotonicity with respect to symmetries can be assumed if the domain theory exhibits symmetry (via commutative conjunctions and disjunctions). Also, the evaluation function can be produced directly from the ground-instantiated goal rules, making their creation ad-hoc and their evaluation a fuzzy extrapolation of the goal theory. Consequently, with the exception of general strict monotonicity, all our basic criteria are fulfilled. Given that the function is itself general and has adjustable parameters, neural networks may prove a competitive tool for state evaluation. We will evaluate this assumption in experiments.

Experiments Our hypothesis is that, given our neural evaluation function, our agent should be able to compete with Fluxplayer in games that are tailored for a fuzzy evaluation, that is, where a plain fuzzy evaluation of the goal rules provides a good heuristics. Since most other games can be expected to provide at least a partially descriptive goal condition, we can assume that our evaluation function generally contributes positively to the agent’s playing strength. In turn, should our agent not be able to compete with Fluxplayer, the approach should be revised.
We test our hypothesis by setting up our agent Nexplayer against Fluxplayer on the subset of games where a fuzzy evaluation of the goal condition is presumably a good heuristics. These games are typically variations of Tic-Tac-Toe and Connect Four since their goal conditions are essentially disjunctions of conjunctions of fluents. For each game, we play 20 matches with 40 seconds start clock and 4 seconds play clock.

Figure 3.3.: Win Rate of Nexplayer v1.0 against Fluxplayer

Figure 3.3 shows the average points achieved per match. Nexplayer wins on average 32.2 points while Fluxplayer wins 67.8 points.
The statistics give some insight into the benefits of the evaluation function. Beginning with the worst, it seems that in three games (bidding-tictactoe, connect5, tictactoex9) Nexplayer does not stand any chance against Fluxplayer. In all other games Nexplayer achieves a few victories, indicating that in some situations the heuristic is acceptable. We interpret this as a sign that while the quality of the heuristics of Fluxplayer is superior, it is not out of reach for Nexplayer. Obviously, connect4 and pentago_2008 are the only games where Nexplayer outperforms Fluxplayer. This may have several reasons. Fluxplayer may not have been able to initialize all features given the tight start clock, or its heuristics is simply bad.
Figure 3.4 shows the average number of unique states inspected per match. As we can see, Nexplayer is consistently faster in its evaluation and evaluates on average three times the number of states of Fluxplayer. Assuming that the majority of states repeat themselves, this should roughly translate to Nexplayer searching a little less than an additional level of states. This may have several reasons, most importantly the fact that Nexplayer did not spend time on creating or evaluating features.

3.2.5. Conclusions We have presented an algorithm that allows us to automatically derive an evaluation function based on the goal condition. The function fuzzy-evaluates states and can be trained using standard learning algorithms such as backpropagation. The exact details of the training procedure are discussed in our publication [MT09] and were omitted here due to a focus on the evaluation function construction.

Figure 3.4.: Nex v1.0 vs. Flux - States per Second

Nevertheless, our evaluation function adheres to the criteria established in our prior analysis in sections 3.1.3 and 3.1.4. While all practical problems could be resolved, the evaluation showed an underperformance of Nexplayer compared to Fluxplayer. Yet results were promising as Nexplayer won a few matches, indicating that its heuristics is not far off in terms of quality.
After analysis of the results, we found that one of the reasons for the weak performance of Nexplayer was a problem which we call the resolution problem. The problem is related to C − IL2P and the fact that it was not designed for creating fuzzy evaluators from domain theories. More specifically, the C − IL2P translation of a rule with k antecedents leads to an absolute activation a of up to approximately a = |θ + k · W| ≤ 2kW. Thus for high k the activation becomes high and the absolute output |o| = |h(a)| ≈ 1. While this itself is not a problem, the limited resolution of the standard floating-point representation may make two states seem similar though they are different. Translated to GGP, different input states may lead to equal outputs of the evaluation function, thus violating the strict monotonicity property. This happens especially if the neuron has a high number of predecessor neurons and the absolute activation for the two different states is high. We will address this problem in the next section by specializing C − IL2P such that it preserves resolution to a much higher degree and is thus generally better suited for creating fuzzy-evaluation networks.

3.3. High-Resolution State Evaluation using Neural Networks

The algorithm presented in section 3.2 transforms the goal function of a game into a set of neural networks using C−IL2P . These networks can then be used for fuzzy evaluation of states. While the algorithm is theoretically well-suited for fuzzy evaluation, a practical limitation may compromise its utility as the following example demonstrates:

Example 3.1. Consider the game connectfour, a Connect Four variant played on an 8x6 board. After initializing the agent, a 3-layer network with 104 leaf nodes is created. Our agent is first to move and examines the eight possible moves drop(1) .. drop(8) indicating the column where the disk is to be dropped. For each move, we generate the successor state that contains a single fluent cell(X, 1, red) relevant for evaluation where X indicates the column where the disk was dropped. After evaluating all 8 states, we obtain the following results:

                            successor state evaluation
move                        role red            role black
drop(1) or drop(8)          32.9684512447111    32.9684512447111
drop(2) or drop(7)          32.9684512447111    32.9684512447111
drop(3) or drop(6)          32.9684512447111    32.9684512447111
drop(4) or drop(5)          32.9684512447111    32.9684512447111

Table 3.1.: C − IL2P Evaluation of all ply 1 States of connectfour

The values are not artificially rounded. So each state is evaluated to the same value, although theory states that drop(4) and drop(5) should be the better moves because they occur in more possible lines of four. What happened?
To track down the problem, we construct a smaller problem with the same characteristics.

Example 3.2. Consider a neuron z representing a conjunction of the output of 4 preceding neurons, z ⇐ ∧_{i=1}^{4} y_i. We assume the maximum number of children of a node to be k = 4 and set A_min = −A_max = 0.9 and W = 4, thus exceeding their theoretical minima 0.6 and 3.93 as required by equations (3.2.2) and (3.2.3). The following table shows the neural output of z for t of the 4 y_i representing true (having output 1).

true antecedents t    activation a_z    output o_z
0                     −27.4             −1.000000
1                     −19.4             −1.000000
2                     −11.4             −0.999978
3                     −3.4              −0.935409
4                     4.6               0.980096

Table 3.2.: Neural Output of a Conjunction with 4 Antecedents

While the neuron correctly encodes a conjunction, it is not able to distinguish the cases where

none or one antecedent is fulfilled. This is due to the resolution imposed (10^−6), but occurs similarly in all finite floating-point representations such as those used in 32-bit computers.
The latter example demonstrates that the given resolution of 10^−6 is not sufficient to describe the differences of the output of neuron z for none or one fulfilled antecedent in a simple 2-layer network. The reason for this is the high absolute activation a_z (−19.4 and −27.4) that becomes approximately −1 after applying the activation function.
The lack of distinction in example 3.2 causes the lack of distinction in example 3.1. While there still exist minor differences at the first output neuron, in connectfour a three-layer network is used. Small differences in the output in the second layer get squashed further in each additional neuron layer due to the activation function. Eventually, the difference becomes negligible, rendering all ply 1 states for the connectfour network indistinguishable.
There are several ways to deal with the problem. Following the path of least resistance, we could use high-precision arithmetic. Unfortunately, high-precision arithmetic pushes the solution of the problem to run time. Since we use the evaluation functions to evaluate hundreds or more states per second, the additional overhead is unacceptable. The second and much more viable alternative is to look in the C − IL2P algorithm for candidates that unnecessarily increase the absolute activation a_z.

a_z = bias + max. input
    ≤ ((1 + A_min) / 2) · (k − 1) · W + k · W
    < (2k − 1) · W

With the absolute activation bounded by a_z < (2k − 1) · W, the only two parameters eligible are the number of children k and the weight W. While k cannot be influenced directly without changing the domain theory, C − IL2P can be modified to allow for smaller weights W. In particular, the global setting of W and A_min depends on the maximum number of antecedents of a rule, k_max, and thus implies weights higher than necessary for nodes with k < k_max children.
In the remainder of this section we will present a modification of C − IL2P that exhibits lower weight requirements while preserving correctness. It is a revised version of our work “Neural Networks for High-Resolution State Evaluation in General Game Playing” [Mic11b], presented at the GIGA workshop 2011.
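The saturation effect of example 3.2 is easy to reproduce. The following Python sketch recomputes table 3.2 and shows that, at a resolution of 10^−6, the cases of none and one fulfilled antecedent become indistinguishable; it is only an illustration of the problem, not part of the agent.

    import math

    def conjunction_output(t, k=4, w=4.0, a_min=0.9):
        """Output of the conjunction neuron of example 3.2 with t true antecedents."""
        theta = -(1 + a_min) * (k - 1) / 2.0 * w          # bias weight, eq. (3.2.1)
        activation = w * (t - (k - t)) + theta            # true inputs +1, false inputs -1
        return 2.0 / (1.0 + math.exp(-activation)) - 1.0

    for t in range(5):
        print(t, round(conjunction_output(t), 6))         # t = 0 and t = 1 both print -1.0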

3.3.1. Local Neuron Transformation We present a more general version of C − IL2P . To maintain correctness, two basic conditions must be fulfilled when representing propositional logic as neural network. The first is the ability to correctly represent boolean operators, in this case the operators and, or and not. And the second is the distinction of truth and falsity at all levels during information processing. Both concepts, however, are related to different parameters of a

neuron, enabling us to define them independently and thereby allowing the development of an algorithm that sets parameters locally (i.e., for each neuron). We begin by defining a standard neuron.

Definition 3.1 (Standard Neuron). Let z be an input layer neuron with no predecessors and the output value o_z ∈ O ⊂ [−1, 1], OR a neuron with

• a real weight w_z,

• a real bias bias_z,

• the unbiased input î_z = Σ_{y∈pred(z)} o_y, where pred(z) is the non-empty set of preceding neurons of z,

• the biased input i_z = î_z + bias_z,

• the bipolar activation function h(x) = 2/(1 + e^(−x)) − 1,

• the output o_z = h(w_z · i_z).

Then we call z a standard neuron.
Like in C − IL2P, we represent truth and falsity of a propositional variable by a neuron with an output value in a specified interval. However, instead of using the intervals [−1, A_max) and (A_min, 1] with A_max and A_min as global parameters, we generalize by parameterizing the minimum false and maximum true output value (−1 and 1 respectively) and by setting all parameters locally, that is, for each neuron individually. Let a propositional variable q be represented by a neuron z with output o_z. Then we define:

q ⇔ o_z ∈ [o_z^+, o_z^++]        ¬q ⇔ o_z ∈ [o_z^−−, o_z^−]

We will use the term limits to refer to the quadruple (o_z^−−, o_z^−, o_z^+, o_z^++) of output parameters of a neuron z.
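Definition 3.1 translates directly into code. The following Python sketch is only meant to fix the notation (î_z, i_z, w_z, o_z); it is an illustration, not an implementation of our agent.

    import math
    from dataclasses import dataclass, field

    @dataclass
    class StandardNeuron:
        """Sketch of a standard neuron. Input-layer neurons carry a fixed output value;
        inner neurons combine their predecessors' outputs with a single weight w_z, a
        bias and the bipolar activation h(x) = 2/(1+e^-x) - 1."""
        weight: float = 1.0
        bias: float = 0.0
        predecessors: list = field(default_factory=list)
        value: float = 0.0                      # used only if there are no predecessors

        def output(self):
            if not self.predecessors:
                return self.value
            unbiased = sum(p.output() for p in self.predecessors)        # î_z
            biased = unbiased + self.bias                                 # i_z
            return 2.0 / (1.0 + math.exp(-self.weight * biased)) - 1.0   # o_z = h(w_z * i_z)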

Figure 3.5.: Bipolar Neuron Output and associated Truth Values

Figure 3.5 shows the output of a bipolar neuron together with the associated truth values of our algorithm. $\Delta = i_z^+ - i_z^-$ is the difference in activation for truth and falsity and will be of use later.

In order to interpret a neuron output value as a propositional variable, we must ensure that the output intervals for true and false are distinct, that is, they do not overlap: $o_z^- < o_z^+$. We furthermore constrain propositional truth to positive values and falsity to negative values: $o_z^- < 0 < o_z^+$. This will allow us later to define propositional negation as arithmetic negation. We call a neuron fulfilling this constraint representative.

Definition 3.2 (Representative Neuron). Let z be a standard neuron with the output value $o_z$. Then we call z representative if the set of possible output values $O_z = \{o_z\}$ is identical to the set of real values $O_z = [o_z^{--}, o_z^{-}] \cup [o_z^{+}, o_z^{++}]$ and $o_z^- < 0 < o_z^+$.

With $o_z^{--} \leq o_z^-$ and $o_z^+ \leq o_z^{++}$ the two intervals of a representative neuron z are not empty: $[o_z^{--}, o_z^{-}] \neq \emptyset \neq [o_z^{+}, o_z^{++}]$. Along with the output range $[-1, 1]$ of the activation function the following inequalities hold:

$$-1 \leq o_z^{--} \leq o_z^{-} < 0 < o_z^{+} \leq o_z^{++} \leq 1$$

Note that a representative neuron remains representative regardless of its absolute weight: Since from $o_z^- < 0 < o_z^+$ follows $h(w_z \cdot i_z^-) < 0 < h(w_z \cdot i_z^+)$ and $h(x)$ is an odd function ($h(-x) = -h(x)$), a representative neuron remains representative if we change its current weight $w_z$ to another weight $w_z'$ as long as the sign of the weight is the same: $sgn(w_z) = sgn(w_z')$. This enables us to set the weight freely without confusing truth values. Our line of argumentation now goes as follows: Consider the rule

$$q \Leftarrow \bigotimes_{1 \leq i \leq k} p_i \quad \text{with} \quad \bigotimes \in \{\bigwedge, \bigvee\}$$

We assume that the positive form of each literal $p_i$ has already been translated to a neuron $y_i$ and that these neurons $\{y_i : 1 \leq i \leq k\}$ are representative. Then we create a neuron z that represents q by doing the following:

Output-Input Mapping Map the output values $o_y$ of all predecessor neurons y to the input value $\hat{\imath}_z$ such that $\hat{\imath}_z \in [\hat{\imath}_z^{+}, \hat{\imath}_z^{++}] \Leftrightarrow q$ and $\hat{\imath}_z \in [\hat{\imath}_z^{--}, \hat{\imath}_z^{-}] \Leftrightarrow \neg q$.

Distinct Input Increase the weights $w_y$ of all neurons y such that $\hat{\imath}_z^{-} < \hat{\imath}_z^{+}$.

Representative Neuron Set the bias $bias_z$ to a value between $\hat{\imath}_z^{-}$ and $\hat{\imath}_z^{+}$ such that z is representative.

We obtain a complete representation of the rule without having to impose a constraint on the weight $w_z$. Instead, we set the weights $w_y$ of the predecessors $y \in pred(z)$. The three steps will be explained subsequently.

Output-Input Mapping

The output-input mapping defines how to derive the input limits of a neuron z from a set of output limits of its predecessors $y \in pred(z)$. These limits determine the input intervals that represent truth and falsity, depending on the type of rule (conjunction or disjunction) and the output limits of the preceding neurons.

$$\hat{\imath}_z^{++} = \sum_{y \in pred(z)} o_y^{++} \qquad \hat{\imath}_z^{--} = \sum_{y \in pred(z)} o_y^{--} \tag{3.3.1}$$
$$\hat{\imath}_{z,and}^{+} = \sum_{y \in pred(z)} o_y^{+} \qquad \hat{\imath}_{z,and}^{-} = \hat{\imath}_z^{++} - \min_{y \in pred(z)} \left( o_y^{++} - o_y^{-} \right)$$
$$\hat{\imath}_{z,or}^{-} = \sum_{y \in pred(z)} o_y^{-} \qquad \hat{\imath}_{z,or}^{+} = \hat{\imath}_z^{--} + \min_{y \in pred(z)} \left( o_y^{+} - o_y^{--} \right)$$

The limits are calculated according to four scenarios: The maximum true and minimum false values $\hat{\imath}_z^{++}$ and $\hat{\imath}_z^{--}$ are simply the sum of the corresponding output values and can be considered the best-case values, as they are the values with the maximum possible distance to the ambiguous value 0. The worst-case values depend on the operator: A conjunction represents worst-case truth if its predecessors represent worst-case truth. However, it becomes false if all predecessors give even their best-case true value with the exception of one that gives its maximum false (worst-case false) value. The values for disjunctions are calculated similarly. Finally, negation is represented by arithmetical negation: If z represents a rule in which the neuron y occurs as a negative antecedent, then substitute y by a neuron y′ with the negated output limits:

$$o_{y'}^{++} = -o_y^{--} \qquad o_{y'}^{+} = -o_y^{-} \tag{3.3.2}$$
$$o_{y'}^{--} = -o_y^{++} \qquad o_{y'}^{-} = -o_y^{+}$$

Since the resulting intervals may overlap, i.e., $\hat{\imath}_z^{-} < \hat{\imath}_z^{+}$ cannot be guaranteed, we “separate” the two intervals by setting the weights $w_y$ in the following step.
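To make the mapping concrete, the following Python sketch (an illustration, not the implementation used in Nexplayer) computes the input limits of equation (3.3.1) and the negated output limits of equation (3.3.2). Each predecessor is described by a tuple of its output limits $(o^{--}, o^{-}, o^{+}, o^{++})$, an assumption made for readability.

    def conjunction_limits(outs):
        """Input limits (i--, i-, i+, i++) of a neuron representing a conjunction."""
        i_pp = sum(o[3] for o in outs)                    # best-case true
        i_mm = sum(o[0] for o in outs)                    # best-case false
        i_p  = sum(o[2] for o in outs)                    # worst-case true
        i_m  = i_pp - min(o[3] - o[1] for o in outs)      # worst-case false
        return i_mm, i_m, i_p, i_pp

    def disjunction_limits(outs):
        """Input limits of a neuron representing a disjunction."""
        i_pp = sum(o[3] for o in outs)
        i_mm = sum(o[0] for o in outs)
        i_m  = sum(o[1] for o in outs)                    # worst-case false
        i_p  = i_mm + min(o[2] - o[0] for o in outs)      # worst-case true
        return i_mm, i_m, i_p, i_pp

    def negate(out):
        """Output limits of a neuron standing for a negated antecedent, eq. (3.3.2)."""
        o_mm, o_m, o_p, o_pp = out
        return -o_pp, -o_p, -o_m, -o_mm

    outs = [(-0.9, -0.1, 0.1, 0.9), negate((-0.8, -0.2, 0.2, 0.8))]
    print(conjunction_limits(outs))   # worst-case intervals still overlap; the weights fix that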

Distinct Input

We show that a weight $w_y$ for the predecessors y of a neuron z exists such that the input of z is distinct. We use the parameter $\Delta_z \stackrel{def}{=} \hat{\imath}^{+} - \hat{\imath}^{-}$ to refer to the difference by which the input intervals of z are distinct. For simplicity, we assume $-i^{-} = i^{+}$ and consequently $-o^{-} = o^{+}$. This assumption will be justified in theorem 2.

Theorem 1. Let z be a standard neuron representing a conjunction or disjunction of its predecessors $y \in pred(z)$ as given in equation (3.3.1). Furthermore, let all predecessors be representative standard neurons with $-i^{-} = i^{+}$. Then for an arbitrary $\Delta_z < 2$ the input of z is distinct by at least $\Delta_z$, i.e., $\hat{\imath}_z^{+} \geq \hat{\imath}_z^{-} + \Delta_z$, if the weight $w_y$ for all neurons y is

$$w_y \geq -\frac{1}{i_{min}^{+}} \ln \frac{2 - \Delta_z}{\Delta_z + 2k} \tag{3.3.3}$$

Here, min refers to the neuron that has the minimum worst-case truth value $i_{min}^{+}$ of all neurons $y \in pred(z)$.

Proof. We give the proof for the conjunctive case.

$$\hat{\imath}_z^{+} \geq \hat{\imath}_z^{-} + \Delta_z$$

Applying equation (3.3.1) yields

$$\sum_{y \in pred(z)} o_y^{+} \;\geq\; \sum_{y \in pred(z)} o_y^{++} - \min_{y \in pred(z)} \left( o_y^{++} - o_y^{-} \right) + \Delta_z$$

Assuming the worst case for both sides, we have

$$k \cdot \min_{y \in pred(z)} o_y^{+} \;\geq\; k - \min_{y \in pred(z)} \left( 1 - o_y^{-} \right) + \Delta_z$$

which we can simplify to

$$k \cdot (o_{min}^{+} - 1) \geq -1 - o_{min}^{+} + \Delta_z$$
$$o_{min}^{+} \geq \frac{\Delta_z - 2}{k + 1} + 1$$

Since $o_{min}^{+} = h(w_{min} \cdot i_{min}^{+})$ and the inverse of the activation function is $h^{-1}(v) = \ln\frac{1+v}{1-v}$, this is equivalent to

$$w_{min} \geq -\frac{1}{i_{min}^{+}} \ln \frac{2 - \Delta_z}{\Delta_z + 2k}$$

The proof is similar for the disjunctive case and yields the same result.

The result is the worst-case lower bound for the weights $w_y$ such that the input of z is distinct by at least $\Delta_z$. In extreme cases, this bound corresponds to the weight bound in C − IL2P. However, our algorithm calculates these parameters locally and thus benefits from any situation that does not represent a worst case. The result shows that $\Delta < 2$ is a precondition for all successor neurons, ensuring thereby that $\Delta$ is smaller than the maximum possible output difference of a neuron, $o^{++} - o^{--} = 2$. Furthermore, $i_{min}^{+}$ must be positive if we want positive weights. This, however, holds for all representative neurons, as we will see in the proof of theorem 2.
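The bound of equation (3.3.3) is easy to evaluate; the small Python snippet below (an illustration with invented example numbers) shows how it shrinks when the predecessors' worst-case true inputs move away from 0, which is exactly the local advantage over a single global weight.

    import math

    def weight_lower_bound(i_min_plus, k, delta):
        """Local weight lower bound of equation (3.3.3)."""
        assert 0 < delta < 2 and i_min_plus > 0
        return -(1.0 / i_min_plus) * math.log((2.0 - delta) / (delta + 2.0 * k))

    print(weight_lower_bound(0.1, 3, 0.5))   # weak predecessors: a large weight is required
    print(weight_lower_bound(0.8, 3, 0.5))   # stronger predecessors: a much smaller weight suffices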

Representative Neurons

In order to make a neuron z representative, we just have to set its bias accordingly.

Theorem 2. Let z be a standard neuron with distinct input. By setting

$$bias_z = -\frac{\hat{\imath}_z^{-} + \hat{\imath}_z^{+}}{2} \tag{3.3.4}$$

z is representative.

Proof.

The input of z is distinct

$$\hat{\imath}_z^{-} < \hat{\imath}_z^{+}$$

We add biasz and obtain the biased input

$$i_z^{-} = -i_z^{+} \quad \text{with} \quad i_z^{+} > 0$$

Since the activation function h(x) is odd and strictly monotonically increasing for w > 0, it holds

$$h(w \cdot i_z^{-}) = -h(w \cdot i_z^{+})$$

With $h(w \cdot i_z^{--}) < h(w \cdot i_z^{-})$ and $h(w \cdot i_z^{+}) < h(w \cdot i_z^{++})$, the neuron z is representative according to definition 3.2.

Note that by setting the bias in between the limits $\hat{\imath}^{-}$ and $\hat{\imath}^{+}$, we obtain $-i^{-} = i^{+}$ and $-o^{-} = o^{+}$.

3.3.2. Algorithm

The final algorithm works as follows. We assume as input a propositional domain theory, that is, a set of rules of the form

$$q \Leftarrow \bigotimes_{1 \leq i \leq k} p_i \quad \text{with} \quad \bigotimes \in \{\bigwedge, \bigvee\}$$

We transform a rule r with head q to a neuron z with the following algorithm:

1. If the number k of antecedents in r is 0, then mark z as input neuron and stop.

2. Obtain the neurons yi by recursively calling this algorithm on all atoms pi in the body of r.

3. Create a neuron z and connect it to all neurons yi with weight wz (−wz) if pi is a positive (negative) literal.

4. While $\hat{\imath}_z^{-} + \Delta_z \geq \hat{\imath}_z^{+}$ do

• Increase $w_{y_i}$ for all $1 \leq i \leq k$.

• Recalculate the output limits of the neurons $y_i$ for all $1 \leq i \leq k$.

• Recalculate $\hat{\imath}_z^{-}$ and $\hat{\imath}_z^{+}$ according to equations (3.3.1).

5. Connect z to the bias neuron with weight wz ∗biasz where biasz is defined as given in equation (3.3.4).

When processing the rule r with head q, the algorithm sets the weights $w_{y_i}$ for the predecessors of z such that the input to z becomes distinct (line 4). It then sets the parameter $bias_z$ such that z becomes representative (line 5). Note that we do not set $w_z$, because this weight is only set when transforming the heads of rules where q occurs in the body. Since the weights of the output neurons will thus not be set at all, we can simply set them to 1. Note also that the simple search algorithm for weights (implemented as a loop in line 4) can be substituted by a more sophisticated binary search on the interval $[1, w_{max}]$ where $w_{max}$ is the theoretical maximum weight as stated in equation (3.3.3). By using low weights the algorithm is able to deal with the lack of resolution described in the introductory examples. It remains to show that the algorithm maintains correctness and to discuss other properties of the algorithm.
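The core of steps 4 and 5 can be sketched as follows in Python for a conjunction neuron. The representation of predecessors (fixed biased-input limits plus an adjustable weight) and the concrete numbers are assumptions made for illustration; this is not the thesis implementation.

    import math

    def h(x):
        return 2.0 / (1.0 + math.exp(-x)) - 1.0

    class Pred:
        """Predecessor neuron with fixed biased-input limits (i--, i-, i+, i++)."""
        def __init__(self, input_limits, weight=1.0):
            self.input_limits = input_limits
            self.weight = weight
        def output_limits(self):
            return tuple(h(self.weight * i) for i in self.input_limits)

    def conjunction_input_limits(preds):
        outs = [p.output_limits() for p in preds]
        i_p = sum(o[2] for o in outs)                                   # worst-case true
        i_m = sum(o[3] for o in outs) - min(o[3] - o[1] for o in outs)  # worst-case false
        return i_m, i_p

    def transform_conjunction(preds, delta=0.5, step=1.2):
        i_m, i_p = conjunction_input_limits(preds)
        while i_m + delta >= i_p:            # step 4: input of z not yet distinct by delta
            for p in preds:
                p.weight *= step             # increase the predecessor weights
            i_m, i_p = conjunction_input_limits(preds)
        return -(i_m + i_p) / 2.0            # step 5: bias_z according to equation (3.3.4)

    preds = [Pred((-3.0, -0.1, 0.1, 3.0)), Pred((-2.0, -0.2, 0.2, 2.0))]
    print("bias_z =", transform_conjunction(preds))

The multiplicative weight increase stands in for the simple search of line 4; a binary search on $[1, w_{max}]$ as mentioned above would serve the same purpose.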

Soundness

Since all input layer neurons are defined such that they are representative, and we set the bias of each remaining neuron such that it is representative, every neuron is representative. To show that the transformation of a neuron is sound, we show that the output of a neuron correctly represents a conjunction or disjunction of the variables represented by its predecessor neurons.

Theorem 3. If the neurons $y_i$ represent the variables $p_i$ in the rule $q \Leftarrow \bigotimes_{1 \leq i \leq k} p_i$ with $\bigotimes \in \{\bigwedge, \bigvee\}$, then we can construct a neuron z such that it represents q.

Proof. Due to the monotonicity of the activation function, we limit the proof to border cases. We show that if all predecessor neurons represent true, the conjunction neuron z also represents true: $\forall y \in pred(z): o_y \geq o_y^{+} \rightarrow o_z \geq o_z^{+}$. It follows:

$$\sum_{y} o_y \geq \sum_{y} o_y^{+} = \hat{\imath}_z^{+}$$
$$\hat{\imath}_z \geq \hat{\imath}_z^{+}$$

As all neurons y are representative, we set a positive weight such that $\hat{\imath}_z^{+} > \hat{\imath}_z^{-}$, and from the monotonicity of the activation function h(x) it follows that $o_z \geq o_z^{+}$.

If one predecessor neuron $y^*$ represents false, the conjunction neuron z also represents false: $\exists y^* \in pred(z): o_{y^*} \leq o_{y^*}^{-} \rightarrow o_z \leq o_z^{-}$. The maximum unbiased input $\hat{\imath}_{z,max}$ is

then

$$\hat{\imath}_{z,max} = \sum_{y \in pred(z)} o_y^{++} - \left( o_{y^*}^{++} - o_{y^*}^{-} \right)$$
$$\hat{\imath}_{z,max} \leq \sum_{y \in pred(z)} o_y^{++} - \min_{y \in pred(z)} \left( o_y^{++} - o_y^{-} \right) = \hat{\imath}_z^{-}$$

Again, we set a positive weight $w_y$ such that $\hat{\imath}_z^{+} > \hat{\imath}_z^{-}$, and from monotonicity it follows that $o_z \leq o_z^{-}$. The proof for the disjunction is similar.

The Problem of Increasing Weights

When transforming a graph to a network, the weight of a node y is processed by all of its successors and might be increased. If a node z sets the weight $w_y$ of one of its predecessors and a sibling $z'$ of z later increases this weight $w_y$, the absolute output values of y increase and might, under certain circumstances, decrease the difference between $i_z^{-}$ and $i_z^{+}$. There is no straightforward way of handling this problem: by setting $w_y$ and the input limits $i_z$ for each neuron z, the neuron transformation implicitly ranges over two levels of the neural network. The simplest way to handle the scenario is to recalculate the neuron parameters for all previously transformed parent nodes of y and their successors whenever the weight $w_y$ is increased. This can be time-consuming in the worst case. However, knowing that the highest activations most likely occur at the neurons with the highest number of children, we employ a heuristic that transforms the rules with the highest number of antecedents first. This translates to modifying line 2 in the above algorithm such that the atoms are transformed in descending order of the number of their children. The problem is further relaxed for propositional domain theories with a layered structure (e.g., when in Disjunctive Normal Form). In this case it is possible to transform the domain theory layer by layer. Once a layer is transformed, the weights of the preceding layer need not be recomputed anymore.
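The ordering heuristic itself is a one-line sort. The rule representation below (a pair of head and antecedent list) is an assumption made purely for illustration.

    # Transform rules with many antecedents first, so that the large weights they
    # require are already in place when rules with fewer antecedents are processed.
    rules = [
        ("win",    ["line_x"]),
        ("line_x", ["c11", "c21", "c31"]),
        ("open",   ["c11", "c22"]),
    ]
    order = sorted(rules, key=lambda r: len(r[1]), reverse=True)
    print([head for head, _ in order])   # ['line_x', 'open', 'win']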

Cyclic Domain Theories

A second issue is that, unlike the original C − IL2P, our approach is not directly designed to work on cyclic domain theories such as a ⇐ b, b ⇐ a. Although a scenario relying on both fuzzy evaluation and cyclic domain theories should be less common, cyclic domains do not invalidate our theoretical foundation of a rule-to-neuron transformation. To deal with such a scenario, the top-level algorithm would have to be modified to avoid infinite recursion. Still, since C − IL2P provides a solution and our algorithm leads in the worst case to the weights implied by C − IL2P, our algorithm works as well. Conversely, one might also opt to start from a network constructed by C − IL2P and gradually reduce weights while the neurons remain representative and their input distinct.

3.3.3. Evaluation

Having proved that our algorithm is correct and as such suitable for translating a propositional domain theory to a neural network, we now demonstrate its practical benefits. Our hypotheses are the following:

Bigger Output Interval Our approach maps states to a bigger subinterval of [−1, 1] than C − IL2P would.

More States Distinguishable Consequently, our approach is able to distinguish more states from each other.

Increased Playing Strength Consequently, our approach improves playing strength.

We will refer to the Nexplayer version using plain C − IL2P as Nexplayer v1.0 and to the version presented above as v1.1.

Bigger Output Interval

Our candidate set of games comprises 190 games. The set is mostly equivalent to the games available on the GGP server [Sch12]. For each of these games, we follow a simple algorithm:

• let r be the first role in the game description

• generate all moves possible in the initial state

• execute all moves and, for each successor state, evaluate the network for role r and the goal value 100

We end up with a set of state values in [−1, 1]. The effective output interval is the interval between the minimum and the maximum evaluation of the states at ply 1.
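The measurement can be summarized in a short Python sketch. The helpers initial_state, legal_moves, successor_state and evaluate_network are hypothetical stand-ins for the agent's GDL reasoner and for the network of role r and goal value 100; the sketch ignores joint moves of other roles for brevity.

    def ply1_output_interval(game, r, initial_state, legal_moves, successor_state, evaluate_network):
        s0 = initial_state(game)
        values = [evaluate_network(r, successor_state(s0, m)) for m in legal_moves(s0, r)]
        # size of the effective output interval of the ply-1 states
        return max(values) - min(values)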

Figure 3.6.: Logarithmic Size of the Output Interval for Ply-1 States of 190 Games

Figure 3.6 shows a graph indicating the size of the output interval for each game. The games are plotted on the x-axis in alphabetic order. Since similar games often have similar names (e.g., Connect Four variants all start with the string “conn”), there are clusters of similar results. The y-axis adheres to a $\log_{10}$ scale. We assume the lower bound to be 64-bit floating-point precision, that is, $2^{-53} \approx 10^{-16}$. We cannot measure any resolution lower than this, so we cap the results at this value. We can see that the resolution of v1.1 is higher in 67 games and equal in 117 games compared to that of v1.0. In six games, the output interval is smaller. These six games are all games with simple networks of at most three layers, i.e., at most one hidden layer. Here, no resolution problems occur anyway, and high weights only push the output values for different states farther away from each other without achieving a higher degree of distinction between states. Figure 3.6 demonstrates this, as interval sizes of version 1.0 only exceed those of v1.1 in games where both interval sizes are already big (0.007 or higher). The 117 games with equal output interval size are mostly games where no atomic part of the goal is affected by the first move. An example is the game checkers where the winner is determined in terms of the number of pieces of both players. As it is impossible to remove any piece in the first move, all states are evaluated to the same value.

                                            Games   v1.0           v1.1
    all games                               190     8.8 ∗ 10^−14   5.8 ∗ 10^−12
    games where v1.1 increases the output   67      9.9 ∗ 10^−10   2.0 ∗ 10^−5

Table 3.3.: Output Interval Sizes in Games

Table 3.3 shows the average logarithmic output size for all games and for all games where v1.1 increases the output size. In general, we roughly expect our algorithm to halve the logarithmic output size, that is, increase the size of the output interval of a network to the size of its square root.

More States Distinguishable To determine the number of distinct states, we work on the same set of games and use the same algorithm as in the prior evaluation of the output interval. However, this time we count the number of states that obtain a different evaluation from the network. The purpose is to estimate the extent to which the agent is now better informed when moving. Figure 3.7 shows the number of different states in ply 1 identified by the network for the first role and goal value 100. We see that in 31 of the cases with a higher resolution, also a higher number of states with distinct state value was identified. Moreover, in none of the cases, the number of distinct states decreased. We can thus expect the agent to make better informed moves in these games.

Figure 3.7.: Number of Distinguishable Ply-1 States in 190 Games

Increased Playing Strength

Ultimately, the statistics would be meaningless if there were no impact on the playing strength of the agent. To provide for an easy comparison, we use the same setup as in the evaluation of the original neural network without our extension (section 3.2.4).

Figure 3.8.: Win Rates of Nexplayer v1.1 and v1.0 against Fluxplayer

Figure 3.8 shows the average points per match obtained in 20 matches with a 40,4 clock setting. As we can see, Fluxplayer plays far better than Nexplayer and achieves in fact almost the same points as against Nexplayer v1.0. Since the results do not allow for an unambiguous conclusion, we set up 40 matches of Nexplayer v1.1 against Nexplayer v1.0 to check whether there is a difference.

Figure 3.9.: Win Rates of Nexplayer v1.1 against v1.0

Figure 3.9 shows the average points per match obtained in 40 matches with a 40,4 clock setting. We indeed see a huge difference in performance: Nexplayer v1.1 achieves on average 63.3 points compared to 36.7 points for Nexplayer v1.0.

    Game             Fluxplayer   Nexplayer v1.0   Nexplayer v1.1
    Fluxplayer       -            67.8             66.9
    Nexplayer v1.0   32.2         -                36.7
    Nexplayer v1.1   33.1         63.3             -

Table 3.4.: Nexplayer Versions in Setup against Fluxplayer

For comparison we provide table 3.4 that shows the numeric statistics of Fluxplayer, Nexplayer v1.0 based on the original C − IL2P and Nexplayer v1.1 based on the new approach.

Further Implications

The evaluation at hand was performed without using any type of features in the evaluation function and on a slightly different set of games than in our workshop paper [Mic11b]. This explains the minor differences in the results. The results clearly show that in the new approach states are mapped to a bigger interval and that, consequently, more states are distinguished from each other. When looking at the effect on playing strength, we get a divided picture. We can see that the high-resolution approach has a very limited effect when playing against

Fluxplayer. The results indicate that there are far more important parts of the evaluation function to be addressed. These parts do not belong to the aggregation function and will be addressed in chapter 4. Still, on average Nexplayer v1.1 performs significantly better than Nexplayer v1.0, emphasizing the importance of a well-designed aggregation function given an equal base of features (in this case none). Moreover, once we use features, our modification to C − IL2P becomes even more important for our game playing agent: Currently, many states are indistinguishable due to the lack of features, hence no improvements in distinguishing these states could be registered. However, with features, neurons will not only return the crisp values −1 and 1 but also values in between. Hence, even smaller differences in neuron output values must be recognized and propagated through the network. Consequently, the values obtained should be regarded as a cautious lower bound on the utility of the approach.

3.4. Summary

We analyzed the aggregation functions used by current GGP agents along the lines of efficiency and quality (section 2.5). We found that only deterministic functions can in principle satisfy these criteria. Still, these have shortcomings with respect to learning and flexibility that have not yet been addressed in any agent. We showed that neural networks, when initialized with C − IL2P, exhibit all desired properties and are themselves already capable of confronting a modern GGP agent to some degree. However, due to the high weights imposed by C − IL2P, the strict monotonicity criterion could not be guaranteed to hold in every game. We reduced the effect of this disadvantage by calculating connection weights locally. Consequently, the playing strength of Nexplayer improved in comparison to its predecessor based on plain C − IL2P.
We focused up to now only on a small set of games that is especially suited for fuzzy evaluations. In the next chapter we will generalize the set of games by introducing features that allow Nexplayer to play well in other classes of games. Each of these features will enhance Nexplayer in another way and allow not only for a broader basket of games to be played effectively, but also for performance optimizations that boost playing strength in the eight games used for evaluation throughout this chapter.

4. Evaluation Functions II - Features

Features are functions that map the degree of presence (or absence) of a tactical or strategical pattern in a game state to a number. Piece development, king safety, pawn structure and material advantage are well-known concepts in Chess and are therefore encoded into features that form the basis of every deterministic evaluation function. Features can be considered the elements of an additional abstraction layer that is on top of the plain state information but below the state value itself. The game state representation is evaluated by these features and the aggregation of the feature output forms the state value. Viewing features as part of an abstraction layer also highlights their advantages:

Reduction of Complexity Features operate only on one aspect of a state. Their application allows for a divide and conquer approach by reducing the original state evaluation problem to a smaller subproblem.

Abstraction Features provide a more abstract view of the state because they generalize from concrete state information.

Intuitive Representation Features represent concepts that are intuitive for humans. Therefore they are used as an entry point for tracing the information processing from a state to a value.

This chapter aims to analyze the theoretical foundations, applications and benefits of features in GGP agent systems. We begin in section 4.1 with a categorization of the different features applied in state of the art GGP agents, organized by their acquisition techniques and the class of games they can be used in. Section 4.2 generalizes further by addressing the questions of what features estimate and how this should contribute to the evaluation of a state. We put the results of our analysis into practice by describing an efficient way to detect (section 4.3) and integrate (section 4.4) features into an evaluation function. We deal with the specific drawbacks of distance features and show how to improve results in section 4.5 and conclude in section 4.6.

4.1. Categorization of Features

The features used throughout game playing agents are very different in terms of acquisition and the games they are applied in. The aim of this section is to put the different features on a common ground. This will pave the way for an analysis and conclusions on what features best serve our quality and efficiency constraints. Specifically, we will provide a general view on

• what types of features exist,

• how they can be acquired, and

• how they are used in a GGP agent system.

We conclude by deciding in favor of a subset of features and acquisition methods that will motivate the rest of this chapter.

4.1.1. Methods for Feature Acquisition

While there are many approaches for acquiring features, all of them follow the same pattern:

Feature Construction First, a candidate feature as an algorithmic unit is constructed that operates on a game state and returns a value.

Feature Selection Among all candidate features those are selected that, while being expressive, are not too expensive in computation.

Feature Integration All selected features are integrated into the evaluation function.

There are a number of terms used throughout the scientific community to refer to one or all of the above three acquisition steps. Most common among these are feature construction, feature discovery, feature extraction and feature generation. However, each of these is ambiguous, as it either uses the name of a single step to describe the complete process (discovery) or suggests that a specific method was used (construction, extraction, generation). To avoid these ambiguities, we will refer to the overall process as feature acquisition, highlighting the result rather than the means of the process. In the following, we discuss each step with the term as given above.

Feature Construction

Feature construction is the first step of feature acquisition and defines the set of candidate features that can potentially be used in the evaluation function. An optimal feature construction algorithm is characterized by

Completeness The set of candidate features it constructs contains all features that an optimal evaluation function would use.

Specificity The set of candidate features it constructs contains only features that an optimal evaluation function would use.

An incomplete construction algorithm constrains the evaluation function by not providing some good features, thereby decreasing evaluation quality. An unspecific construction algorithm creates more candidate features than necessary. Since candidate features are then passed on to the selection and integration step, an unspecific construction reduces the efficiency of evaluation function construction. Moreover, if selection and integration are imperfect, it also reduces evaluation quality. Candidate features can be obtained in several ways:

Definition They can be predefined by the agent programmer as done in Metagamer [Pel93].

Generation They may be generated synthetically by connecting elements of a game state with connectives. For example, one could randomly connect fluent propositions with logical connectives.

Detection They can be detected as an expression in the game rules that can be transformed to a feature. The transformation is what Clune referred to as imposing an interpretation on an expression [Clu07].

Derivation They can be derived from already existing features as done in Zenith [Faw93].

We will analyze the four ways of obtaining features along the lines of completeness and specificity and their implications for evaluation function quality and efficiency. Examples of predefined features are mobility and piece value features. The mere fact that feature acquisition is an important topic discussed in every publication on deterministic GGP agents implies that there are no predefined features that generally serve well. Most often they are used as a complement to other construction algorithms. An approach for synthetic feature generation would be to modify the language of propositional logic formulas where the set of propositional variables is defined as the set of ground fluents of a game. Each word in the resulting language can then be considered a (boolean) feature of that game. The problem of such a generative way of feature construction is that the space of generatable features is infinite, and, even if we allow only the smallest formulas with a unique meaning, the number of generatable features is exponential in the number of ground predicates, which in turn is potentially exponential in the number of variables of the non-ground predicates. Given that we need to evaluate every feature for the selection and integration step while only obtaining propositional features with a crisp output, the results are not worth the effort. Put more generally, the generative component of feature construction represents an uninformed search in the (possibly restricted) feature space. The features generated are thus unlikely to be expressive but still require time-consuming selection and integration. Consequently, to date there is no GGP agent system applying synthetic feature generation. There are, however, mixed approaches that apply some sort of feature generation. For example, Kuhlplayer [KDS06, Kuh10] first tries to detect game elements such as boards and their content (markers and pieces) and then generates a number of features (e.g., distances between markers, quantities of markers) that operate on these game elements,

regardless of whether they make sense in the game or not. One could argue that this approach follows the feature generation pattern for a restricted subclass of features. Feature detection avoids the problems of blind search in the feature space since it operates on preselected expressions. For instance, Cluneplayer [Clu07] uses expressions occurring in the goal rules as a starting point for features. This restricts the size of the generated set of features. Naturally, these expressions are likely to be useful for state evaluation since they occur in the goal rules. Cluneplayer transforms them to features by imposing up to three interpretations on these expressions. These features can be expected to “inherit” the usefulness from the expressions they are built from. Thus, though not guaranteed, the features detected this way are likely to be complete and specific. Finally, feature derivation is used to a small extent in Cluneplayer where relative features are derived from symmetric absolute features. For instance, instead of using two features that return the number of pieces for a role in Checkers, the agent uses the difference between both piece counts as an arguably more expressive feature. A more general approach on transforming features is discussed in [Faw93] and there is a follow-up work based on GGP [Gün08]. Both works are able to produce a variety of good features. The main problem for both is that the number of candidate features generated is too big, leading to the same problem as feature generation, namely prohibitively high time requirements for selection and integration.

Feature Selection

Feature selection is the process of selecting which of the candidate features obtained during the construction phase will actually be used in the evaluation function. Since computation time for state evaluation is limited, a feature’s utility is determined by the quality it contributes to the state evaluation relative to the average time it requires to be evaluated. Thus, the main criteria for features are run time and quality. Run time is typically considered by eliminating candidate features whose average run time cost exceeds the average feature evaluation cost by a predefined margin. The reason is that the least efficient feature sets the run time complexity of the complete evaluation function. Quality, on the other hand, is not directly measurable; however, other properties that relate to feature quality are used in the literature. These are:

Stability More stable features are more valuable as they indicate long-term state information. Likewise, features with wildly oscillating outputs are dangerous since they represent short-term properties and lead to short-sighted play. Cluneplayer uses stability as its main criterion.

Correlation with Goal A strong positive or negative correlation with the goal value of a state indicates utility of a feature. All GGP agents with the exception of Fluxplayer [ST07, Sch11] rely on correlation.

Inheritance A feature that is derived from other useful expressions or features is likely to be useful as well. Cluneplayer and Fluxplayer justify features with inheritance.

Stability is used explicitly only by Cluneplayer [Clu07]. Cluneplayer considers an expression stable if it varies little from one state to its successor, relative to the variations from the start to the terminal state. It uses a simulation approach for determining the stability value of the feature, that is, it simulates random matches and returns the average stability determined in these matches as its value. Naturally, this puts the reliability of the value in a direct trade-off with the run time costs and is therefore suboptimal. Moreover, random matches can be quite different from matches played by informed players, so the value obtained is not necessarily good, even for a high number of random matches. Correlation with the goal value is a much more popular measure of quality and is used by almost all GGP agents. Much like stability, a simulation approach is used for determining the correlation, leading to the same suboptimal trade-off between run time and reliability. Even worse, goal values are less likely to be representative than stability values, as they are much more sensitive to errors induced by random play, further decreasing the reliability of any sampled correlation value. Also, for each random match we only obtain exactly one goal value, increasing the costs of determining correlation. Cluneplayer handles the excessive run time costs using pseudo-terminal states. These are states obtained by constructing terminal states from terms. Obviously, this works only in some games, as can be seen when asking an agent to construct a checkmate state. In these cases, however, it eliminates the need for internal simulation for finding terminal states, at the risk of obtaining correlation values using unreachable states. Finally, inheritance is strongly related to feature detection and derivation. If the feature is detected based on an expression used in the goal function of the game, importance (i.e., correlation with the goal value) can be considered inherited from the expression. This approach is used by Cluneplayer and Fluxplayer. On the other hand, if the feature is derived from another feature, the correlation is not necessarily ensured and a further proof of utility may be required. Still, it is more likely to be useful than a randomly generated feature. Cluneplayer uses this form of inheritance to a small extent when combining two absolute features into one relative feature.

Feature Integration

The candidate features that are not removed during feature selection are used in the evaluation function. For this purpose, each feature has to pass through three steps:

Normalization The feature output needs to be normalized to fit into the aggregation function.

Weighting A weight of importance and direction must be assigned.

Placement The argument position in a (possibly nested) aggregation function has to be determined.

Normalization is needed because features have different output intervals and the meaning of a feature usually depends on the output relative to its output interval. For instance, 3 stones in Nine Men’s Morris is the lowest possible value for a feature that counts stones, 9 is the highest. It is therefore better to design a feature with an output interval [0, 6] than one with an interval [3, 9]. In the same way, normalization may require a feature output that is non-linear in terms of the game concept. For instance, losing one stone in Nine Men’s Morris when you still have 9 reduces the state quality much less than losing a stone when you only have 4. Expressing such subtleties may prove useful. Normalization is mandatory if the aggregation function provides a context that interprets the feature output, as is the case in fuzzy aggregation functions.

Weighting assigns weights to the features found. These weights are usually determined using the quality measures determined in the selection phase to avoid additional simulations. In the case of a linear combination as aggregation function, feature weighting and normalization are often performed in one single step. Fluxplayer is the only agent that performs no feature weighting [ST07].

Finally, placement is trivial in linear combinations due to the commutative nature of the aggregation function. Fluxplayer, with its non-commutative aggregation function, places features according to the placement of the underlying feature expression in the goal function.

                     Efficiency   Quality
    Construction
      Definition     +            -
      Generation     -            -
      Detection      +            +
      Derivation     (-)          (+)
    Selection
      Stability      -            -
      Correlation    (-)          -
      Inheritance    (+)          (+)

Table 4.1.: Summary of Properties of Feature Acquisition Techniques

Summary

An overview of the quality and efficiency properties of the feature construction and selection techniques is shown in table 4.1. Efficiency here refers to the efficiency with respect to constructing and selecting the feature (as opposed to evaluating it). Feature construction is only efficient if the features are defined beforehand or detected on the restricted set of expressions. Of these two, only feature detection is likely to produce a comprehensive set of features. On the selection side, stability and correlation suffer from increased time requirements as they require simulations and/or terminal states to be determined. Still, the values obtained are not guaranteed to be useful due to being the result of random playouts. The feature integration steps do not incur significant run time costs and also involve

no decision that affects the quality of the evaluation function. Instead, they are a mere consequence of the aggregation function selected and the construction and selection algorithm employed. Therefore we omitted them in the table. As an intermediate result of our analysis we conclude that features detected in the game rules and justified by inheriting their usefulness from an underlying expression should provide the best results in terms of quality and efficiency.

4.1.2. Feature Types

Based on the existing GGP agents, we can categorize features into two categories:

Class-based Features are (pre)defined features applicable only in a class of games.

Rule-derived Features are derived from expressions in the game rules.

All features used by GGP agents belong to one of the two types, with some being a mixture of the two.

Class-based Features

Class-based features are features applicable only in a class of games. Feature construction therefore depends on whether the game at hand can be identified as a member of such a class. Once a class has been identified, features that are usually useful in this class are included as candidate features. Since it is unclear whether a feature actually is useful or not, it still has to go through the usual selection and integration steps.

General Games

The most general class-based features are based on properties that are fulfilled in every general game. Features of this class are always applicable and can thus be algorithmically prepared. While allowing for an application in every game, they are too unspecific to be of any outstanding value. Besides, it is unclear whether they have any effect on the state quality and, in that case, how to integrate them best, resulting in the need for a further selection, e.g., to determine weights. The following is a list of common features for general games; all three are applied, for instance, in the GGP agent Ogre [Kai07a].

Mobility [Sha50] reflects the number of actions available in a state. As by definition there is a legal move for every role in every non-terminal state, mobility is always applicable and a prior domain analysis is not required. Though it is hard to guess from the rules whether the impact of mobility on the state utility is positive, i.e., whether a high number of possible actions for a role is favorable for the role, mobility indicates one slight advantage: the right of choice. The player with just one action to perform has no influence on the state, and the more actions are available, the more parts of the game tree are usually accessible.

Depth returns a value that represents the depth of the state in the game tree. The feature can be used to discount the value of states deeper in the game tree. As a consequence, the evaluation function prefers states higher in the game tree, that is, states with shorter solution paths.

Finally, Goal Value is another simple feature applicable to the set of games with a known goal definition. The goal value feature evaluates a state using the goal function of the game. Though the goal function is required to be descriptive only on the set of terminal states, often it already yields useful information during a match. An example would be a game where a role gains a predefined number of points for each opponent piece captured. If the number of captured pieces per role is stored in a fluent and the goal function only refers to that fluent, we have a defined goal function in all states. The games brawl and skirmish are such games. On the other hand, if the goal function only yields descriptive results in terminal states, its value is constant during the game and thus does not affect the evaluation of a state. This usually happens for goal functions that implicitly capture the terminal condition. So in any case the goal value is likely to improve the evaluation, or at least not distort it.

Board Games

A more specific class is the class of board games, relevant, e.g., for Kuhlplayer [KDS06]. Boards can be represented as a graph where the nodes represent locations. On these locations game elements such as mobile pieces or stationary markers can be placed. Typically, boards are assumed to have a one-to-one relation between a non-empty board location (cell) and the board element that occupies it (content): This means that for each location on the board there is at most one board element (content), i.e., no two board elements can share the same location. Likewise, no board element can occupy more than one location. Naturally, we can determine distances between board locations by determining the length of shortest paths in the underlying graph. Moreover, knowing that each board element has (a set of) associated coordinates, we can also determine distances between a board cell and a board element or between two board elements. Besides, quantities of board elements can be determined.

Board Games with Pieces

A subclass of board games is the class of board games with pieces. In addition to detecting a board, this requires detecting board elements that are movable. This is the distinguishing property that separates board games like Tic-Tac-Toe or Connect Four from Chess-like games. In case such movable board elements can be detected, we can apply additional features that are derived from the mobility (number of actions) of each piece. Piece value is the primary feature and correlates with the upper bound on mobility. Here, each board element, called a piece in this context, is assigned a value (representing the average or maximum mobility of the board element), and the sum of the number of board elements weighted by their piece value is the overall piece value of the state. A high overall piece value indicates that a high mobility can potentially be achieved and moderates the effect of spikes and dips in mobility. Other features are either also directly derived from mobility, for example x-ray mobility, which measures the mobility of a piece after removing all movable obstacles (other pieces) on the board that constrain its mobility. Or they are derived from a piece value, such as the threat feature, which measures how many piece value points of the opponent

can be instantly removed by capturing. While it may be worthwhile to specialize a game playing agent towards this class of games, our primary interest lies in the detection and construction of general and adaptive features. Therefore we refer the interested reader to Metagamer [Pel93], which is the primary source for this type of features. Since most of the mentioned features are an integral part of Chess heuristics, we recommend the heuristics of GNU Chess [JS91] as a short but instructive example of features for board games with pieces. In summary, all class-based features share the same properties, namely that they are general within the class of games and can be applied once the game class is confirmed. In addition, features based on more specific game classes are typically more descriptive than more general features. With respect to efficiency, game class detection may be expensive but needs to be done only once to unlock a number of features. Efficiency is thus in a trade-off with the features that could potentially be applied. While there is no prohibitive disadvantage related to class-based features, their application in a GGP agent can only improve an existing heuristics in the given game classes but not provide a good heuristics in general games on its own.

Rule-derived Features

Rule-derived features are directly derived from an expression in the game rules. If the expression is in some way relevant in the game, then the feature is most likely relevant for state evaluation as well. Therefore, rule-derived feature construction is more likely to produce expressive features. The two agents mainly using rule-derived features are Cluneplayer and Fluxplayer. For feature detection, Cluneplayer relies on three interpretations (solution cardinality, partial solutions, distance) that are imposed on expressions occurring in the goal condition. For each applicable interpretation on such an expression a candidate feature is obtained. Fluxplayer can be seen to apply the same interpretations, although in a very different way: The solution cardinality and partial solution interpretations are implicitly realized using a fuzzy logic aggregation function. Evaluations of conjunctions here represent partial solutions and evaluations of disjunctions solution cardinality. Distances are discovered in a similar way as in Cluneplayer. For selection and integration, the two agents differ: Cluneplayer determines for each feature its stability and its correlation to the goal value based on samples. These values are then used for both selecting good features and weighting the features in a linear combination. Fluxplayer, on the other hand, substitutes evaluations of expressions in the aggregation function by features derived from the substituted expression. Thus, the fuzzy aggregation function embeds the feature and provides a context for integrating it. This means that while the feature must be normalized to the interval of fuzzy values, there is no necessity to determine a weight. Therefore, a costly qualitative evaluation of a feature (determining stability, correlation and weights) can be skipped.

4.1.3. Conclusions

With respect to the criteria of an evaluation function as given in section 2.5, we are interested in expressive features that can be acquired and evaluated quickly. The rule-derived features detected by interpreting expressions as done in Cluneplayer seem reasonable and most general. However, given the feature acquisition techniques, it is mostly the need for determining correlation, stability and weights that incurs high run time costs. We can avoid this by using rule-derived features based on the expressions occurring in the fuzzy aggregation function as done in Fluxplayer. In this way, we combine the generality of Cluneplayer with the efficiency of Fluxplayer. Since rule-derived and class-based features are not mutually exclusive, we can still decide at a later point whether and how we want to include class-based features.

4.2. A New View on Features

Before presenting our approach to rule-derived feature acquisition in our own GGP agent, we want to take a closer look at what distinguishes a rule-derived feature from a simple evaluation of the expression it is derived from. In the following we assume a game description D and the corresponding set E defined as the set of valid GDL formulas in D. We extend this definition to the binary predicates equal/2, greater/2, and smaller/2 that use natural numbers as arguments and have the same meaning as their arithmetic counterparts =, >, <, respectively. We also remind the reader that the predicate distinct/2 is predefined and has a meaning equivalent to the operator ≠ on natural numbers. Also, we earlier defined the macro holds (see section 2.1.4) on an expression ε that is true in a state if the expression ε holds in that state. We define a real variant of the macro as follows:

$$holds_R(\varepsilon, s) \stackrel{def}{=} \begin{cases} 1 & holds(\varepsilon, s) \\ 0 & \text{else} \end{cases}$$

We define a feature φ derived from an expression ε so that it does essentially the same as $holds_R$ but maps to the real unit interval [0, 1] instead of {0, 1}:

$$\varphi(\varepsilon, s) \rightarrow [0, 1]$$

This definition agrees with our understanding of a feature as mapping a state to a value. The only difference is that we additionally constrain the output value to the interval [0, 1]. Obviously, given an expression ε, $holds_R$ and therefore holds can be seen as a special case of the feature φ.
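As a toy illustration of this relationship (not GDL, and not part of the agent), states can be modeled as sets of ground fluents and expressions as predicates over such sets; $holds_R$ is then simply the indicator of the macro holds, while a feature may return any value in [0, 1].

    def holds(expression, state):
        return expression(state)

    def holds_R(expression, state):
        # indicator of the boolean macro; a feature may return any value in [0, 1]
        return 1.0 if holds(expression, state) else 0.0

    state = {"cell(1,1,x)", "cell(2,1,x)", "control(o)"}
    line_started = lambda s: "cell(1,1,x)" in s and "cell(2,1,x)" in s
    print(holds_R(line_started, state))   # 1.0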

Current View

In order to be of additional use, the feature φ must incorporate more information than a boolean holds. The corresponding question is: what is the additional information contained in the feature φ?

The answer expressed in the current GGP agents is that the additional value is the discounted value of the expression in future states. We can observe this paradigm in many agents: Fluxplayer [ST07] uses distance heuristics to obtain information on how a fluent may evolve in future states. Consider an expression true(step(X)) ∧ equal(X, 1) and two states in which X takes the values 7 and 2. While the expressions equal(7, 1) and equal(2, 1) are both false, it is far more likely that the latter may hold in the next state given ordinary game mechanics. To reflect this future possibility, Fluxplayer assigns a higher value to the latter expression. Order heuristics are similar, only the underlying expressions would now use distinct(7, 1) or more often greater(7, 1) instead of equal(7, 1). Interestingly, even the aggregation function of Fluxplayer itself is a feature with a future perspective: It is much more likely that a conjunction holds in a future state if some of its conjuncts hold in the current state. Thus, while both evaluate to false in the current state, the “more true” conjunction deserves a higher evaluation than the “less true” conjunction. Cluneplayer [Clu07] expresses a similar perspective on features, most obviously in the stability criterion for features. However, even the three interpretations (solution cardinality, partial solution, symbol distance) are mere fuzzified expression evaluations for disjunctions, conjunctions and atoms, respectively. Even general features used across all agents show such a future perspective. For instance, a piece value feature represents a future estimation of mobility and is generally assumed to be more useful (if applicable) than plain mobility [Sha50]. Naturally, each feature silently assumes state inertia, that is, that the representation of the state only changes little from one state to the next.

4.2.1. A New Perspective

While the current view is certainly justified on an empirical basis, it is neither an explicit theory with formally accessible parts nor does it allow for predictions of the quality of features. Our goal is not to establish a complete theory on this matter either; however, we can provide insights on how the current view is insufficient in many cases and what a modern feature theory should look like. In our view, the central information of the evaluation of a feature derived from an expression in a state s is the expression value in the terminal state reached from s. To formalize this view, we define the function T(s) as returning the set of all terminal states reachable from s. An oracle function τ(s) takes a state s and returns the terminal state $t \in T(s)$ that is eventually reached. Then we define the perfect feature $\varphi^*(\varepsilon, s)$ as:

$$\varphi^*(\varepsilon, s) = holds_R(\varepsilon, t) \quad \text{where} \quad t = \tau(s)$$

Naturally, this definition is of little practical use given that in most games we have no means to obtain the oracle τ: While exhaustive search is usually not an option due

to time constraints, a move selection or state evaluation oracle (see section 2.3) is not available, which is why we discuss feature acquisition in the first place. Anyhow, the formula already expresses a fundamental property of features: The evaluation of an expression in the current state is only relevant to the extent to which this affects the value of the expression in the terminal state. To support this hypothesis, we will briefly view the two extreme cases. In one case, an expression holds in the current state, but its value is randomly determined every round. Then the expression evaluation of the terminal state that we eventually reach is independent of the expression evaluation in the current state. In other words, the current expression evaluation is irrelevant. In the other extreme, an expression holds in the current state and persists through the rest of the match. Then the expression also holds in the terminal state and therefore the current evaluation is relevant. Since there is no reason not to assume monotonicity between the two extreme cases, we generalize this view to our hypothesis.

Features

In the absence of a terminal state oracle, the construction of a perfect feature is difficult. To deal with this fact, we must discard the oracle and work instead with the set of reachable terminal states. Since in each of these terminal states the expression may hold or not, we can model the feature value as the probability that the expression ε holds, given the current state s. We use the feature φ as an approximation to the perfect feature $\varphi^*$ and define it as:

$$\varphi(\varepsilon, s) = P\left( holds_R(\varepsilon, t) \mid t \in T(s) \right) \tag{4.2.1}$$

To build on this view we need to determine, sample or estimate the distribution of terminal states with respect to a specific expression. Most often, it is impossible to determine this distribution as it would require building up the game tree in order to count the occurrences of states where the expression holds. Again, it is for the very purpose of not having to build up the game tree that we use deterministic evaluation functions. We will later discuss the concept of persistence, which is the only notable exception to this rule (section 4.3.4).

4.2.2. Sampling Features

Sampling the distribution is straightforward, as Monte Carlo-based agents demonstrate. Equation (4.2.1) is instantiated such that ε is the goal condition and $P(t = \tau(s))$ can be directly sampled:

$$\varphi(\varepsilon, s) = \sum_{t \in T(s)} holds(\varepsilon, t) \cdot P(t = \tau(s))$$

The obtained Monte Carlo state value is thus literally the aggregated average of the state values of the sampled states and can be considered a special case of our view on features.

Note that $P(t = \tau(s))$ is equally distributed among all paths to terminal states, highlighting one of the major problems of Monte Carlo approaches. Sampling more fine-grained expressions than the goal formula seems an interesting approach; however, to our knowledge this has not been attempted yet.
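A hedged sketch of such a sampling procedure is given below. The helpers is_terminal, legal_joint_moves and successor_state are hypothetical stand-ins for the agent's GDL reasoner; holds is the boolean macro above. As noted, random playouts implicitly distribute the probability mass over paths rather than over terminal states.

    import random

    def sample_feature(expression, state, holds, is_terminal, legal_joint_moves,
                       successor_state, playouts=100):
        hits = 0
        for _ in range(playouts):
            s = state
            while not is_terminal(s):
                s = successor_state(s, random.choice(legal_joint_moves(s)))
            hits += 1 if holds(expression, s) else 0
        # estimate of P(holds(expression, t) | t in T(state))
        return hits / playouts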

4.2.3. Estimating Features

Finally, we can estimate the distribution heuristically. Assume an expression ε = inst(X) ∧ distinct(X, 1) where X is first instantiated in inst(X) and then evaluated as to whether it is distinct from 1. We do not know the probability that ε holds in the terminal state given that X = 3 in the current state. However, if states do not change radically, we can assume that

$$P(holds(\varepsilon, t) \mid holds(inst(3), s)) > P(holds(\varepsilon, t) \mid holds(inst(2), s))$$

Likewise, if for an expression ε = line(x) we know that

    line(x) :-
        true(cell(1,1,x)),
        true(cell(2,1,x)),
        true(cell(3,1,x)).

then we can assume

P (holds(line(x), t)|holds(true(cell(1, 1, x)) ∧ true(cell(2, 1, x)), s)) > P (holds(line(x), t)|holds(true(cell(1, 1, x)), s))
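The intuition behind such estimates can be made concrete with a small sketch: the more conjuncts of line(x) already hold, the higher the estimated probability that line(x) holds in the terminal state. The linear estimate below is an assumption made purely for illustration and is not the aggregation function used by Fluxplayer or Nexplayer.

    def estimate_conjunction(conjuncts, state):
        satisfied = sum(1 for c in conjuncts if c in state)
        return satisfied / len(conjuncts)

    line_x = ["cell(1,1,x)", "cell(2,1,x)", "cell(3,1,x)"]
    print(estimate_conjunction(line_x, {"cell(1,1,x)", "cell(2,1,x)"}))  # ~0.67
    print(estimate_conjunction(line_x, {"cell(1,1,x)"}))                 # ~0.33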

4.2.4. Conclusions

There are various consequences that arise from our proposed view on features.

Expression Evaluation

If a heuristic evaluation function is to be more than a mere fuzzy evaluation of the state, we need to be aware of the evolution patterns of expressions. If an expression evolves slowly (i.e., it is stable and its evaluation is not expected to change in the near term), its evaluation to a real number should be higher than if it evolves (changes) rapidly. Real evaluations of expressions, according to our proposed view, depend on the probabilities of an expression maintaining or changing its boolean evaluation. Thus, expression evaluations vary across expressions. Therefore, two expressions with the same boolean evaluation do not imply the same real evaluation.

$$holds(\varepsilon_1, s) = holds(\varepsilon_2, s) \;\not\Rightarrow\; \varphi^*(\varepsilon_1, s) = \varphi^*(\varepsilon_2, s)$$

Thus, e.g., the decision in Fluxplayer to assign constant values p (1 − p) to true (false) fluent evaluations is not correct. The same principle holds along the temporal dimension: The same expression evaluation in two different states does not imply the same feature evaluation.

$$holds(\varepsilon, s_1) = holds(\varepsilon, s_2) \;\not\Rightarrow\; \varphi^*(\varepsilon, s_1) = \varphi^*(\varepsilon, s_2)$$

The reason is the difference in the number of states that we pass through until we reach a terminal state. An intuitive example is the evaluation of a highly unstable expression: If we pass through dozens of states, we can expect the value of the evaluation in the terminal state to be independent of the current evaluation; however, if the current state is already terminal, the current evaluation is highly stable, despite its intrinsic instability. Instead, we must assign a higher real evaluation (for true expressions) the closer we are to the terminal state. Note how this property combines nicely with the extrapolation property of section 2.5. Another property is that features are not symmetric with respect to the evaluation of the underlying expression. Assuming two states $s_1$ and $s_2 = s_1 \cup s_\Delta$, this means

∄f : φ*(¬ψ, s1) = f(φ*(ψ, s2))

Thus, there is no general way to "negate a feature value" based on the expression value, since the set of reachable terminal states T(s1) is potentially unrelated to the set of reachable terminal states T(s2). In other words, from knowing how a feature evaluates an expression ψ that holds we cannot infer the feature evaluation if the expression does not hold. Using the term "stability" as an expression property is thus incorrect, since only evaluations of expressions can be stable.

Game and Opponent Analysis

While many games today can be played well by specialized game playing agents, there are two instances of deterministic games with complete information that pose or posed a problem to the AI community. The first is Othello, where the plain goal to possess more stones than the opponent was found to be a bad heuristic [Bur02]. Our theory would have predicted the same result, since the color of a stone is highly unstable and thus possesses little predictive value.

P(holds(true(cell(5,5,red)), t) | holds(true(cell(5,5,red)), s))
    ≈ P(holds(¬true(cell(5,5,red)), t) | holds(true(cell(5,5,red)), s))

The only exceptions to this are the border and corner stones, which are much more stable since there are few (for corner stones: zero) ways to turn them. Our view on features would thus have predicted the problems of heuristics in Othello. The second game is Go [GS07], which has a similar goal and the same problem of a bad heuristic. In Go, however, stones do not switch colors and can at most be

73 removed. We believe that Go is a problem because stones have very different stability values. The problem is that if a structure in Go is unstable (i.e., it can be removed) its neighbors may become unstable as well. Thus the evaluation of the stability of a stone recursively depends on the stability of its neighbors. This broadens the scope of the stability evaluation of a stone to a whole area and allows errors in stability evaluation to propagate and possibly reinforce themselves. Even worse, since a stone is the neighbor of its neighbor, the problem contains itself as a subproblem. In both games a trivial deterministic evaluation based on the goal condition provides a weak evaluation function. The problem in Othello is the unknown stability parameters of expressions and could thus be resolved. The problem in the second is the interdependence of expressions that cannot be captured with a standard aggregation function. We suggest thus the following principles with respect to games: • A high playing strength of standard deterministic aggregation functions can be easier achieved in games with weakly interdependent expressions.

• Monte Carlo-based agents are better at games with highly interdependent expressions because they represent this interdependence via simulation of the game mechanics.

• Aggregation functions with recursive elements, such as recursive neural networks, may be able to represent such interdependencies.

Besides, we are currently undecided as to a (cor-)relation between the stability of an expression and the playing strength of the opponent. We expect strong opponents to stabilize features with a favorable evaluation and to destabilize other features. We thus have another way of either identifying opponent strength given feature evaluations, or evaluating states given opponent strength.

Future Work

The topic of interpretation and evaluation of features is vast and cannot be covered in detail in a thesis on General Game Playing. We expect to have provided some concrete insights into what features evaluate and how this is best done. Nevertheless, a detailed theory is clearly lacking and is an outstanding point for further advances. Since our view is strongly based on probabilities, we also propose to start looking for a deterministic aggregation function based on probability theory. While our current aggregation function is compatible with the probabilistic view on features, there might be more natural ways to deal with the problem. Note how our proposed theoretical framework already supports probabilistic evaluation of expressions. An extension of the holdsR macro to the real unit interval [0, 1] (currently {0, 1}) would thus allow for an application to states that can be modeled probabilistically, such as partially observable states. In practical terms, we believe there is a lot of room for calculating weights and stabilities in heuristic functions, a subject on which little time was spent in Game Playing. Specifically, we believe that, given expressions with weak dependencies between each other, transformation graphs can be built that relate the possible transformations of

expression evaluations to each other. Using such a transformation graph, we could determine expression stabilities and thus the parameters of perfect features. Likewise, we may reduce the scope of Monte Carlo methods to smaller and more fine-grained expressions instead of the complete goal condition. The behavior of smaller expressions might be simulated separately in such a transformation graph, which would allow for the abandonment of the expensive simulation part altogether if the goal condition can be completely decomposed into such expressions. Both practical suggestions address the same problem from different starting points and suggest a convergence of probabilistic and deterministic evaluation functions.

4.3. Detection of Rule-Derived Features

For the purpose of feature detection, we propose a general set of features based on expressions. We start by defining the input and output of the feature detection algorithm, that is, which expressions are accepted and what the expected output is. We proceed by showing how to detect solution cardinality, distance and persistence features. In addition to syntactic conditions, distance features must fulfill some other conditions that form the basis of their evaluation later and will be discussed next. We finish this section with a summary of the relation between an expression and the stability of its evaluation represented by the feature. Note that we will only discuss the detection part of each feature. Specifically, we will not address the integration of a feature or show how to evaluate it. These subjects will be discussed in section 4.4. Instead, we will motivate each feature, specifically with respect to our view on features as given in section 4.2. For this purpose, we express a relation between an expression ψ holding in the terminal state and a variable measured by the feature in the form of a statement about probabilities, such as

P(holds(ψ, t) | ...) ∼ x

Here x is a measurement function on the state that will be needed later for feature evaluation, and the symbol ∼ is to be read as "correlates to". Besides, we will discuss the conditions which must be fulfilled in order to use the feature on expression ψ. The content of this section is derived from our workshop contribution "Heuristic Interpretation of Predicate Logic Expressions in General Game Playing" to the GIGA Workshop 2011 [Mic11a].

4.3.1. Input and Output

We consider feature detection an algorithm that, given a game description D, receives an expression ψ and returns a set of features. For our purposes, the expression will always be a valid GDL formula, and we will use both terms, expression and formula, interchangeably. We can further restrict the eligible formulas for feature detection to conjunctions of the form b1 ∧ ... ∧ bn where there is at least one variable X (existentially) bound over all literals bi for 1 ≤ i ≤ n. This assumption is not only compatible with the definition of a GDL clause as a logic program clause but is also sufficient to represent the information we are interested in – ground literals and conjunctions with variables bound over them. All other formulas can be decomposed further:

Disjunctions ∃X :(p(X) ∨ q(X)) ⇔ ∃X : p(X) ∨ ∃X : q(X)

Negations of the form ¬(∃X : p(X)) are disallowed by GDL [LHH+08] since each variable occurring in a negative literal must occur in the head or in another positive literal of the body.

Conjunctions ∃X :(p(X) ∧ ∃Y : q(Y )) ⇔ ∃X : p(X) ∧ ∃Y : q(Y )

In order to address the different atoms that may form an expression ψ we will use the following terms:

Ground Fluent Queries have the form true(f) where f is a ground fluent

Non-ground Fluent Queries have the form true(f) where f contains at least one variable

Recursive Predicates refer to predicates p where there is a (conjunction) clause c in the game rules and p occurs in the head and in the body of the clause

Our general feature detection algorithm accepts a fluent query or a conjunction with variables and returns a set of features that it was able to detect. Since each feature depends on different conditions, we cannot give a generalized description or formulation of the requirements. However, section 4.3.6 will summarize the most important aspects of feature detection.

4.3.2. Solution Cardinality Features

Solution cardinality features try to relate the number of solutions of an expression with variables to the stability of its evaluation. Both solution cardinality and partial solution cardinality are used by Cluneplayer [Clu07] and represent a form of fuzzy evaluation of disjunctions and conjunctions in a non-fuzzy aggregation function. Correspondingly, their occurrence in fuzzy aggregation functions is much less frequent. Still, they are useful since the fuzzy aggregation only works on ground-instantiated expressions, which is not always possible. In addition, we can apply solution cardinality in cases in which a ground instantiation and fuzzy evaluation would not provide more information than a solution cardinality feature. This is, for instance, the case for non-ground fluent queries. In this way, we keep the evaluation function (i.e., the network size in Nexplayer) small and evaluation efficient due to the overhead avoided.

Solution Cardinality

Solution cardinality can be applied to any expression with variables. As a minor restriction, we can report that we have not yet seen an example of a recursive predicate with variables where solution cardinality would be a sensible feature. An example is the following expression occurring in the game Tic-Tac-Toe.

true(cell(X, Y, b)).

The idea of the feature is that the more solutions to the expression exist, the higher the probability that the expression still holds in future states. Naturally, we assume that the number of solutions does not change rapidly from one state to another. Letting n(ψ, s) be the number of solutions to an expression ψ in a state s, we can state that the stability of a true evaluation of ψ correlates with the number of its solutions:

P(holds(ψ, t) | t ∈ T(s) ∧ holds(ψ, s)) ∼ n(ψ, s)

The solution cardinality feature is comparable to the disjunction that would be built if we followed a ground-instantiation approach.
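As an illustration of the measurement n(ψ, s), the following sketch counts the matches of a simple non-ground fluent query against a state given as a set of ground fluents, with an optional cap on the number of solutions (cf. the cost discussion in section 4.4.4). The state encoding, the query representation and the restriction to queries without repeated variables are assumptions made only for this sketch.

def count_solutions(state, pattern, cap=50):
    """Count ground fluents in `state` that match a non-ground query `pattern`.

    `state` is assumed to be a set of tuples such as ("cell", 1, 1, "b");
    `pattern` uses None as a placeholder for a variable, e.g. ("cell", None, None, "b").
    The optional cap bounds the evaluation cost."""
    n = 0
    for fluent in state:
        if len(fluent) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, fluent)):
            n += 1
            if n >= cap:
                break
    return n

# Example: number of blank cells in a small Tic-Tac-Toe state
state = {("cell", 1, 1, "x"), ("cell", 1, 2, "b"), ("cell", 2, 1, "b")}
print(count_solutions(state, ("cell", None, None, "b")))  # -> 2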

Partial Solution Cardinality

Partial solution cardinality operates on conjunctions with variables. An example is the following expression identifying a row in Connect Four that could not be ground-instantiated:

true(cell(X, Y1, x)), succ(Y1, Y2),
true(cell(X, Y2, x)), succ(Y2, Y3),
true(cell(X, Y3, x)), succ(Y3, Y4),
true(cell(X, Y4, x))

Unlike solution cardinality, which indicates the stability of a true evaluation, partial solution cardinality indicates the negative stability of a false evaluation: The more partial solutions exist, the less stable the expression's evaluation to false is. In other words: For any conjunction ψ of the form (b1 ∧ ... ∧ bn) where at most d conjuncts are fulfilled and there are b solutions (i.e., instantiations) with d fulfilled conjuncts, we can state that the expression ψ is more likely to hold in a future state for higher b and d:

P(holds(ψ, t) | t ∈ T(s) ∧ holds(¬ψ, s)) ∼ b
P(holds(ψ, t) | t ∈ T(s) ∧ holds(¬ψ, s)) ∼ d

Again, the feature is built on the assumption that only few expressions change their evaluation per state. The feature is comparable to the fuzzy evaluation of its ground-instantiated expression, which would yield a disjunction of conjunctions. Note that the feature should be evaluated carefully, as the order of instantiating each conjunct is crucial for run time: Given the above Connect Four example and assuming a 7x6 board, an evaluation from left to right yields only 7 possible instantiations for the variable Y1, while all other variables Y2, Y3, Y4 are defined in terms of Y1. The worst possible instantiation order would first instantiate the fluents and result in 7^4 = 2401 possible instantiations for the Yi. Given that humans often encode rules in a way that is efficient for evaluation, it is thus risky to deviate from left-to-right evaluation. For this reason we chose the variable names b (breadth) and d (depth), which indicate the breadth and depth of solutions in the tree spanned when proving each single conjunct.
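The following sketch illustrates the left-to-right computation of d and b for a conjunction. The representation of conjuncts as callables that enumerate extended variable bindings is an assumption standing in for the agent's reasoner; it is a minimal illustration, not the Nexplayer implementation.

def partial_solution_cardinality(conjuncts, state):
    """Left-to-right evaluation of a conjunction, returning (d, b):
    d = number of leading conjuncts that could be satisfied,
    b = number of instantiations reaching that depth.

    Each conjunct is assumed to be a callable that, given the state and a
    partial variable binding, yields all extended bindings satisfying it."""
    frontier = [{}]          # bindings satisfying the first d conjuncts
    d = 0
    for conjunct in conjuncts:
        next_frontier = []
        for binding in frontier:
            next_frontier.extend(conjunct(state, binding))
        if not next_frontier:  # no binding satisfies the next conjunct
            break
        frontier = next_frontier
        d += 1
    return d, len(frontier)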

4.3.3. Distance Features

Distance features capture the pattern of evolution of a formula. We use the term evolution to express that, for instance, the advance of a pawn in Chess evolves or transforms a true formula true(cell(c, 3, pawn)) in a state s to another true formula true(cell(c, 4, pawn)) in its successor state s'. This implies that the former formula does not hold in the latter state and the latter formula did not hold in the former state. As the base scenario, we will discuss the distance between atoms. All other distances are variants of this base scenario. The distances calculated between atoms are lower bounds on the number of steps needed until an atom becomes true. The idea is that if an atom true(cell(X, 8, pawn)) is the goal, for example for queening a pawn, then a currently true atom true(cell(X, 7, pawn)) indicates a higher probability of leading to the goal than the atom true(cell(X, 1, pawn)) would. In other words, the evaluation of true(cell(X, 8, pawn)) to false is much less stable given true(cell(X, 7, pawn)) than given true(cell(X, 1, pawn)). We estimate these probabilities using distances. As distance measure we estimate the number of steps an atom a requires to evolve to another atom b. Since this number of steps is a lower bound on the number of steps required for transforming a to b, we can assume that the probability of reaching (a true evaluation of) b given (a true evaluation of) a is lower if the distance from a to b is higher. Like all GGP agents applying distances, we tacitly (and not necessarily correctly) assume the converse of this statement: A low lower bound on the number of steps indicates a high probability of b holding in a future state. Given two atoms a and b and a distance measure ∆ between them, we can summarize that the probability of reaching atom b given atom a in the current state is inversely proportional to the distance from a to b.

P(holds(b, t) | t ∈ T(s) ∧ holds(a, s)) ∼ 1 / ∆(a, b)

Note that holds(¬b, s) is an important precondition for the above formula to hold. We omitted it for the sake of readability. Since the confirmation of a distance feature is a non-trivial subject by itself, we describe how to confirm (and evaluate) distance features in section 4.3.5.

Variable Distance

The variable distance feature can be detected on conjunctions with variables. The following is an example from the game Pac Man, where inky has the goal to catch pacman, expressed by both occupying the same square.

true(location(inky, X, Y)), true(location(pacman, X, Y))

Another variant is the following formula occurring in Othello:

count(1, 1, X, X)

Both cases are instances of the more general formula

ψ = r ∧ equal(X1, X2)

Here r serves to instantiate the variables X1 and X2 while equal(X1, X2) unifies them. Correspondingly, we motivate the variable distance feature by stating

P(holds(r ∧ equal(X1, X2), t) | t ∈ T(s) ∧ holds(r ∧ ¬equal(X1, X2), s)) ∼ 1 / ∆(X1, X2)

There are three problems that can arise when using variable distance features:

• Often, we do not know whether X1 "moves in the direction of" X2 and/or the other way round. Therefore, we calculate both distances ∆(X1, X2) and ∆(X2, X1) and use whichever value is lower.

• If the formula can be decomposed into more than one equality (as, e.g., in the above Pac Man example), we evaluate each single distance and aggregate them into an overall distance. The process will be described in detail in section 4.3.5.

• There may be various solutions for X1 and/or X2. A reasonable approach, and our current solution, is to use the minimum of all distances determined between instances of X1 and X2, since this represents the estimate for the minimum number of time steps needed to make (X1 = X2) hold. We will discuss the aggregation of distances over various solutions again in section 4.4.1.

Note that distance only relates to the stability of an expression being false: The lower the distance between the variables, the less stable the false evaluation becomes. Consequently, the probability of evaluating to true in a future state rises.

Fixed Distance

The fixed distance feature is a specialization of the variable distance: Since one of the formulas (the "destination" part of the formula) is fixed, the source variables are instantiated and the destination part of the formula is omitted altogether. Fixed distance features thus operate on formulas where the partially ordered symbols are ground, as, e.g., in Pac Man:

true(step(50))

The feature provides information on future states for the same reason as variable distance.

Order

Order features are used in Fluxplayer and also appear as relative features derived from other features in Cluneplayer. An order feature compares two values with each other and confirms whether an order predicate holds on them. It can be detected in conjunctions with shared variables. An example is the following expression in Checkers:

true(piece_count(white, W)),
true(piece_count(black, B)),
greater(W, B).

The feature expresses the idea that a comparison with a higher margin leads to a higher stability of the expression evaluation. The margin is relevant regardless of whether the expression holds or not; thus the feature relates to the stability of both the true and the false evaluation. Given an order predicate porder and two arguments x, y we can state:

P(holds(porder(x, y), t) | t ∈ T(s) ∧ holds(porder(x, y), s)) ∼ ∆(x, y)

and

P(holds(porder(x, y), t) | t ∈ T(s) ∧ holds(¬porder(x, y), s)) ∼ 1 / ∆(y, x)

4.3.4. Persistence Features

Persistence features do not involve a heuristic judgment of probabilities but (proved) facts. Once the fact holds, it is said to persist through the remainder of the match, hence the name. This is different from formulas that are always true or false, since parts of the persistent expression depend on the current state. Therefore, it is not possible to evaluate the formula right away, and assumptions on the state are required. We discover persistence in basic fluent queries, that is, formulas in the form of an atom true(f); however, other formulas may become persistent as a consequence. In order to use persistence features, we need to prove that

True Persistence true(f) ⇒ next(f)

False Persistence ¬true(f) ⇒ ¬next(f)

for a given game description D and fluent f. For the proof we rely on [TV10, HT10], which inductively prove the property using a temporal extension of GDL. For convenience we often call a fluent f a true-persistent fluent and mean that the atom true(f), once it evaluates to true in a state s, remains true in every (direct and indirect) successor state of s. Likewise, when we speak of a false-persistent fluent, we refer to a fluent f where the atom true(f), once evaluated to false, remains false for the remainder of the match.

True Persistence

An example of a true-persistent fluent occurs in the game Tic-Tac-Toe.

true(cell(1,1,x))

Once a player has marked a cell with an x, the fluent persists for the remainder of the match, i.e., it holds in every future state.

A feature is motivated by the fact that for any true-persistent fluent f it holds:

P (holds(true(f), t)|holds(true(f), s) ∧ t ∈ T (s)) = 1

and similarly

P (holds(¬true(f), t)|holds(true(f), s) ∧ t ∈ T (s)) = 0

False Persistence

An example of a false-persistent fluent can also be found in Tic-Tac-Toe.

cell(1,1,b)

The fluent states that cell (1,1) is unmarked (b = "blank"). Once the cell is marked, the fluent remains false through the remainder of the match. Thus, for any false-persistent fluent f it holds:

P(holds(true(f), t) | t ∈ T(s) ∧ holds(¬true(f), s)) = 0
P(holds(¬true(f), t) | t ∈ T(s) ∧ holds(¬true(f), s)) = 1

4.3.5. Confirming Distance Features

We confirm that a formula can be interpreted as a distance feature by determining a distance graph for the relevant atom in the formula. For introductory purposes we assume that there is exactly one relevant atom per formula. This relevant atom can be informally described as the "destination atom" representing the goal to be reached. The distance graph for this atom is then a graph containing all possible "states" of the atom, where an atom a is connected to an atom b if atom a may evolve to atom b in one time step. So if the destination atom is true(cell(a, 8, pawn)) and we find that true(cell(a, 7, pawn)) can be transformed into the destination atom, then we add the corresponding edge to the graph.

Figure 4.1.: Partial Distance Graph for true(cell(a, 8, pawn))

Figure 4.1 is a partial distance graph for a pawn in a Chess variant without promotion. The distance between two atoms in the graph would ideally be represented by their temporal distance, that is, a distance based on the number of time steps required to

obtain the desired atom. While we present such an approach in section 4.5, for now we use the standard approach for determining distances as used by Cluneplayer, Fluxplayer and Kuhlplayer, relying on "symbol distances" [Clu07]. Symbol distances are estimated as the aggregation of the distances between the arguments of the atoms. Thus we can construct a distance graph for atoms if we can construct a distance graph for (at least one of) their arguments. The distance between two atoms a = a(x1, ..., xn) and b = b(y1, ..., yn) can then be defined as

∆Atom(a, b) := ⊗_i ∆Arg(xi, yi)

The function ⊗ aggregates the argument distances ∆Arg,i into a single distance. For now we set ⊗ = Σ to obtain the Manhattan distance. The resulting atom distance ∆Atom can be used as a proxy for the temporal distance, which in turn is used for an estimate of how likely the expression is to become true (given that it is currently evaluated to false).

Argument Distances

In order to calculate ∆Arg,i for a given argument index i, we look for a binary predicate pArg,i that imposes a partial order on the domain of the i-th argument. If we find such a predicate pArg,i, we construct a directed graph Gi where there is an edge from node n1 to n2 if pArg,i(n1, n2) holds. We then define the graph distance ∆Gi(xi, yi) as the length of the shortest path from node xi to node yi in Gi, or infinite (∞) if no such path exists. The argument distance for the i-th argument can then be defined as

∆Arg(xi, yi) :=  0             if xi = yi
                 ∆Gi(xi, yi)   if Gi exists
                 ∞             otherwise

Thus, an argument distance ∆Arg,i is based on graph distance ∆Gi only if a corresponding graph Gi exists. Otherwise it defaults to 0 or ∞. Consequently, we only confirm a distance feature if there is at least one non-trivial argument distance, that is, if

• there is at least one argument position 1 ≤ i ≤ n where a graph Gi can be built and

• there is a shortest path between two nodes in Gi of at least length 1.

We can now define the different evaluations depending on the feature type and the formula ψ from which the feature is derived. As part of this, we also define the predicate pArg,i used to construct Gi; a small sketch of the graph-based distance computation is given below.
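The following sketch illustrates the construction of Gi from the ground facts of a binary predicate and the computation of the graph distance ∆Gi by breadth-first search. The fact and graph encodings are assumptions made only for this example.

from collections import deque

def build_graph(p_arg_facts):
    """Build the directed graph G_i from the ground facts of pArg,i,
    given as a list of (n1, n2) pairs."""
    graph = {}
    for n1, n2 in p_arg_facts:
        graph.setdefault(n1, set()).add(n2)
    return graph

def graph_distance(graph, x, y):
    """Shortest-path length from x to y in G_i (BFS), or infinity if unreachable."""
    if x == y:
        return 0
    seen, frontier, dist = {x}, deque([x]), {x: 0}
    while frontier:
        node = frontier.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                dist[succ] = dist[node] + 1
                if succ == y:
                    return dist[succ]
                seen.add(succ)
                frontier.append(succ)
    return float("inf")

# Example: succ/2 facts over a coordinate range 1..8
succ_facts = [(i, i + 1) for i in range(1, 8)]
g = build_graph(succ_facts)
print(graph_distance(g, 2, 7))  # -> 5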

Variable Distances Between Fluents

For a formula ψ of the form

true(f), true(g)

where

• f(x1, . . . , xn) and g(y1, . . . , yn) are fluents of common name and arity n and

• for some argument positions i ∈ IV in the non-empty set IV ⊆ {1, ..., n}, xi = yi is a variable shared by both fluents,

the evaluation of the expression is defined as

∆(f, g) = ⊗_{i∈IV} ∆Arg,i(xi, yi)

where ⊗ = Σ is set to obtain the Manhattan distance. The predicate pArg,i for constructing Gi is obtained by finding all binary, static (i.e., not state-dependent) and antisymmetric predicates pbinary ∈ B in rules of the form

next(f(.., X2, ..)) :-
    true(f(.., X1, ..)),
    ..,
    pbinary(X1, X2).

where the variables X1, X2 occur at argument position i of fluent f. We then define

pArg,i(x, y) ≡ ∃pbinary ∈ B : pbinary(x, y)

Note that binary predicates with switched argument positions should be included in B similarly.

Variable Distances in a Single Predicate

For a formula ψ of the form

pdistance(.., X, .., X, ..)

where

• X is a variable that occurs at exactly two argument positions i1 and i2 and

• the predicate pdistance is recursive and state-dependent

we define the evaluation of the formula as ∆Arg(Y1, Y2), where ∆Arg(Y1, Y2) is the argument distance between Y1 and Y2 for the original expression ψ, but with Y1 substituting X at position i1 and Y2 substituting X at i2. The predicate pArg for constructing G is obtained by finding all binary, static and antisymmetric predicates pbinary ∈ B in rules of the form

pdistance(.., X2, ..) :-
    pbinary(X1, X2),
    ..,
    pdistance(.., X1, ..)

where the variables X1, X2 occur at argument position i1 or i2 of predicate pdistance. We then define pArg(x, y) ≡ ∃pbinary ∈ B : pbinary(x, y). As for distances between fluents, note that binary predicates with switched argument positions should be included in B similarly.

Fixed Distances

For a formula ψ of the form

true(f)

the evaluation is the minimum distance from any fluent in the current state to f, or ∞ if no suitable fluent is in the current state. The distance between fluents is obtained as for variable distances.

Order

For a formula ψ of the form

.., porder(X, Y), ..

where

• X, Y are variables

• the predicate porder is static and binary

• the relation encoded by porder(x,y) is antisymmetric and transitive

we define pArg as the underlying partial order whose transitive closure coincides with porder(x,y). In other words, we define pArg(X,Y ) as the subset of porder(x,y) for which there is no element Z that comes after X and before Y:

pArg(X, Y) := porder(X, Y) such that ∄Z : (porder(X, Z) ∧ porder(Z, Y) ∧ (X ≠ Z) ∧ (Z ≠ Y))

The evaluation o of porder is then defined as

o =  −∆Arg(x, y)   if porder(X, Y)
      ∆Arg(y, x)   if porder(Y, X)
      ∞            otherwise

We also use o as the evaluation for the complete formula ψ.
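As a small illustration of the order evaluation o, the following sketch reuses build_graph and graph_distance from the sketch in section 4.3.5 above and applies the case distinction just given. The covering facts in the example are hypothetical.

def order_evaluation(p_arg_graph, x, y):
    """Signed margin of an order comparison, following the cases above."""
    d_xy = graph_distance(p_arg_graph, x, y)
    d_yx = graph_distance(p_arg_graph, y, x)
    if d_xy != float("inf") and d_xy > 0:
        return -d_xy          # porder(X, Y) holds with margin d_xy
    if d_yx != float("inf") and d_yx > 0:
        return d_yx           # porder(Y, X) holds with margin d_yx
    return float("inf")       # values are equal or incomparable

# Example: covering facts succ(0,1), ..., succ(10,11)
g = build_graph([(i, i + 1) for i in range(0, 11)])
print(order_evaluation(g, 3, 5))   # -> -2
print(order_evaluation(g, 5, 3))   # ->  2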

4.3.6. Summary

Feature                        Scope                               True Evaluation                        False Evaluation
Solution Cardinality           Conjunctions                        Correlates with Number of Solutions    -
Partial Solution Cardinality   Conjunctions with n > 1 Conjuncts   -                                      Correlates Negatively with Number/Depth of Partial Solutions
Order                          Conjunctions                        Correlates with Margin of Comparison   Correlates with Margin of Comparison
Variable Distance              Conjunctions                        -                                      Correlates Negatively with Distance
Fixed Distance                 Fluent Queries                      -                                      Correlates Negatively with Distance
True Persistence               Ground Fluent Queries               Persists                               -
False Persistence              Ground Fluent Queries               -                                      Persists

Table 4.2.: Rule-Derived Features: Detection

Table 4.2 shows for each feature the most specific type of candidate formula it works on and the relation of the feature to the stability of a true/false evaluation.

4.4. Integration of Rule-Derived Features

In the prior sections we covered how features should be interpreted and used and what types of features we detect. This section concludes the idea laid out so far by integrating the features into our fuzzy evaluation function. As a consequence, this section will be tailored more towards our agent "Nexplayer" and will also be more practical. Again, parts of this section were derived from our workshop contribution "Heuristic Interpretation of Predicate Logic Expressions in General Game Playing" to the GIGA Workshop 2011 [Mic11a]. In order to deal with feature integration, we will first discuss the feature output provided by the features discussed previously. This is necessary in order to normalize the feature output (section 4.4.1) and integrate it into a fuzzy evaluation function (section 4.4.2). We conclude with an evaluation that indicates the generality and the effectiveness of our feature acquisition approach.

4.4.1. Feature Normalization

Feature normalization is required for two reasons. First, by normalizing features we predefine a contract or interface that facilitates integration into an aggregation function. In our case, we will normalize the output of each feature to a real value in the interval [−1, 1]. Second, we provide a context for the output of each feature. For instance, if solution cardinality is applied to the pieces in Chess, then a cardinality of 2 for pawns is quite low while a cardinality of 2 for queens is high. Obviously, the quantity of queens must be viewed differently than the quantity of pawns. In order to define the output of the feature, we rely on the functions defined in the previous section when discussing the corresponding feature. Most often these functions appeared on the right-hand side of the motivating probability expression. Normalization takes these functions and produces a normalized feature φ̄ψ(s) that takes a state s and returns a value in [−1, 1]. For convenience, we omit the state argument if doing so does not introduce ambiguities.

Solution Cardinality

The solution cardinality of an expression ψ is the number of variable instantiations for ψ such that holds(ψ, s). In order to normalize this number, we could consider using the maximum number of solutions. However, it is difficult to find a low upper bound on the number of solutions. A "syntactic" estimation of the maximum number of queens on a Chess board would yield 64 solutions, one for each square. A "semantic" estimation would yield 9 queens (8 promoted pawns + 1 queen). Still, conventional wisdom says that games with more than two queens are extremely rare. Given this obvious mismatch between estimated and empirical upper bounds, we decide to normalize the feature using the number of its solutions in the initial state s0. We define the normalized feature in terms of the function n(ψ, s) that returns the number of solutions to expression ψ in state s.

φ̄ψ(s) = 1 − 2^((1 − n(ψ, s)) / max(n(ψ, s0), 1))

The exponential form guarantees strict monotonicity in the number of solutions.

Partial Solution Cardinality

Given a conjunction of n conjuncts, we evaluate the conjunction from left to right and determine how many conjuncts d we can fulfill and how many solutions (instantiations) b with d true conjuncts could be found. For normalization, we use the maximum depth dmax = n, which is the number of conjuncts in the conjunction, and the maximum number of instantiations bmax with which the conjunction can be fulfilled. Since bmax is hard to determine, we set it to a high constant (bmax = 100 in Nexplayer).

To facilitate things later, we interpret the feature output in the way of a normalized distance, meaning that values near 0 indicate expressions that are more likely to hold.

φ̄ = 1 − d/dmax − b/(dmax · bmax)
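A minimal sketch of both normalizations follows; the exponential solution cardinality formula is implemented as reconstructed above, and the constant bmax = 100 follows the text. It is an illustration, not the Nexplayer code.

def normalize_solution_cardinality(n_now, n_initial):
    """Normalized solution cardinality; n_now = n(psi, s), n_initial = n(psi, s0)."""
    return 1 - 2 ** ((1 - n_now) / max(n_initial, 1))

def normalize_partial_solution_cardinality(d, b, d_max, b_max=100):
    """Normalized partial solution cardinality, read as a normalized distance:
    values near 0 indicate the conjunction is close to being fulfilled."""
    return 1 - d / d_max - b / (d_max * b_max)

# Example: 3 of the 4 Connect Four conjuncts are satisfiable, by 2 instantiations
print(normalize_partial_solution_cardinality(d=3, b=2, d_max=4))  # -> 0.245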

Distance Features

Since each distance feature is defined in terms of argument distances, a first step towards normalizing distance features is normalizing the argument distance. We recall that the argument distance was defined in terms of the shortest path in graph G from node x to node y.

∆Arg(x, y) :=  0           if x = y
               ∆G(x, y)    if G exists
               ∞           otherwise

In order to normalize the argument distance, we define ∆̄Arg(x, y) using the graph distance ∆G(x, y) normalized by the largest finite graph distance:

∆̄Arg(x, y) :=  0                                     if x = y
                ∆G(x, y) / max_{z1,z2} ∆G(z1, z2)     if G exists
                ∞                                     otherwise

such that the argument distance is always in [0, 1] or it is infinite (∞). Note that in the above term max_{z1,z2} ∆G(z1, z2) we only assume finite maxima. To obtain the normalized distance of one atom to another, we normalize the aggregation function (i.e., the Manhattan distance) by using the average instead of the sum of the argument distances.

∆̄Atom(a(x1, ..., xn), b(y1, ..., yn)) := (1/n) Σ_i ∆̄Arg(xi, yi)

Similarly, we set ⊗ := (1/n) Σ_{i=1}^{n} for all other occurrences if they aggregate normalized argument distances. If an argument distance ∆̄Arg is infinite, we evaluate ∆̄Atom as infinite. This follows the intuition that if an argument of an atom is unreachable, then the atom is unreachable. Finally, to deal with more than one solution, as may occur, e.g., for variable distances and fixed distances, we need a minimum-like aggregation function that is strictly monotonic. The p-norm for p < −1 is such a function. The overall distance ∆̄overall from any atom in the set X to any atom in the set Y is thus:

∆̄overall = ( Σ_{x∈X, y∈Y} ∆̄Atom(x, y)^p )^(1/p)

Infinite distances are implicitly taken care of if we assume that ∞^p = 0 for p < 0. Note that the p-norm for negative p can be thought of as a set of parallel resistors where the overall resistance is sensitive to the quantity and resistance of each single resistor. Resistors of infinite resistance are ignored similarly. Still, if a new resistor is introduced into the circuit, the overall resistance decreases.
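A minimal sketch of the p-norm aggregation, including the treatment of infinite and zero distances; the default p = −2 is an arbitrary choice made for the example.

def p_norm_aggregate(distances, p=-2.0):
    """Minimum-like, strictly monotonic aggregation of normalized atom distances
    via a p-norm with p < -1. Infinite distances contribute 0 to the sum
    (infinity^p = 0 for negative p) and are thus ignored."""
    if any(d == 0 for d in distances):
        return 0.0            # some target is already reached
    total = sum(d ** p for d in distances if d != float("inf"))
    if total == 0:
        return float("inf")   # no reachable target at all
    return total ** (1 / p)

# Example: two candidate targets at normalized distances 0.2 and 0.5
print(p_norm_aggregate([0.2, 0.5]))           # -> ~0.186, slightly below the minimum 0.2
print(p_norm_aggregate([0.2, float("inf")]))  # -> 0.2, the unreachable target is ignored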

Persistence

The persistence features need not be normalized due to their boolean nature. We simply set the output of a true-persistent fluent to 1 and the output of a false-persistent fluent to −1.

4.4.2. From Feature Values to Fuzzy Values

The final step is the integration of the features into the fuzzy aggregation function. For a formula ψ and a terminal state t we define four reference values:

Stable Truth t is the value indicating that the expression will hold in the terminal state, P(holds(ψ, t)) = 1

Probable Truth tp is the value indicating that the expression will probably hold in the terminal state, with P(holds(ψ, t)) ≥ 0.5

Probable Falsity fp is the value indicating that the expression will probably not hold in the terminal state, with P(holds(ψ, t)) ≤ 0.5

Stable Falsity f is the value indicating that the expression will not hold in the terminal state, P(holds(ψ, t)) = 0

To reflect this relationship, we can assume

f ≤ fp ≤ tp ≤ t

Our task is to define which feature maps to which interval of fuzzy values. To relate the values fp and tp to some human-understandable concept, we assume that tp corresponds to some basic probability that an expression holds in the terminal state and fp corresponds to the probability of the complement. We assign to each interval of fuzzy values the following meaning:

• [tp, t]: ψ is more likely to hold than the basic probability

• [fp, tp]: ψ is undecided

• [f, fp]: ψ is more likely not to hold than the basic probability

In addition, we will assume a stability bias or inertia of states, that is

P(holds(ψ, t) | holds(ψ, s)) > P(holds(ψ, t) | holds(¬ψ, s))

We then define the following mappings from normalized feature outputs to fuzzy values:

Solution Cardinality indicates at least probable truth; its values are thus mapped to [tp, t).

Partial Solution Cardinality indicates more than probable falsity and less than probable truth; its values are thus mapped to [fp, tp]

Order indicates more than stable falsity and less than stable truth; we thus map it to (f, t)

Distances indicate more than probable falsity and less than probable truth; their values are thus mapped to [fp, tp]

True Persistence indicates stable truth t

False Persistence indicates stable falsity f

Note that in some of the cases (distances and partial solution cardinality) it is difficult to determine "the right interval". Given that our features provide a heuristic on the underlying expressions, we decided to use the interval [fp, tp], since it should be more probable to turn an expression true given a heuristic than given no heuristic.

Figure 4.2.: Mapping Estimated Probabilities to Fuzzy Values

Figure 4.2 shows the mapping from probabilities that an expression holds in the terminal state to the corresponding fuzzy values we use. The features discussed realize parts of this mapping since they estimate the probabilities that we map to fuzzy values afterwards. Table 4.3 shows for each feature and the formula evaluation it covers what the corresponding feature values and fuzzy values are. The mapping is done linearly from the normalized feature values to the fuzzy values. Since the output of a feature to its aggregation function must be defined for both cases of true and false expression evaluation, we evaluate the normalized feature only if the expression evaluation permits it. Otherwise, we return a default value. So the normalized solution cardinality feature only produces an input to the aggregation function if the underlying expression holds in the current state. If it does not hold, then the default output of fp is returned. Likewise, a normalized distance feature only

produces input to the aggregation function if the underlying expression is evaluated to false. Otherwise, tp must be returned.

Feature                        Formula Evaluation   Normalized Feature Value   Fuzzy Value
Solution Cardinality           True                 (0, 1)                     (tp, t)
Partial Solution Cardinality   False                (0, 1]                     [fp, tp)
Order                          False                [−1, 0]                    (f, fp]
                               True                 [0, 1]                     [tp, t)
Variable Distance              False                (0, 1]                     [fp, tp)
Fixed Distance                 False                (0, 1]                     [fp, tp)
True Persistence               True                 1                          t
False Persistence              False                −1                         f
Default                        False                -                          fp
                               True                 -                          tp

Table 4.3.: Range of Feature Output Values

So if m is a mapping from the normalized feature output φ̄ψ(s) to a fuzzy value, we define the feature output φψ(s) for an expression ψ as

φψ(s) =  m(φ̄+(s))   if holds(ψ, s) and φ̄+ was detected on ψ
         m(φ̄−(s))   if holds(¬ψ, s) and φ̄− was detected on ψ
         tp          if holds(ψ, s) and no feature φ̄+ was detected
         fp          if holds(¬ψ, s) and no feature φ̄− was detected

The features φ̄+ that correspond to an expression holding in a state are Solution Cardinality, Order and True Persistence. The features φ̄− that correspond to an expression not holding in a state are the distance features, including Order, and False Persistence. In order to comply with our analysis of section 4.2, we use no fixed values for fp and tp and instead adapt them to the degree of terminality of the state. This implies that we use a two-step state evaluation: First, we evaluate a state as to "how terminal" it is in order to determine fp and tp. The closer the state is to fulfilling the terminal condition, the closer we set fp and tp to f and t, respectively. Only then do we evaluate the state and determine the feature values.
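The case distinction can be sketched as follows. The concrete reference values and the linear mappings are placeholders: in the agent, fp and tp are adapted to the degree of terminality of the state, and persistence and order features (which use negative values) are omitted for brevity.

def feature_output(expr_holds, pos_value=None, neg_value=None,
                   t=1.0, tp=0.6, fp=0.4, f=0.0):
    """Map a normalized feature output to a fuzzy value following the cases above.

    pos_value / neg_value are the normalized outputs of a feature detected for
    the true / false evaluation of the expression, or None if no such feature
    was detected. The reference values t, tp, fp, f are example placeholders."""
    def linear(lo, hi, x):      # linear mapping of x in [0, 1] onto [lo, hi]
        return lo + x * (hi - lo)

    if expr_holds:
        return linear(tp, t, pos_value) if pos_value is not None else tp
    # expression does not hold: distance-like features map into [fp, tp],
    # where values near 0 indicate the expression is close to becoming true
    return linear(tp, fp, neg_value) if neg_value is not None else fp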

4.4.3. Improvements

Before evaluating our approach, we discuss some improvements that increase quality, function efficiency and function construction efficiency for feature acquisition from rules.

Feature Combinations

Since most rule-derived features are only used if the underlying expression evaluates to a specific value, it is possible to combine features. The output of a combined feature is then defined as

φψ(s) =  φ+(ψ, s)   if holds(ψ, s)
         φ−(ψ, s)   if holds(¬ψ, s)

A prominent example is the evaluation of a conjunction. If the formula is false, we can apply partial solution cardinality, otherwise solution cardinality. In general, we can combine two features if they are applicable to the same expression and cover opposite evaluations of the underlying expression. For instance, partial solution cardinality can only be applied to conjunctions with variables; therefore it cannot be combined with a true persistence feature, which operates on ground fluents. Moreover, although a Variable Distance feature also operates on conjunctions, an application together with a Partial Solution Cardinality feature would return a valid value for both features and we would have to decide which one to use. Among the possible combinations, the following exceptions do not make sense:

• False Persistence and True Persistence on a ground fluent do not make sense as the fluent in question either holds in the initial state and is therefore true-persistent or it does not hold and is therefore false-persistent.

• False Persistence and Fixed Distance features on ground fluents do not make sense as the former is implicitly expressed by the latter via an infinite distance.

• An Order feature cannot be combined with anything else since order heuristics determine the degree of truth and falsity of an expression.

In practice, we use

• Variable Distance features with Solution Cardinality on conjunctions,

• Partial Solution Cardinality and Solution Cardinality on conjunctions, and

• Fixed Distance features with True Persistence on fluent queries

Of these three, the first is rather uncommon and we did not yet find a game where this was useful. The latter two occur frequently in games.

Lazy Ground Fluent Features

Since ground fluent queries occur only at the leaves of the aggregation function, the detection of features derived from ground fluents can be delayed. These are

• Fixed Distance Features and

• Persistence Features

Since both involve proofs, they are potentially costly. However, given that no other algorithm depends on them, we detect these features lazily, that is, we decide at run time how much start clock time is left and how to distribute the time for obtaining the features. In practice, we found that distance features are detected faster, so we first use the time for distance feature detection and assign the remaining time to persistence features.

Other Optimizations

Precalculated Distances Distances are calculated before the match starts and inserted into a lookup table to speed up evaluation.

Mutual Exclusion of True-Persistent Fluents If two (or more) true-persistent fluents cannot occur together in a state and one is evaluated to true, we can evaluate the others as if they were false-persistent. For instance, the fluents cell(1,1,x) and cell(1,1,o) are true-persistent and mutually exclusive.

Skipped Node Evaluation Once a false-persistent fluent is false in a state we evaluate, any conjunction with this fluent as a positive (i.e., non-negated) conjunct also becomes persistently false. Therefore, we can skip evaluating the other conjuncts in the same conjunction and set the fuzzy output value for the represented rule to f. The same effect holds for negative occurrences of true-persistent fluents. The optimization can be extended to disjunctions analogously.

Simplified Evaluation The same effect as in Skipped Node Evaluation can be compiled into the evaluation function once the current state (as opposed to a state encountered during search) includes false-persistent and true-persistent fluents.

The improvements of the evaluation function based on persistent fluents are also used for distance features that return an infinite distance, since this indicates that the fluent is persistently false. While we are aware that our detection mechanism for distance relationships is not guaranteed to be correct, we have not yet found a game where the feature resulted in a mismatch.
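The Skipped Node Evaluation optimization can be sketched as a short-circuit in the fuzzy evaluation of a conjunction. The literal representation, the reasoner callback and the choice of min as t-norm are assumptions for this example, not Nexplayer's actual data structures.

from collections import namedtuple

Literal = namedtuple("Literal", ["fluent", "negated"])

def evaluate_conjunction(conjuncts, state, evaluate_literal, persistently_false,
                         f=0.0, t_norm=min):
    """Fuzzy evaluation of a conjunction with Skipped Node Evaluation:
    if any positive conjunct is persistently false in `state`, the whole
    conjunction is persistently false and the remaining conjuncts are skipped."""
    values = []
    for literal in conjuncts:
        if not literal.negated and persistently_false(literal.fluent, state):
            return f                      # short-circuit: the rule can never fire again
        values.append(evaluate_literal(literal, state))
    return t_norm(values)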

4.4.4. Evaluation

We will evaluate the quality and efficiency of our feature acquisition mechanism in three major parts. In the first part, we theoretically evaluate the criteria for evaluation functions on our feature acquisition mechanism to provide a background for our evaluation. We proceed by demonstrating the generality of our approach, detecting which features are applicable in which games. Finally, we evaluate each group of features based on the results of matches of Nexplayer against Fluxplayer.

Analysis of Rule-Derived Feature Acquisition

In section 2.5 we defined key properties of evaluation functions. We now discuss how these are fulfilled by each of the single features.

Efficiency of Feature Acquisition The efficiency of function construction is determined by the efficiency of feature acquisition. Our mechanism for feature detection returns only a small set of candidate features based on expressions. Most of these features require additional evidence, such as underlying distance relationships or proofs of persistence, which also represents the most time-consuming part of feature detection. This disadvantage is mitigated by the fact that evidence for distance and persistence features is typically common to a set of formulas due to abstract, i.e., non-ground, rules. This means that by acquiring, e.g., a distance graph for a specific ground fluent we obtain a graph that can be reused for other fluents of the same name and arity. We further reduce the computation costs incurred by allowing for lazy feature construction, deciding at run time and based on the time remaining whether a feature should be analyzed. All other operations commonly used by other agents, such as expensive simulations, are avoided: Selection can be skipped because we assume rule-derived features to be generally useful. Integration is not necessary due to the context provided by the aggregation function that embeds the underlying expression.

Efficiency of Feature Evaluation The efficiency of function evaluation is determined by the run time and the number of our features. While persistent fluents are evaluated just like standard fluents, distance and solution cardinality features incur additional costs. We compare the cost of evaluating a feature to the cost of evaluating the underlying expression with a reasoner. The cost of a solution cardinality feature is equal to the cost of finding multiple solutions to an expression. We can easily scale the evaluation cost by stopping the search for further solutions once a specific threshold is reached. Solution cardinality also represents a speed-up since it evaluates an expression that would otherwise have been ground-instantiated. By employing solution cardinality, we thus reduce the overhead induced by the evaluation function. Partial solution cardinality incurs the same cost as a standard expression evaluation plus an additional overhead per atom evaluation for keeping track of the depth and number of solutions. The costs are thus roughly a multiple of a standard expression evaluation by the reasoner. Among all distance features, the variable distance feature is the most expensive, since it represents m ∗ n distance calculations from m sources to n targets. For high m and n this could be a problem; however, we have not yet encountered such a situation.

Quality of Feature Evaluation Finally, the quality of state evaluation depends on the quality of feature evaluation and their integration. Each single feature is deterministic and thus consistent in its output. Moreover, we took great care to design features such that each minor change in a state, even though affecting the expression only indirectly, is

reflected in the feature output in the "right direction". An example is the application of p-norms for aggregating various distances instead of taking the minimum. In general, partial solutions, small distances or true persistence each result in an output closer to t if the underlying expression is supposed to hold. While this is not sufficient for fulfilling strict monotonicity, it also does not disprove it. Finally, we evaluate each feature depending on the "degree of terminality" of the underlying state. For terminal states all features produce the values f or t and thus extrapolate terminal states.

Affected Games

We show that our feature acquisition method is general. For this purpose we demonstrate that the feature types are applicable in almost all games and that in each game many expressions are transformed into features. We applied our feature acquisition mechanism to a set of 191 games largely equivalent to the games available on the GGP server [Sch12].

Figure 4.3.: Bitmap of Features vs. Games. The x-axis depicts 191 games in alphabetic order and the y-axis depicts the features in the following order (top-down): Solution Cardinality, Partial Solution Cardinality, Order, Variable Distances, Fixed Distances, True Persistence, False Persistence

Figure 4.3 shows a bitmap of feature types versus games. On the x-axis all 191 games are depicted in alphabetic order; on the y-axis we see the seven feature types in the following order (top-down): Solution Cardinality, Partial Solution Cardinality, Order, Variable Distances, Fixed Distances, True Persistence, False Persistence. If at least one feature of the given type was found in a game, the pixel is marked black, otherwise white. Figure 4.4 shows the same data as the previous figure, aggregated by feature types per game. On the x-axis we see the number of different feature types detected per game. The y-axis indicates the corresponding number of games. We see that most games allow for the detection of two different types of features. Table 4.4 shows the individual feature statistics. The first two columns show the number of games in which at least one feature of the given type was found and the corresponding relative frequency. The other two columns show how often the feature was found over all games and the corresponding frequency of each feature per game. We can see that on average 2.3 different feature types are found per game and 25.4 features occur on average in the evaluation function of a game. In summary, we can state that the proposed feature detection mechanism is very general and is able to detect different feature types in most games.

Figure 4.4.: Distribution of Games with Different Feature Types. Read: There are Y games in which exactly X different feature types are found.

Qualitative Evaluation - Solution Cardinality

To evaluate the solution cardinality features, we equip Nexplayer v1.1 with the corresponding acquisition mechanism. We refer to this version as Nexplayer v1.1s. We set up Nexplayer against Fluxplayer in a series of 20 matches in the games played in section 3.3.3. Time control was set to 40 seconds start clock and 4 seconds play clock.

Figure 4.5.: Solution Cardinality - Win Rate of Nexplayer v1.1s and v1.1 against Fluxplayer
Figure 4.6.: Solution Cardinality - States per Second

Figures 4.5 and 4.6 show the win rate achieved and the average number of unique states inspected per second. To provide a comparison of the win rate, we included the values achieved by Nexplayer v1.1 as obtained in section 3.3.3. Nexplayer v1.1s achieves slightly better results than Nexplayer v1.1, averaging 38.1 points in total. The reason is that Nexplayer v1.1 evaluates expressions only as to whether they hold or not, while Nexplayer v1.1s additionally counts the solutions and thus provides more accurate results. Since both evaluations are almost identical and the number of solution cardinality features in these games is small, speed is rarely affected.

Feature                        Found in Games   Frequency of Finding Feature   Number Features Found   Features Per Game
Solution Cardinality           85               44.5%                          467                     2.44
Partial Solution Cardinality   38               19.9%                          345                     1.81
Order                          32               16.8%                          106                     0.55
Variable Distance              30               15.7%                          43                      0.23
Fixed Distance                 114              59.7%                          1056                    5.53
True Persistence               118              61.8%                          2641                    13.83
False Persistence              23               12.0%                          194                     1.02
Total                          440              230%                           4852                    25.4

Table 4.4.: Feature Detection Statistics

Qualitative Evaluation - Persistence

To evaluate the persistence features, we equip Nexplayer v1.1 with the corresponding acquisition mechanism. We refer to this version as Nexplayer v1.1p. We set up Nexplayer against Fluxplayer in a series of 20 matches in the games played in section 3.3.3.

Figure 4.7.: Persistence - Win Rate of Nexplayer v1.1p and v1.1 against Fluxplayer
Figure 4.8.: Persistence - States per Second

Figures 4.7 and 4.8 show the average points achieved and the average number of unique states inspected per second. Again, we included the values achieved by Nexplayer v1.1 as obtained in section 3.3.3 for comparison of the win rate. Nexplayer v1.1p achieves much better results than Nexplayer v1.1 and reaches approximate parity with Fluxplayer. Evaluation speed is in some cases minimally slower, which we attribute to the fixed cost of proving persistence (typically between 1 and 10

seconds). In other cases it is significantly higher due to the removal of persistently false conjunctions and persistently true disjunctions from the evaluation function.

Qualitative Evaluation - Distances

For the evaluation of distance features, we equip Nexplayer v1.1 with the corresponding acquisition mechanism. We refer to this version as Nexplayer v1.1d. We set up Nexplayer against Fluxplayer in a series of 20 matches in a set of distance-related games.

Figure 4.9.: Distance - Win Rate of Nexplayer v1.1d and v1.1 against Fluxplayer
Figure 4.10.: Distance - States per Second

Figures 4.9 and 4.10 show the win rate of Nexplayer v1.1d in the distance-related games and the average number of unique states encountered per second. For comparison, we ran the same evaluation for Nexplayer v1.1 without distance features and included it in the above figures. Note that the given games are not constant-sum games, so points per match do not necessarily add up to 100. We see that Nexplayer v1.1d performs significantly better than both Nexplayer v1.1 and Fluxplayer. At the same time, evaluation efficiency is reasonable and does not significantly deviate from that of the other agents.

Summary

Table 4.5 summarizes the different evaluations.

Nexplayer Version   v1.1    v1.1s   v1.1d   v1.1p
Structural Games    33.12   38.13   -       52.19
Distance Games      49.73   -       80.14   -

Table 4.5.: Average Points of Nexplayer Versions against Fluxplayer

We can see in each case that a small start clock of 40 seconds is sufficient to construct our features. We did not reduce the start clock to the possible limit since our primary intention was to evaluate quality. Given that each single feature improves the playing

strength of Nexplayer while imposing only minor time constraints on construction and evaluation, we expect an agent combining the features to be competitive against other agents. We will address both open points (i.e., reduced start clock and combined features) as part of the general evaluation and refer the impatient reader to chapter 5. Although more general than other distance detection mechanisms, one disadvantage of our feature acquisition is the dependence of distance features on syntactic circumstances. For instance, the predicate pArg used for constructing a graph G in order to obtain ∆G was defined in terms of binary predicates in next rules and rules of recursive predicates. Rules such as

next(f(Y)) :- true(f(X)), plus(X, 1, Y)

can thus not be detected, because plus has arity 3. The following are other examples that do not work with our approach:

• distances across different fluent symbols, e.g., pawn(8,7) to queen(8,8)

• distances across different arguments, e.g., when moving object b three elements to the right: row(b,a,a,a,a) to row(a,a,a,b,a)

• distances based on content, e.g., cell(5,5,fixed) to cell(5,6,fixed) which should not be the same as cell(5,5,movable) to cell(5,6,movable) for a fixed board element fixed and a movable board element movable.

We will address these problems in the next section.

4.5. Acquisition of Admissible Distance Features

In the prior section we discussed rule-derived features and, as part of these, distance features. We found that distances can be used to evaluate the "reachability" of formulas based on information about their pattern of evolution. Distance is then an estimate of the lower bound on the number of steps to reach a state in which the desired formula holds. Given that matches terminate after a finite number of steps, higher distances indicate lower probabilities that the formula eventually holds. For our heuristic we adopted the general assumption that lower distances indicate a higher probability that the formula becomes true. While we stated that the ideal distance measure would be the number of time steps towards the formula, we used distances based on symbols. Such symbol distances are used by all deterministic GGP agents and determine the distance between, e.g., two fluents as the aggregated distance between the symbols of each fluent argument. Though being a suitable distance heuristic with significant impact on playing strength, the approach still has disadvantages.

• The detection of distances depends on syntactic patterns. Specifically, binary static predicates play an important role. If the information is contained in a non-binary predicate or in ground-instantiated form, the distance feature detection fails.

• Distances between two predicates or fluents are calculated as a (normalized) Manhattan distance over the distances between each argument. Consequently, the distance values obtained do not depend on the move pattern of the piece involved. For example, using such a predefined metric the distance of a rook, king and pawn from a2 to c2 would appear equal, while a human would identify the distances as 1, 2 and ∞ (unreachable), respectively. In addition to being imprecise, this also renders the heuristic inadmissible.

In this section we will present an algorithm that allows for the calculation of distances without the aforementioned problems. The underlying idea is a comprehensive analysis of the next rules of the game in order to find dependencies between entire fluents. Thus, while being specifically tailored towards ground-instantiated fluents (and not predicates), it avoids the detour of using fluent arguments. Based on these dependencies, we define a distance function that computes an admissible estimate for the number of steps required to make a certain fluent true. In contrast to previous approaches, our approach does not depend on syntactic patterns. The remainder of this section is structured as follows: In section 4.5.1 we introduce fluent graphs as the theoretical basis for this work and show how to use them to derive distances from states to fluents. We proceed in section 4.5.2 by showing how fluent graphs can be constructed from a game description and discuss their evaluation and normalization in section 4.5.3. Finally, we conduct experiments in section 4.5.4 to show the benefit and generality of our approach.

The remainder of this section is derived from our workshop contribution “Distance Features in General Game Playing” to the GIGA Workshop 2011 [MS11] and its follow-up conference article with the same name at ICAART 2012 [MS12]. Both submissions were joint work with my former colleague and fellow PhD student Stephan Schiffel.

4.5.1. Fluent Graphs

Our goal is to obtain knowledge on how fluents evolve over time. We start by building a fluent graph that contains all the fluents of a game as nodes. Then we add directed edges (fi, f) if at least one of the predecessor fluents fi must hold in the current state for the fluent f to hold in the successor state. Figure 4.11 shows a partial fluent graph for Tic-Tac-Toe that relates the fluents cell(3,1,Z) for Z ∈ {b, x, o}.

Figure 4.11.: Partial Fluent Graph for Tic-Tac-Toe

For cell (3,1) to be blank it had to be blank before. For a cell to contain an x (or an o) in the successor state there are two possible preconditions: either it contained an x (or o) before, or it was blank. Using this graph, we can conclude that, e.g., a transition from cell(3,1,b) to cell(3,1,x) is possible within one step, while a transition from cell(3,1,o) to cell(3,1,x) is impossible. To build on this information, we formally define a fluent graph as follows:

Definition 4.1 (Fluent Graph). Let Γ be a game over a game description D with the ground terms Σ. A graph G = (V,E) is called a fluent graph for Γ iff

• V = Σ ∪ {∅} and

• for all fluents f′ ∈ Σ and all valid states s and s′:

  (s′ is a successor of s) ∧ f′ ∈ s′  ⇒  (∃f) ((f, f′) ∈ E ∧ f ∈ s ∪ {∅})    (4.5.1)

In this definition we add an additional node ∅ to the graph and allow ∅ to occur as the source of edges. The reason is that there can be fluents in the game that do not have any preconditions, for example the fluent g with the following next rule: next(g) :- distinct(a,b). On the other hand, there might be fluents that cannot occur in any state because the body of the corresponding next rule is unsatisfiable, for example: next(h) :- distinct(a,a). We distinguish between fluents that have no precondition (such as g) and fluents that are unreachable (such as h) by connecting the former to the node ∅, while unreachable fluents have no edge in the fluent graph. Note that the definition covers only some of the necessary preconditions for fluents; therefore fluent graphs are not unique, as figure 4.12 shows. We will address this problem later.

Figure 4.12.: Alternative partial fluent graph for Tic-Tac-Toe

We can now define a distance function between two fluents as follows:

Definition 4.2 (Fluent Distance Function). Let ∆G(f, f′) be the length of the shortest path from node f to node f′ in the fluent graph G, or ∞ if there is no such path. Then

  ∆F(f, f′) def=  0            if f = f′
                  ∆G(f, f′)    else

In order to use the distance, e.g., to determine the distance from the current state to a state in which the fluent f′ holds, we additionally define a distance function ∆s→F(s, f′) in terms of the fluent distance. For aggregating the various fluent distances, we choose the minimum function.

Definition 4.3 (State Distance Function). Let ∆F(f, f′) be a fluent distance function. Then

  ∆s→F(s, f′) = min_{f ∈ s ∪ {∅}} ∆F(f, f′)

That means, we compute the distance ∆s→F(s, f′) as the shortest path in the fluent graph from any fluent in s (or ∅) to f′. Intuitively, each edge (f, f′) in the fluent graph corresponds to a state transition of the game from a state in which f holds to a state in which f′ holds. Thus, the length of a path from f to f′ in the fluent graph corresponds to the number of steps in the game between a state containing f and a state containing f′. Of course, the fluent graph is an abstraction of the actual game: many preconditions for the state transitions are ignored. As a consequence, the distance ∆s→F(s, f′) that we compute in this way is a lower bound on the actual number of steps it takes to go from s to a state in which f′ holds. Therefore the distance ∆s→F(s, f′) is an admissible heuristic for reaching f′ from a state s.
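To make Definitions 4.2 and 4.3 concrete, the following is a minimal Python sketch (not the actual Nexplayer implementation) of how ∆F and ∆s→F can be computed from a fluent graph by breadth-first search. The dictionary-based graph encoding, the string representation of fluents and the name chosen for the artificial node ∅ are illustrative assumptions.

    from collections import deque

    EMPTY = "empty"  # stands for the artificial node with no precondition

    def fluent_distance(graph, f, f_target):
        # Delta_F: 0 if both fluents are equal, else the length of the shortest path, else infinity
        if f == f_target:
            return 0
        frontier, seen, depth = deque([f]), {f}, 0
        while frontier:
            depth += 1
            for _ in range(len(frontier)):      # process one BFS level per iteration
                node = frontier.popleft()
                for succ in graph.get(node, ()):
                    if succ == f_target:
                        return depth
                    if succ not in seen:
                        seen.add(succ)
                        frontier.append(succ)
        return float("inf")

    def state_distance(graph, state, f_target):
        # Delta_{s->F}: minimum fluent distance from any fluent of the state (or the empty node)
        return min(fluent_distance(graph, f, f_target) for f in set(state) | {EMPTY})

    # partial fluent graph of figure 4.11; edges point from a precondition to a successor fluent
    graph = {
        "cell(3,1,b)": ["cell(3,1,b)", "cell(3,1,x)", "cell(3,1,o)"],
        "cell(3,1,x)": ["cell(3,1,x)"],
        "cell(3,1,o)": ["cell(3,1,o)"],
    }
    print(state_distance(graph, ["cell(3,1,b)"], "cell(3,1,x)"))  # 1
    print(state_distance(graph, ["cell(3,1,o)"], "cell(3,1,x)"))  # inf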

Theorem 4 (Admissible Distance). Let

• Γ = (R, s0, T, l, u, g) be a game with ground terms Σ and states S,

• s0 ∈ S be a state of Γ,

• f ∈ Σ be a fluent of Γ, and

• G = (V,E) be a fluent graph for Γ.

Furthermore, let s1 ↦ s2 ↦ ... ↦ sm+1 denote a legal sequence of states of Γ, that is, for all i with 0 < i ≤ m there is a joint action Ai such that:

si+1 = u(Ai, si) ∧ (∀r ∈ R)l(r, Ai(r), si)

If ∆s→F(s1, f) = n, then there is no legal sequence of states s1 ↦ ... ↦ sm+1 with f ∈ sm+1 and m < n.

Proof. We prove the theorem by contradiction. Assume that ∆s→F(s1, f) = n and there is a legal sequence of states s1 ↦ ... ↦ sm+1 with f ∈ sm+1 and m < n. By Definition 4.1, for every two consecutive states si, si+1 of the sequence s1 ↦ ... ↦ sm+1 and for every fi+1 ∈ si+1 there is an edge (fi, fi+1) ∈ E such that fi ∈ si or fi = ∅. Therefore, there is a path fj, ..., fm, fm+1 in G with 1 ≤ j ≤ m and the following properties:

• fi ∈ si for all i = j, ..., m + 1,

• fm+1 = f, and

• either fj ∈ s1 (e.g., if j = 1) or fj = ∅.

Thus, the path fj, ..., fm, fm+1 has a length of at most m. Consequently, ∆s→F(s1, f) ≤ m because fj ∈ s1 ∪ {∅} and fm+1 = f. However, ∆s→F(s1, f) ≤ m together with m < n contradicts ∆s→F(s1, f) = n.

4.5.2. Constructing Fluent Graphs from Rules

We propose an algorithm to construct a fluent graph based on the rules of the game. The transitions of a state s to its successor state s′ are encoded fluent-wise via the next rules. Consequently, for each f′ ∈ s′ there must be at least one rule with the head next(f′). All fluents occurring in the body of these rules are possible sources for an edge to f′ in the fluent graph. For each ground fluent f′ of the game:

1. Construct a ground disjunctive normal form N of the formulas that imply next(f′).

2. For every disjunct NC in N :

• Pick one literal true(f) from NC, or set f = ∅ if there is none.

• Add the edge (f, f′) to the fluent graph.
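A simplified sketch of this outline is given below; it assumes the ground DNF has already been computed and is represented as a mapping from each ground fluent to its disjuncts, each disjunct given only by the list of fluents f that occur in its true(f) literals. The selection heuristics discussed further down are omitted; the sketch simply takes the first available precondition of each disjunct.

    EMPTY = "empty"  # artificial node used when a disjunct contains no true(f) literal

    def build_fluent_graph(dnf_by_fluent):
        # dnf_by_fluent: {f_target: [[precondition fluent, ...], ...]} derived from the DNF of next(f_target)
        graph = {}  # maps a fluent to the set of its successor fluents
        for f_target, disjuncts in dnf_by_fluent.items():
            for preconditions in disjuncts:
                # pick one true(f) literal of the disjunct, or fall back to the empty node
                source = preconditions[0] if preconditions else EMPTY
                graph.setdefault(source, set()).add(f_target)
        return graph

    # disjuncts for next(cell(3,1,x)) in Tic-Tac-Toe: the cell was blank, or it already contained an x
    dnf = {"cell(3,1,x)": [["cell(3,1,b)"], ["cell(3,1,x)"]]}
    print(build_fluent_graph(dnf))
    # {'cell(3,1,b)': {'cell(3,1,x)'}, 'cell(3,1,x)': {'cell(3,1,x)'}}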

Note that we only select one literal from each disjunct in N. Since the distance function ∆s→F(s, f′) obtained from the fluent graph is admissible, the goal is to construct a fluent graph that increases the lengths of the shortest paths between the fluents as much as possible. Therefore, the fluent graph should contain as few edges as possible. In general, the complete fluent graph (i.e., the graph where every fluent is connected to every other fluent) is the least informative because the maximal distance obtained from this graph is 1. The algorithm outline still leaves some open issues:

1. How do we construct a ground formula N that is the disjunctive normal form of next(f′)?

2. Which literal true(f) do we select if there is more than one? Or, in other words, which precondition f of f′ do we select?

We will discuss both issues in the following sections.

Constructing a DNF of next(f′)

A formula N in DNF is a set of formulas {NC1, ..., NCn} connected by disjunctions such that each formula NCi is a set of literals connected by conjunctions. We propose Algorithm 1 to construct N such that next(f′) ⇒ N.

Algorithm 1 Constructing a formula N in DNF with next(f′) ⇒ N.
Input: game description D, ground fluent f′
Output: N, such that next(f′) ⇒ N
 1: N := next(f′)
 2: finished := false
 3: while ¬finished do
 4:     Replace every positive occurrence of does(r, a) in N with legal(r, a).
 5:     Select a positive literal l from N such that l ≠ true(t), l ≠ distinct(t1, t2) and l is not a recursively defined predicate.
 6:     if there is no such literal then
 7:         finished := true
 8:     else
 9:         l̂ := ⋁_{h:-b ∈ D, lσ = hσ} bσ
10:         N := N{l/l̂}
11:     end if
12: end while
13: Transform N into disjunctive normal form, i.e., N = NC1 ∨ ... ∨ NCn where each formula NCi is a conjunction of literals.
14: for all NCi in N do
15:     Replace NCi in N by a disjunction of all ground instances of NCi.
16: end for

The algorithm starts with N = next(f′). Then, it selects a positive literal l in N and unrolls this literal, that is, it replaces l with the bodies of all rules h:-b ∈ D whose head h is unifiable with l with a most general unifier σ (lines 9, 10). The replacement is repeated until all predicates that are left are either of the form true(t), distinct(t1, t2) or recursively defined. Recursively defined predicates are not unrolled to ensure termination of the algorithm. Finally, we transform N into disjunctive normal form and replace each disjunct NCi of N by a disjunction of all of its ground instances in order to get a ground formula N. Note that in line 4, we replace every occurrence of does with legal to also include the preconditions of the actions that are executed in N. As a consequence the resulting formula N is not equivalent to next(f′). However, next(f′) ⇒ N under the assumption that only legal moves can be executed, i.e., does(r, a) ⇒ legal(r, a). This is sufficient for constructing a fluent graph from N. Note that we do not select negative literals for unrolling. The algorithm could easily be adapted to also unroll negative literals. However, in the games we encountered so far, doing so does not improve the obtained fluent graphs but complicates the algorithm and increases the size of the created N. Unrolling negative literals will mainly add negative preconditions to N. However, negative preconditions are not used for the fluent graph because a fluent graph only contains positive preconditions of fluents as edges, according to Definition 4.1.

Selecting Preconditions for the Fluent Graph

If there are several literals of the form true(f) in a disjunct NC of the formula N constructed above, we have to select one of them as the source of the edge in the fluent graph. As already mentioned, the distance ∆s→F(s, f) computed with the help of the fluent graph is a lower bound on the actual number of steps needed. To obtain a good lower bound, that is, one that is as large as possible, the paths between nodes in the fluent graph should be as long as possible. Selecting the best fluent graph, i.e., the one which maximizes the distances, is difficult because it depends on the states we encounter when playing the game, and we do not know these states beforehand. In order to generate a fluent graph that provides good distance estimates, we use several heuristics when we select literals from disjuncts in the DNF of next(f′): First, we only add new edges if necessary. That means, whenever there is a literal true(f) in a disjunct NC such that the edge (f, f′) already exists in the fluent graph, we select this literal true(f). The rationale of this heuristic is that paths in the fluent graph are longer on average if there are fewer connections between the nodes. Second, we prefer a literal true(f) over true(g) if f is more similar to f′ than g is to f′, that is, sim(f, f′) > sim(g, f′).

We define the similarity sim(t, t′) recursively over ground terms t, t′:

  sim(t, t′) =  1                     if t, t′ have arity 0 and t = t′
                Σi sim(ti, ti′)       if t = f(t1, ..., tn) and t′ = f(t1′, ..., tn′)
                0                     else

In human-made game descriptions, similar fluents typically have strong connections. For example, in Tic-Tac-Toe cell(3,1,x) is more related to cell(3,1,b) than to cell(2,3,x). By using similar fluents when adding new edges to the fluent graph, we have a better chance of finding the same fluent again in a different disjunct of N. Thus we maximize the chance of reusing edges and minimize the number of edges in the graph.
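A small sketch of this similarity measure follows; the tuple-based term representation is an assumption made for illustration and is not the representation used in Nexplayer.

    def sim(t1, t2):
        # a ground term is either a constant (string) or a pair (functor, argument list)
        if isinstance(t1, str) and isinstance(t2, str):        # both terms have arity 0
            return 1 if t1 == t2 else 0
        if isinstance(t1, tuple) and isinstance(t2, tuple):
            f1, args1 = t1
            f2, args2 = t2
            if f1 == f2 and len(args1) == len(args2):          # t = f(t1, ..., tn) and t' = f(t1', ..., tn')
                return sum(sim(a1, a2) for a1, a2 in zip(args1, args2))
        return 0

    cell_31x = ("cell", ["3", "1", "x"])
    print(sim(cell_31x, ("cell", ["3", "1", "b"])))  # 2 -> more similar
    print(sim(cell_31x, ("cell", ["2", "3", "x"])))  # 1 -> less similar

Among several literals true(f) in a disjunct, the one whose fluent maximizes sim(f, f′) would then be preferred, provided no already existing edge can be reused.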

4.5.3. Application in Nexplayer

For using the distance function in our evaluation function, we define the normalized distance ∆̄s→F(s, f) in terms of the normalized fluent distance:

  ∆̄F(f, f′) def=  0                                  if f = f′
                  ∆G(f, f′) / maxg ∆G(g, f′)          else

  ∆̄s→F(s, f) = ⊗_{g ∈ s ∪ {∅}} ∆̄F(g, f),

where ⊗ denotes the aggregation operator described below.

The value maxg ∆G(g, f) is the maximum length of any finite shortest path in G from any fluent g to f. Thus, the normalized distance ∆̄F will be in the interval [0, 1] or infinite. We define a feature upon the normalized distance similar to a standard distance feature, that is

• we use a p-norm with p < −1 to aggregate several distances to a single one (see section 4.4.1)

• we map the distance into the fuzzy range [fp, tp) (as defined in section 4.4.2).

In comparison to the distance features presented in section 4.3, the admissible distance features are currently defined only on ground fluents. Therefore, only the Fixed Distance feature is implemented using admissible distances. However, admissible distances can be found on rules over implicitly defined domains, i.e., rules whose domain is a set of ground instances “hard-coded” into the game description. An example is the game Tic-Tac-Toe:

    next(cell(X, Y, x)) :-
        does(xplayer, mark(X, Y)), true(cell(X, Y, b)).
    next(cell(X, Y, o)) :-
        does(oplayer, mark(X, Y)), true(cell(X, Y, b)).
    next(cell(X, Y, C)) :-
        true(cell(X, Y, C)), distinct(C, b).
    next(cell(X, Y, b)) :-
        does(P, mark(M, N)), true(cell(X, Y, b)),
        (distinct(M, X) ; distinct(N, Y)).

The above four rules encode the relation also illustrated in the fluent graph in figure 4.11, namely (b, b), (b, x), (b, o), (o, o), (x, x). The relation forms a partial order and thus allows for the application of distances. Notice how the corresponding distance graph indicates persistence information, that is, the fact that fluents with cell content b are false-persistent and fluents with cell content x or o are true-persistent.

4.5.4. Evaluation

We evaluate our algorithm for the two dimensions of efficiency and quality. We begin by evaluating the time needed for feature acquisition, which may increase substantially given the potentially large size of the fluent graph. We then analyze the effects of applying the admissible distance heuristics on the playing strength of an agent. For this purpose, we compare the distance feature acquisition as defined in section 4.3.3 and the admissible distance feature acquisition presented in this section. To avoid ambiguities, we refer to the former as non-admissible distance features and the latter as admissible distance features.

Efficiency of Feature Acquisition

We evaluate the time needed for feature acquisition on the complete set of games for all fluents that occur in the goal or terminal condition.

Figure 4.13.: Time for Detecting Admissible Distances

Figure 4.13 shows the time we need to find admissible distances. We capped the maximum time for detecting the features to 100 seconds. This should cover most competition scenarios.

Figure 4.14.: Number of games grouped by the time required to find the admissible distance features. Read: There were Y games with admissible distance features detected in at most X seconds.

Figure 4.14 shows the same data but organized by the time required to construct the admissible distance features. While most distance features can be detected quickly, there are 19 games (approx. 10 % of all games) that require more than 100 seconds.

Quality and Efficiency of Feature Evaluation

For evaluation, we define two variants on top of our basic agent Nexplayer v1.1 as presented in section 3.3.

Nexplayer v1.1n is Nexplayer v1.1 that detects only the distance feature “Fixed Distance” (see section 4.3.3) and only on ground fluent queries. All other distance features are deactivated. In terms of features, Nexplayer v1.1n thus uses a subset of Nexplayer v1.1d.

Nexplayer v1.1a is Nexplayer v1.1 with all features deactivated except the detection of admissible distance features on ground fluent queries as presented in this section.

Both agents thus only detect distances over ground fluents and only differ in the approach on how to detect them.

We evaluated both agents in 20 matches with 120 seconds start clock and 4 seconds play clock against Fluxplayer. The start clock is much higher than usual to reflect the increased time requirements of the admissible feature detection. As games we selected the 16 games also used for evaluating the previous Nexplayer versions v1.1s, v1.1d and v1.1p (see section 4.4.4). We split the set of games into two groups. The group of structural games includes mainly Tic-Tac-Toe and Connect Four variants, and the group of distance games contains the same games as used for evaluating Nexplayer v1.1d. The reason is that in structural games we expect Nexplayer v1.1a to perform better than Nexplayer v1.1n because it is able to detect distances over implicitly defined domains. On the other hand, in the other distance-related games it is for the moment unclear to what extent the admissibility and the high time requirements for construction influence the playing strength.

Figure 4.15.: Win Rates of Nexplayer v1.1a and v1.1n against Fluxplayer in Structural Games

Figure 4.16.: Win Rates of Nexplayer v1.1a and v1.1n against Fluxplayer in Distance Games

Figures 4.15 and 4.16 show the win rates of both Nexplayer versions against Fluxplayer. While in the group of structural games v1.1a performs much better than v1.1n, it seems that the performance in the distance games is equal at best. Note that the game Runners here is a special case where negative distances (discussed in appendix A.2.2) come into play. These are currently not detected for admissible distance features, which renders the heuristic poor; the corresponding value should therefore be ignored. We can see the same pattern when setting up Nexplayer v1.1a directly against Nexplayer v1.1n. Figures 4.17 and 4.18 show the win rate of Nexplayer v1.1a against its non-admissible counterpart. Note here that the games connectfour and pentago_2008 do not use admissible distances (all other structural games do), thus we interpret the minor differences in these games as random influences. Runners should again be ignored for the reasons mentioned before. Also, although the relative performance of Nexplayer v1.1a in the game racer appears bad, in fact both agents do not gain any points, and the value in the chart should correspondingly read “undefined”. Taking these special cases into account, we see again that the admissible distance features increase the playing strength of the agent for structural games and slightly decrease it for distance games.

Figure 4.17.: Win Rates of Nexplayer v1.1a against Nexplayer v1.1n in Structural Games

Figure 4.18.: Win Rates of Nexplayer v1.1a against Nexplayer v1.1n in Distance Games

Structural Games
            Flux    Nex v1.1n   Nex v1.1a   Avg
Flux        -       64.69       47.81       56.25
Nex v1.1n   35.31   -           27.19       31.25
Nex v1.1a   52.19   72.81       -           62.50

Distance Games
            Flux    Nex v1.1n   Nex v1.1a   Avg
Flux        -       35.66       36.84       36.25
Nex v1.1n   75.06   -           63.04       69.05
Nex v1.1a   56.38   32.01       -           44.20

Table 4.6.: Average Points in Structural and Distance Games. The agent to the left won X points against the agent on the top.

Table 4.6 shows the average results of each agent for the two classes of games. Figure 4.19 shows the average number of unique states analyzed per second in the matches against Fluxplayer. Looking at the numbers, we see that there is practically no difference in the evaluation time required by the admissible and non-admissible distance feature approaches.

Summary

Obviously, the acquisition of admissible distances is in some cases much more expensive than the previously introduced (non-admissible) distance feature detection. At the same time it is more general, since it also detects distances over domains implicitly defined in the game rules, and it is more specific, since it is currently only applicable for features over ground fluent queries. Specifically, in the case of implicitly defined domains it increases the playing strength. On the other hand, if a non-admissible distance is found, Nexplayer's strength does not improve or even decreases when using admissible distances. This shows that while Nexplayer indeed takes advantage of the provided distance in structural games, it does not benefit in any way from admissible features when compared to non-admissible features.

Figure 4.19.: Comparison of Unique States Analyzed per Second: Nexplayer v1.1a, v1.1n and Fluxplayer

In the light of these findings, it seems reasonable to apply admissible distances as a fallback mechanism that is used if no non-admissible distance is found. In this way, we benefit either way:

• If a non-admissible distance is found, we can skip detecting admissible distances since we (currently) do not benefit from the admissibility of the results.

• If no non-admissible distance is found, we invest the remaining time of the start clock to look for admissible distance features. If found, it is likely that they increase the quality of the evaluation.

Using this strategy, we avoid the problem of possibly high time requirements for admissible distance feature detection and restrict their application to the cases where we benefit most.

4.6. Summary

We have presented a categorization of feature acquisition methods along the lines of quality and efficiency. We found the best trade-off in generality and construction time to be rule-derived features. Before applying them, we investigated the meaning of rule-derived features and found that they represent the evaluation of the underlying expression in the estimated terminal state. As an interesting side note we found that probabilistic Monte-Carlo evaluation is the sampling of a rule-derived feature with the goal formula as underlying expression. We applied this view by motivating features in terms of the probabilistic expression evaluation on the set of terminal states. We defined features of the groups solution cardinality, distance and persistence, which include all non-class-based features used in modern GGP agents. We integrated the features in our fuzzy evaluation function and could show that the features are indeed general and fast to construct, and that they significantly influence the playing strength of an agent. Still, especially distance features posed a number of problems, most importantly their dependence on syntactic structures.

We addressed this problem by providing a general method of deriving distance estimates in General Game Playing. To obtain such a distance estimate, we introduced fluent graphs, proposed an algorithm to construct them from the game rules and demonstrated the transformation from fluent graph distance to a distance feature. Unlike previous distance estimations, our approach does not rely on syntactic patterns, preserves piece-dependent move patterns and produces an admissible distance heuristic. We measured its efficiency and quality and found that it is especially useful if our standard (i.e., non-admissible) distance feature acquisition approach does not detect features. In these cases it increased the playing strength of the agent. Given the higher time requirements for constructing the feature, we suggested to use the algorithm as a complement to non-admissible feature detection that is applied if no other distance feature could be found. Still, given its theoretical superiority and a number of promising extensions, we understand this work as a first step towards further research in this area.

4.6.1. Future Work

We hope to have provided some useful insights into how features can be best acquired and used in a General Game Playing agent. While our work seems certainly fruitful in the light of the evaluation results, many questions remained unanswered. We discuss these for rule-derived and admissible distance features separately.

Rule-Derived Features

One problem is the mapping of feature output values to truth values of our aggregation function. Related questions are whether fp and tp are the only relevant values, what better values would be and how they are best determined. While our mappings seem to some extent reasonable, we strongly doubt that they are optimal. We also found that the values fp and tp should differ across different formulas; however, we have no method at hand that provides us with specific values depending on the formula. Finally, we addressed the evolution of fp and tp towards f and t in a simplistic, linear manner. Again, while this seems practical, it should not be optimal and should as such be subject to further research.

A second issue becomes apparent when looking at the absolute dimension of features. If distance and solution cardinality features provided us with an integer describing time steps, we could compare these time steps against our terminal estimation and derive better conclusions on whether a formula with a distance interpretation is still reachable or a formula with a solution cardinality interpretation can be guaranteed to hold in the terminal state. While the necessary calculations with respect to the features are readily accessible, an estimation of how many time steps the match will last is lacking.

Admissible Distance Features

With respect to admissible distances, we distinguish possibilities for future work by their effect on the distance features.

Increase Efficiency

The main problem of the approach is its computational cost for constructing the fluent graph. The most expensive steps of the fluent graph construction are the grounding of the DNF formulas N and the subsequent processing of the resulting large formulas to select edges for the fluent graph. For some complex games, the algorithm takes more than 100 seconds. Thus, reducing the size of the formulas likely leads to faster construction. One way to reduce the size of N is a more selective expansion of predicates (line 5) in Algorithm 1. Developing heuristics for this selection of predicates is one of the goals for future research. In addition, it would be worthwhile to explore a way to construct fluent graphs from non-ground representations of the preconditions of a fluent to skip the grounding step completely. For example, the partial fluent graph in figure 4.11 is identical to the fluent graphs for the other 8 cells of the Tic-Tac-Toe board. The fluent graphs for all 9 cells are obtained from the same rules for next(cell(X,Y,_)), just with different instances of the variables X and Y. By not instantiating X and Y, the generated DNF would be exponentially smaller while still containing the same information.

Increase Generality

The technique of constructing fluent graphs can be extended in many ways. Most importantly, it is possible to obtain a fluent graph of sufficient preconditions. In less complex games such as Tic-Tac-Toe this should be possible without major problems. Since each fluent then may have several sources, the distance from a state to a fluent would have to be defined as the maximum of all distances of the preconditions holding in the current state towards the desired fluent. Preconditions for persistence could be obtained directly from the graph and there would be no need for similarity heuristics to select a precondition.

Figure 4.20.: Partial Fluent Graph for Sufficient Preconditions in Tic-Tac-Toe

Figure 4.20 shows such a fluent graph. For better readability, it includes action nodes in addition to the fluents, indicating when an action is needed to obtain another fluent.

Taken one step further, the fluent graph could be generalized to arbitrary state-dependent formulas. In this case, a state can be represented as a mapping of each node in the graph to true or false. The distance of a state to a formula could be modeled as the sum (minimum) of the preconditions in case of a conjunction (disjunction). Since each formula ultimately depends only on fluents, we can reduce formula distances to fluent distances. For instance, the evaluation of the formula row(3, x) could be modeled as the sum of the distances towards the three constituting fluents cell(3, 1, x), cell(3, 2, x) and cell(3, 3, x). Constructing such a formula graph should be easy in games like Tic-Tac-Toe where there is a small set of possible fluents and there are no additional difficulties such as negative preconditions. If it can be constructed, the (potentially perfect) evaluation function could be represented as a distance estimate of the state at hand towards the goal formula. Apart from the aforementioned idea, formula graphs could also prove useful as a representation to obtain transition probabilities for individual fluents and formulas.

5. General Evaluation

We conclude this thesis with a general evaluation of the playing strength of our agent. While the focus of the prior sections was on aspects of evaluation functions and their optimization, we now want to demonstrate that the approaches work together and produce convincing results. In short, our hypothesis is that the approaches discussed within this thesis produce a competitive General Game Player. For an evaluation of this proposition we will analyze the outcomes of recent GGP competitions and pit our agent against a number of other successful GGP agents. Before starting, we quickly specify the features used in the final version of Nexplayer.

5.1. Final Version of Nexplayer

The version of Nexplayer used for evaluation is also the final version of Nexplayer and unites all construction ideas and mechanisms discussed throughout this thesis. These are generally

An Evaluation Function based on Neural Networks as discussed and evaluated in section 3.2 and enhanced in section 3.3.

Acquisition of Rule-Derived Features as discussed in sections 4.3 and 4.4 and evaluated in section 4.4.4. All features were used and all improvements discussed in section 4.4.3 were activated.

Acquisition of Admissible Distance Features as discussed in section 4.5 and evaluated in section 4.5.4. Due to the time requirements of the rule grounding and fluent graph construction, we applied the feature only after we found that no simpler rule-derived distance feature could be used.

As search algorithms we used MiniMax and MaxN and applied the following improvements:

• Transposition Tables using a Zobrist Hash [Zob70] (a small sketch of the hashing scheme follows after this list)

• alpha-beta pruning for MiniMax search [KM75]

• History Heuristics for MiniMax search [Sch89]
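To illustrate the transposition table entry point, the following is a minimal sketch of Zobrist hashing over a set of ground fluents; the 64-bit key width, the seeding and all names are illustrative assumptions rather than details of the Nexplayer source.

    import random

    class ZobristHasher:
        # assigns each ground fluent a fixed random 64-bit key; a state hash is the XOR of these keys
        def __init__(self, seed=0):
            self.rng = random.Random(seed)
            self.keys = {}

        def key(self, fluent):
            if fluent not in self.keys:
                self.keys[fluent] = self.rng.getrandbits(64)
            return self.keys[fluent]

        def hash_state(self, state):
            h = 0
            for fluent in state:
                h ^= self.key(fluent)
            return h

    hasher = ZobristHasher()
    state = {"cell(1,1,x)", "cell(2,2,o)", "control(oplayer)"}
    transposition_table = {hasher.hash_state(state): ("exact", 75)}  # hypothetical cached entry

Because the hash is an XOR of per-fluent keys, it can also be updated incrementally by XOR-ing out the keys of removed fluents and XOR-ing in those of added fluents.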

In the case of games with simultaneous moves, we applied paranoid MiniMax. In addition, we used other improvements:

Compilation

Nexplayer uses ECLiPSe Prolog [ECL12], which allows for code generation and compilation at run time. This makes it possible to compile persistent facts into the evaluation function once they are known to persist. The evaluation function construction process consists of the following separate stages:

Function Construction All data that represents the evaluation function (i.e., network structure, weights, ...) is generated and written into a persistent format.

Function Speedup On loading the evaluation function, data-sensitive parts of Nexplayer are specialized towards the data and recompiled.

Class-based Features

We also use a small number of class-based features as defined in section 4.1.2. These are

• Mobility

• Depth

• Piece Value

• Goal Value

Before applying mobility and goal value we confirm whether these can be calculated quickly. As a threshold we used 0.3 seconds for 100 states. For the piece value we determine the value of a piece by putting it on a blank board and determining the average number of moves it has at its disposal. Mobility and piece value are integrated as a bonus on the state value. The bonus is a small multiple of the machine precision and thus works as a tie-breaker. Depth is used in the same way but as a negative bonus to favor states closer to the current state. Finally, the goal value is used for normalizing the heuristic value: if a goal value can be determined, then the state value is a linear combination of the goal value and the original value of our evaluation function.
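How these class-based features could enter the final state value is sketched below; the concrete epsilon multiple, the weighting between goal value and evaluation function output, and all names are assumptions made for illustration only.

    import sys

    EPSILON = sys.float_info.epsilon  # machine precision; the bonuses act only as tie-breakers

    def state_value(base_value, mobility=None, piece_value=None, depth=0, goal_value=None, alpha=0.5):
        # combine the evaluation function output with the class-based feature bonuses (illustrative only)
        value = base_value
        if goal_value is not None:
            # if a goal value can be determined, blend it linearly with the original evaluation
            value = alpha * goal_value + (1 - alpha) * base_value
        bonus = 0.0
        if mobility is not None:
            bonus += 4 * EPSILON * mobility      # small positive tie-breaker
        if piece_value is not None:
            bonus += 4 * EPSILON * piece_value   # small positive tie-breaker
        bonus -= 4 * EPSILON * depth             # negative bonus favours states closer to the current state
        return value + bonus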

Recursion and Chain Resolution

Recursions and chained conjunctions (conjunctions where conjunct i uniquely sets a variable that is used in conjunct i + 1) are often used to traverse a structure in the game and aggregate some value depending on the underlying fluents. An example of a chained conjunction would be the following statement:

    count_stones(Red, Black) :-
        count_stone(1, 1, 0, 0, Red1, Black1),
        count_stone(1, 2, Red1, Black1, Red2, Black2),
        ..,
        count_stone(8, 8, Red62, Black62, Red63, Black63).

count_stone(X, Y, RI, BI, RO, BO) is here a predicate that checks the coordinates X, Y for a red or black stone and sets RO=RI+1 or BO=BI+1 accordingly. The complete statement thus traverses an 8x8 board and counts the number of red and black stones. The problem with such a conjunction is that it may fail in non-terminal states if for some cell coordinate neither a red nor a black stone is found. In this case, an evaluation of the conjunction would provide no useful information. We address the problem by evaluating each conjunct separately and setting RO=RI and BO=BI in case a conjunct count_stone(X, Y, RI, BI, RO, BO) fails. Likewise, if a recursive predicate can be transformed to a chained conjunction, we apply the same mechanism. Note that this is just a way to ensure that the predicate can be evaluated. The only game in the evaluation where such a feature was applied is CephalopodMicro.
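The fallback for failing conjuncts can be illustrated with the following sketch: each conjunct of the chain is modelled as a function that may fail, and on failure the counters are simply passed through unchanged, mirroring the RO=RI, BO=BI behaviour described above. The board representation is hypothetical.

    def count_stone(board, x, y, red_in, black_in):
        # one conjunct of the chain: returns updated counters, or None if neither stone is present (failure)
        content = board.get((x, y))
        if content == "red":
            return red_in + 1, black_in
        if content == "black":
            return red_in, black_in + 1
        return None  # the original conjunct would fail here

    def count_stones_resilient(board):
        # evaluate the chained conjunction conjunct by conjunct, carrying the counters over failures
        red, black = 0, 0
        for x in range(1, 9):
            for y in range(1, 9):
                result = count_stone(board, x, y, red, black)
                if result is not None:
                    red, black = result
                # on failure the counters stay unchanged (RO=RI, BO=BI)
        return red, black

    print(count_stones_resilient({(1, 1): "red", (5, 5): "black", (8, 8): "red"}))  # (2, 1)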

5.2. Past Competitions

While not too descriptive in themselves, the competitions provide an insight into the ranking of single agents. We will evaluate the last three bigger challenges. This not only indicates the strength of Nexplayer but also that of the opponents we use for evaluation in the next section. Note that these competitions were held some time ago, so they do not necessarily reflect the playing strength of the same agents today. In addition, we emphasize that in most competitions only a few games were played per agent and only a few matches were played per game and agent. Thus, there is a high degree of random influence on the results and they should be taken with a grain of salt.

5.2.1. German GGP Competition 2010

In 2010, an unofficial GGP championship was organized by the University of Potsdam. Mostly student teams participated. We obtained our data from the site http://www.ggp-potsdam.de/wiki/gggpc2010 (German) and the GGP server [Sch12].

Preliminaries
Agent        Rank   Total Points
Nexplayer    1      331
Fluxplayer   2      281
...          ...    ...

Finals
Agent        Rank   Total Points
Fluxplayer   1      390
ATAX test    2      375
tut          3      300
Centurio     4      245
Nexplayer    5      225

Table 5.1.: German GGP Competition 2010 - Summary

The preliminaries consisted of two groups with five agents each. 12 matches were played and Nexplayer ranked first in its group with 331 points, followed by Fluxplayer. Both agents advanced to the finals. The final group consisted of 5 players. 11 matches were played in 5 games. Nexplayer ranked fifth.

5.2.2. International GGP Competition at IJCAI 2011

The GGP championship consisted of several matches in 10 games. The tournament mode was double elimination; 12 agents participated. It is possible to derive a ranking using the time of elimination.

Agent         Rank
TurboTurtle   1
Cadiaplayer   2
Ary           3
Nexplayer     4

Table 5.2.: International GGP Competition 2011 - Summary

Table 5.2 reconstructs the ranking based on the information found at http://games.stanford.edu/. Other agents in the tournament were Fluxplayer, Turk, Atax, Gamer, Centurio, Hydra (withdrawn), Yggdrasil (withdrawn) and Toss (withdrawn).

5.2.3. German Open 2011

There were eight agents participating at the German Open, organized by the University of Bremen. The tournament was held in the beginning of October 2011 and all information was taken from the corresponding website http://www.tzi.de/~kissmann/ggp/go-ggp/ on March 23, 2012. The preliminaries ran for two weeks prior to the competition. They included the approximately 200 games hosted on the GGP Server [Sch12].

Player        Played Matches   Avg. Reward
Cadiaplayer   853              60.790
Nexplayer     1079             57.374
Ary           1173             56.909
Fluxplayer    1159             56.420
Alloy         487              48.632

Table 5.3.: German Open Preliminaries

The actual competition consisted of two rounds. Eight agents participated. In the first round, two groups were formed consisting of agents with an even / odd rank in the preliminaries. 12 matches were played in four games and the top two agents of each group formed the final group. In the final group, 19 matches were played in five games and the agent with the highest total points was declared the winner of the tournament.

Heats
Agent        Rank   Total Points
Nexplayer    1      261
Fluxplayer   2      223
Centurio     3      75
Hydra        4      58

Finals
Agent        Rank   Total Points
Ary          1      267
Alloy        2      191
Fluxplayer   3      169
Nexplayer    4      167

Table 5.4.: German Open 2011 - Summary

Nexplayer advanced to the finals as winner of its group with a total of 261 points. In the finals, Nexplayer ranked fourth with 167 points. The corresponding consolation group was led by Cadiaplayer, followed by Centurio.

5.3. Experiments

We finish with an experimental evaluation against the three agents for which an agent program was available as of March 15, 2012. The agents are

Cadiaplayer tagged as version 2.0.1 of June 08, 2011, downloaded March 15, 2012 from http://cadia.ru.is/wiki/public:cadiaplayer:main#cadiaplayer_source

Centurio tagged as version 2.1 of October 2010, downloaded March 15, 2012 from http://www.ggp-potsdam.de/wiki/Releases

Fluxplayer Version as of January 31, 2012. The agent is available upon request.

All agents are well tested and have participated successfully in several tournaments. Specifically, Fluxplayer won the GGP championship in 2006 and Cadiaplayer won the championship in 2007 and 2008. We set up a series of matches of Nexplayer against each of these agents on a set of 28 test games for the two clock setups 20,4 and 40,8.

5.3.1. Centurio

We played 10 matches for each clock setup against Centurio, so 28x2x10=560 matches were played. Figure 5.1 and Table 5.5 show the average result of the matches. We see that Nexplayer outperforms Centurio. In fact, there is only a handful of setups where Centurio achieves more points per match than Nexplayer. These are, for the 20,4 clock, the three Checkers variants, othello-comp2007 and racer, and, for the 40,8 clock setup, chinesecheckers2, othello-comp2007 and quarto. Among these, the greatest margin was achieved in the three Checkers variants for the 20,4 time settings due to an incomplete network creation.

Clock     Nexplayer   Centurio
20,4      70.3        32.7
40,8      82.2        23.3
Average   76.3        28.0

Table 5.5.: Average Points Nexplayer vs Centurio

5.3.2. Cadiaplayer

Due to difficulties in running Cadiaplayer directly on our 64-bit system, we had to set up a Virtual Machine with a 32-bit version of Ubuntu 11.10. We equipped the VM with 2 GB of RAM and 2 processors to allow for smooth operation. We played 10 matches for each clock setup against Cadiaplayer with standard settings, i.e., using just one processor. All in all, 28x2x10=560 matches were played.

Clock     Nexplayer   Cadiaplayer
20,4      70.1        25.8
40,8      94.6        3.9
Average   82.3        14.9

Table 5.6.: Average Points Nexplayer vs Cadiaplayer

Figure 5.2 and Table 5.6 show the average result of the matches. We see that Nexplayer outperforms Cadiaplayer. Again, there are only a few setups where Nexplayer achieves fewer points than Cadiaplayer. These all occur for the 20,4 time clock and are bidding-tictactoe, chinesecheckers2, connect4, crisscross, pacman3p and racer. The difference to the 40,8 clock setup suggests that these are related to incomplete feature detection. Noteworthy is the game of skirmish, where we thought an error had occurred but found that both players were playing for minimal risk, with both agents ending up with 0 points. Hence a win rate cannot be calculated.

5.3.3. Fluxplayer

We played 20 matches for the two clock setups against Fluxplayer. Since we know that, unlike UCT agents, Fluxplayer benefits from greater start clocks, we additionally add a 120,8 clock setup to the evaluation. This increases the likelihood that Fluxplayer uses the best evaluation function it can come up with. In total, 28x3x20=1680 matches were played.

Clock     Nexplayer   Fluxplayer
20,4      57.02       45.39
40,8      60.14       44.38
120,8     55.70       49.66
Average   57.62       46.46

Table 5.7.: Average Points Nexplayer vs Fluxplayer

Figure 5.3 and Table 5.7 show the results of the matches. Obviously, Fluxplayer is much more competitive than its probabilistic counterparts. There are some games in which Nexplayer does not achieve points, such as breakthrough and breakthrough_suicide for all clock setups, or racer and the Checkers variants for the 20 second start clock setup. In the Breakthrough variants Nexplayer cannot prove the game zero-sum and loses advanced pawns this way. In the Checkers variants with 20 seconds start clock the Prolog engine breaks for some reason, which is why we also have no information on the number of states. Despite these single failures, the experiments demonstrate that even without these points Nexplayer performs very well against Fluxplayer. The overall win rate is highest for the 40,8 clock setup and decreases for higher start clocks, where Fluxplayer is able to catch up. Figure 5.4 shows the number of unique states inspected per second. We can generally see that for a higher start clock Nexplayer becomes slower due to the higher number of features detected and applied. In most cases Nexplayer is slightly faster than Fluxplayer.

5.4. Summary

Given the tournament results we conclude that Nexplayer is a competitive agent. Specifically, the results of the preliminaries of the German Open 2011 and the matches against Fluxplayer demonstrate the quality and efficiency of our approaches, which were the general goals for our thesis. Moreover, a good and stable performance was achieved in every competition, specifically in the more recent tournaments where the admissible distance features and performance optimizations based on features were introduced. The locally conducted experiments confirm the playing strength of Nexplayer. That being said, there are a number of reasons why the experiments may be positively biased.

Old Versions We tested our player often enough in the given games to find all major bugs. In the same way, we had the time to consider all research contributions up to date and decide whether we wanted to include them. This type of intervention was not possible for the creators of the other agents. On the one hand, they were released for download in 2010/2011 (Centurio, Cadiaplayer) and had less time to improve existing or include new properties. On the other hand, while no agent failed visibly, it is possible that some minor errors resulted in suboptimal decisions.

Single Core Execution All agents ran using a single CPU core. Scalability therefore was no issue. Given the last few competition results, the playing strength of probabilistic agents draws mainly from their easy-to-apply thread-level and cluster-level parallelism.

Biased Games Our games were, with the exception of pacman3p, two-player games. However, probabilistic agents perform much better in multiplayer games (see section 3.1.1) due to the greater degree of uncertainty induced by the moves of other players.

No Optimization We tested each opponent agent as-is, without optimizing it.

While each of the above points should have a minor impact on the evaluation, the point of single core execution vs parallelism deserves attention. The absence of parallelism in the above experiments demonstrates a much more resource-efficient state evaluation by deterministic knowledge-based agents and highlights at the same time their major weakness. Still, Nexplayer is within striking range of its opponents and should be even harder to win against if parallelized.

Figure 5.1.: Win Rate of Nexplayer vs Centurio

Figure 5.2.: Win Rate of Nexplayer vs Cadiaplayer

Figure 5.3.: Win Rate of Nexplayer against Fluxplayer

Figure 5.4.: Nexplayer vs Fluxplayer: States per Second

6. Related Work

This chapter discusses comparable approaches that develop agents capable of playing whole classes of games as opposed to a single fixed game. We start by discussing probabilistic GGP agents in the next section, proceed with deterministic GGP agents and finally discuss older prominent predecessors.

6.1. Probabilistic Agents

Probabilistic agents employ UCT search and a probabilistic evaluation function. The agents differ mainly by their knowledge-based enhancements, their degree of optimization and their approach to parallelization. Each probabilistic GGP agent represents a multi-agent system where each role in a game is represented by a “subagent” situated in a single-player game with a non-deterministic environment. Each subagent is unaware of the presence of other agents and simply chooses one of its legal actions. The choices of other agents are hidden and are modeled as a non-deterministic response of the environment.

6.1.1. Cadiaplayer

Cadiaplayer [FB08] won the international GGP competition in 2007 and 2008 and is one of the most competitive GGP agents, as could be seen in the preliminaries of the German Open. In addition to standard UCT search equipped with a probabilistic evaluation, Cadiaplayer employs a number of extensions that make the agent more knowledge-based [FB10] and mitigate the drawbacks of the probabilistic evaluation:

Rapid Action Value Estimation (RAVE) After playouts, the action values of siblings occurring in the UCT tree are also updated if the same action occurred in the playout. This provides first estimates for the values of actions that have been explored only rarely. The score is kept separately and its influence on action selection is decreased with an increasing number of samples of the state [GS07].

Move-Average Sampling Technique (MAST) For each action performed during an iteration from the root of the UCT tree to the terminal state, the average playout value is kept. These action values are then used to bias the playout towards actions with a higher value. This introduces statistical knowledge over actions and results in more realistic playouts (a minimal sketch of this idea is given after this list).

126 Tree-Only MAST (TO-MAST) Instead of updating action values for a complete episode as in MAST, only action values within the UCT tree are updated. This limits the number of samples for action values but increases the precision since only values of deliberately performed actions are updated.

Predicate-Average Sampling Technique (PAST) Similar to MAST, action values are kept, but actions are seen as depending on the occurrence of predicates in a state. Action values are updated in a state for all predicates in that state. During playouts, the action with the highest value over all predicates in the state is used. This provides a context for actions and introduces statistical knowledge over actions depending on states.

Features-to-Action Sampling Technique (FAST) A basic set of features is detected (currently piece value and specific cell features) and after each playout the set of corresponding feature weights is updated. These weights are then used to increase the action value of actions in the playout. This introduces more complex knowledge based on features.

According to the authors, the extensions increase the strength of Cadiaplayer significantly.
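As an illustration of the MAST idea referenced above (a sketch of the general technique, not Cadiaplayer's actual code), a playout policy can keep a global average reward per action and bias its move choice with a softmax over these averages; the temperature value is an assumption.

    import math
    import random

    class Mast:
        # keeps an average playout reward per action and samples playout moves with a softmax over these averages
        def __init__(self, temperature=10.0):
            self.sum_reward = {}
            self.count = {}
            self.temperature = temperature

        def average(self, action):
            return self.sum_reward.get(action, 0.0) / self.count.get(action, 1)

        def choose(self, legal_actions):
            weights = [math.exp(self.average(a) / self.temperature) for a in legal_actions]
            return random.choices(legal_actions, weights=weights)[0]

        def update(self, actions_in_episode, reward):
            # after a playout, credit every action of the episode with the final reward
            for a in actions_in_episode:
                self.sum_reward[a] = self.sum_reward.get(a, 0.0) + reward
                self.count[a] = self.count.get(a, 0) + 1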

6.1.2. Ary

Ary [MC11] won the international GGP competition in 2009 and 2010. While its exact details are unknown, it is known that Ary consists of a master-slave hierarchy among distributed program instances. The master is responsible for the outside communication and contains the UCT tree. All playouts of states are submitted to a slave. When the slave has finished the evaluation, it returns the results to the master, which adds the information to the tree. Due to the asynchronous organization of playout delegation and playout value retrieval, Ary is fast.

6.1.3. Centurio

Centurio [MSWS11] applies UCT with the RAVE extension. Its strength seems to be obtained primarily through heavy optimization, using code optimization [Wau09] and parallelization techniques on the thread level (i.e., on a single machine) as well as on the cluster level (i.e., across several machines).

6.1.4. Gamer

Gamer [KE11] applies UCT together with parallelization techniques on the thread and cluster level. The agent tries to ground-instantiate the rules in order to improve the efficiency of the algorithm. If the rules can be ground-instantiated, a solver is additionally started that tries to solve the game by backpropagating the best moves from the terminal states to non-terminal states.

6.1.5. TurboTurtle

TurboTurtle is developed by Sam Schreiber and won the 2011 GGP competition. However, we currently have no further information available other than the fact that it is a probabilistic agent.

6.2. Deterministic Agents

Deterministic agents evaluate features on states and aggregate their results to a state value.

6.2.1. Ogre

Ogre [Kai07a, Kai07b, Kai07c] is a deterministic agent that relies on a linear combination of features. As features, Ogre uses mostly class-based features (goal, mobility, depth, piece value) plus quantities and distances. The latter two as well as piece value seem to be based on the detection of boards. These boards are detected by playing random matches and identifying fluents with arguments that exhibit little variance across states. For determining weights, we could not find an algorithm other than using the same weight for each feature. With its reliance on random matches, Ogre does not meet the efficiency criteria needed. In addition, it seems its features are strongly class-based (board-based) features and thus tailored towards classes of games at the expense of generality.

6.2.2. Kuhlplayer/UTexas

Kuhlplayer [KDS06, Kuh10] identifies a set of game-specific structures such as boards, counters, and pieces by matching them against syntactically predefined patterns. A set of features is derived from the structures found according to the following table:

Identified Structure        Generated Features
Ordered Board w/ Pieces     Each piece's X coordinate
                            Each piece's Y coordinate
                            Manhattan distance between each pair of pieces
                            Sum of pair-wise Manhattan distances
Board w/o Pieces            Number of markers of each type
Quantity                    Amount

Along with these more sophisticated features, other features are created that work on the subset of (true) fluents in a state matched against a pattern, returning, e.g., the cardinality of that set. To determine the value of the features, they are plugged into a linear combination with weights +1 or -1 and ranked according to the difference in win rate they produce. If the difference is statistically significant, the feature is included in the evaluation function with a fixed weight, else it is discarded.

The final evaluation function thus consists of a linear combination of features whose absolute weights depend on their rank and whose sign depends on the correlation with the goal value. The biggest problem of Kuhlplayer is the expensive construction of the evaluation function, since it requires multiple matches to be played to evaluate a single candidate feature. The artificial setting of feature weights and the limited scope of feature detection impose further limits on the playing strength of the agent.

6.2.3. Cluneplayer

Cluneplayer [Clu07] won the international GGP competition in 2005 and uses an evaluation function consisting of three levels. The first level consists of the features, which are combined into three high-level features on the second level. Finally, these are combined to a single state value.

Features

The starting point for the low-level features is the set of expressions appearing in the game description. Variables with only a few possible instances are ground-instantiated, and three possible interpretations are imposed on these expressions:

Solution Cardinality indicates the number of distinct solutions to an expression with variables.

Symbol Distance indicates the number of steps the fluents holding in the current state require towards the expression. This is achieved by seeing whether a fluent in the expression has a constant as argument that is part of a domain of a binary relation. In that case, the fluent is considered a goal. To determine the distance, the constant in the goal fluent is replaced by a variable and the resulting pattern identifies all possible sources for the goal fluent in the current state. Then the distance for each source fluent to the goal fluent is calculated based on the shortest path as defined by the binary relation. The corresponding article then states that the overall distance is the sum of the distances of each matching fluent in the current state towards the goal fluent. However, since the example given suggests that a minimum is used, it is possible that a sum (i.e., the Manhattan distance) is used for aggregating argument distances to a fluent distance and the minimum over all fluent distances is used as overall distance.

Partial Solution counts the number of fulfilled conjuncts in a conjunction.

The three given interpretations are applied to all expressions found and form a basic candidate feature set. This set is extended by relative features that, e.g., compare the solution cardinality of two expressions if both are symmetric in the initial state and appear an equal number of times in an equal number of rules in the game description. The candidate feature set is refined by removing all unstable features, that is, all features that oscillate wildly during a simulated match.

These features are used to model three higher-level features in the second level: The payoff P ∈ [0, 100] indicates the immediate payoff of a state if the state were terminal. It is modeled as a weighted sum where all features positively correlated to the goal values of states are weighted by their stability factor and all negatively correlated features are weighted by their negative stability factor. The stability is higher for more stable features, assigning the highest weights to the most stable features. For the payoff feature, only features occurring in the goal condition are used, and if one feature subsumes another, only the subsuming feature is retained. For instance, if a comparison of two solution cardinalities is used for modeling payoff, then the two solution cardinality features are ignored for the payoff function. The second high-level feature is control. Control C ∈ [−1, 1] is defined as the normalized number of legal moves of the agent relative to the number of moves of his opponents. The series of control values can be calculated for an internally simulated match, and this control value series is then modeled by applying least squares regression on the features. This time, only features that appear in the legality relation are considered and, again, features subsumed by others are ignored. Finally, termination T ∈ [0, 1] is also modeled by least squares regression on the features occurring in the terminal rules.

Aggregation

These three high-level features are then aggregated as seen in equation (6.2.1). The value of a state for a specific role is defined as

  eval(s) = T(s) ∗ P(s) + (1 − T(s)) ∗ (50 ∗ (1 + C(s)) ∗ SC + SP ∗ P(s))    (6.2.1)

Here, S ∈ [1, ∞) is the stability of the payoff (SP) or control (SC) and indicates to what extent the payoff / control remains stable (S → ∞) or oscillates wildly (S ≈ 1) during the course of a match. The purpose of the function is to first promote control and gradually favor payoff for higher termination values, which in this case are interpreted as probabilities.

Summary The evaluation is consistent and compact and has bounded computation costs. The cost of construction, however, is high due to empiric determination of feature parameters by simulation. As Kuhlplayer, Cluneplayer cannot guarantee strict approximation or extrapolation of the goal function due to the specific evaluation function model. However, there are significant improvements compared to Kuhlplayer with respect to each of these properties. A good approximation of the perfect evaluation function is easier to find as Cluneplayer does not define a restricted set of syntactically detected features. Instead, it defines operations that can be used on arbitrary expressions in the game description to obtain such features. These features include the features detected by Kuhlplayer. Consequently,

the feature set is richer. In the same way, weights are not fixed but determined by the stability of a feature and its correlation to the goal values of goal states. Extrapolation is somewhat accounted for, as the state value for a termination value of T = 1 equals the payoff P, which in turn is linked to the goal values. Finally, function construction is faster since pseudo-terminal states are generated and subsequently used for training features. While this may have detrimental effects on quality, it effectively decouples evaluation function quality from construction time.
Nevertheless, we find the strength of Cluneplayer seriously limited by the unusual choice of the game model with parameters C, P and T. In our view, this greatly reduces the chances of approximating a perfect state value function due to a number of factors, most important among them the exaggerated importance of control.

6.2.4. Fluxplayer

Fluxplayer [ST07, Sch11] won the international GGP championship in 2006 and, unlike all other deterministic agents, does not rely on a linear combination of features as an evaluation function. Instead, it derives the evaluation function directly from the goal condition.

Aggregation Given a role r, the agent transforms each goal rule g(r, gv) to a fuzzy evaluation function h(r, gv, s) and then combines the fuzzy values h(r, gv, s) ∈ [0, 1] and the corresponding goal value gv to the state value h(r, s):

h(r, s) = (100 / Σ_{gv ∈ GV_r} gv) · Σ_{gv ∈ GV_r} gv · h(r, gv, s)    (6.2.2)

Each h(r, gv, s) returns a value in the interval [0, 1] where values closer to 1 represent a higher degree of truth. It is evaluated as

h(r, gv, s) = eval(g(r, gv) ∨ terminal, s)     if g(r, gv)
h(r, gv, s) = eval(g(r, gv) ∧ ¬terminal, s)    if ¬g(r, gv)

g(r, gv) represents the unrolled goal condition, terminal the unrolled terminal condition, and the function eval(x, s) evaluates an expression x in state s via

eval(x ∧ y, s) = fz∧(eval(x, s), eval(y, s))

eval(x ∨ y, s) = fz∨(eval(x, s), eval(y, s))

eval(¬x, s) = fz¬(eval(x, s))

eval(x, s) = p        if x holds in s
eval(x, s) = 1 − p    otherwise

The specific fuzzy functions fz∧, fz∨ and fz¬ are consistent t-norm fuzzy functions of the Yager family where

fz¬(x) = 1 − x

fz∨(x, y) = (x^q + y^q)^(1/q)

fz∧(x, y) = fz¬(fz∨(fz¬(x), fz¬(y))) (de Morgan)

The parameters p and q are set to 0.75 and 15, respectively. Evaluations of the type eval(x, s), where x is a fact (i.e., it is neither a conjunction, disjunction nor negation), are then further improved by substituting them with features.
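A minimal sketch of this recursive evaluation, assuming the unrolled formula is given as a nested tuple term and leaving out the feature substitution mentioned above; the representation of expressions is ours, only the operators and parameters follow the description.

P_TRUTH = 0.75   # parameter p
Q_YAGER = 15.0   # parameter q of the Yager t-conorm

def fz_not(x):
    return 1.0 - x

def fz_or(x, y):
    return (x ** Q_YAGER + y ** Q_YAGER) ** (1.0 / Q_YAGER)

def fz_and(x, y):
    # conjunction derived from the disjunction via de Morgan's law
    return fz_not(fz_or(fz_not(x), fz_not(y)))

def fuzzy_eval(expr, state):
    # expr: ('and', a, b), ('or', a, b), ('not', a), or an atomic fact tuple
    if expr[0] == 'and':
        return fz_and(fuzzy_eval(expr[1], state), fuzzy_eval(expr[2], state))
    if expr[0] == 'or':
        return fz_or(fuzzy_eval(expr[1], state), fuzzy_eval(expr[2], state))
    if expr[0] == 'not':
        return fz_not(fuzzy_eval(expr[1], state))
    return P_TRUTH if expr in state else 1.0 - P_TRUTH

# A line with two of three cells occupied scores higher than one with a single occupied cell.
two = {('cell', 1, 1, 'x'), ('cell', 1, 2, 'x')}
one = {('cell', 1, 1, 'x')}
line = ('and', ('cell', 1, 1, 'x'), ('and', ('cell', 1, 2, 'x'), ('cell', 1, 3, 'x')))
print(fuzzy_eval(line, two) > fuzzy_eval(line, one))   # True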

Features Before applying features, Fluxplayer runs a domain analysis that detects the following general properties:

Order Predicate is a binary, static, antisymmetric and transitive predicate.

Successor Predicate is a binary, static, antisymmetric, functional and injective predicate.

Input and Output Arguments are the arguments of a fluent such that for each tuple of input arguments there is exactly one tuple of output arguments.

Fluxplayer then transforms fluents with ordered arguments into distance features. An ordered argument is an argument of a fluent whose domain is ordered. Since the transitive closure of a successor predicate is an order predicate, arguments with a successor relation are also ordered. The features then are

Order Features that evaluate an order predicate based on the distance between its arguments

Quantities that are distances on ordered fluent output arguments

Distance Features that are distances on ordered fluent input arguments

Besides the heuristics mentioned in the article, we know from personal contact that, if applicable, a basic mobility heuristic (representing the number of moves available in a state) and a piece value estimation (representing a material value based on the overall mobility of a piece, determined by simulation) are added to the state value. Also, persistence information is used.

Property                    Probab. Agents   Kuhl, Ogre   Clune   Flux

Efficiency
Compactness                       +              +          +       +
Bounded Computation Cost          -              +          +       +
Ad-Hoc Construction               +              -         (-)      +

Quality
Consistency                     (-)^a            +          +       +
Strict Monotonicity              ?^b             -          -      ?^c
Extrapolation                     +              -         (-)      +

^a holds for repeated evaluations
^b holds for specific games
^c holds for descriptive goal functions

Table 6.1.: Categorization of the Aggregation Functions of GGP agents

Summary Fluxplayer uses an elegant solution for the construction of an evaluation function. Besides being compact and consistent, its construction time is bounded by the size of the goal description, making construction fast. From a qualitative viewpoint, the evaluation function represents an extrapolation of the goal function from terminal states to non-terminal states. Moreover, the evaluation function is strictly monotonic and thus allows for the detection of symmetric states if the symmetry is expressed in the state and goal condition. On the other hand, if the goal function is not descriptive or is structured in a way that misrepresents the perfect state evaluation, the monotonicity of the evaluation function is not aligned with that required by the perfect evaluation. With respect to features, Fluxplayer uses a feature detection focused on expressions in the goal function. We conclude that all basic qualitative properties hold if the goal condition reflects the valuation. The only disadvantage of Fluxplayer is its fixed function form, which does not allow for learning or changing weights.

6.3. Summary of GGP Agents

Table 6.1 summarizes the main criteria for the evaluation functions of current agents as discussed in section 2.5. While most criteria are self-explanatory, a quick word on strict monotonicity is necessary: for probabilistic agents, the evaluation function is technically never strictly monotonic since it is not consistent. Nevertheless, the values of states are stabilized and aggregated via UCT and can be considered closer to fulfilling the monotonicity criterion for nodes in the UCT tree that are close to terminal states and for games where the random-move assumption is valid.

Note that in addition to the above criteria, no agent allows for learning, i.e., none fulfills the requirement mentioned in section 3.1.3. One reason is the narrow time constraint of a start clock of only a few minutes as used in the GGP tournaments.

6.4. Non-GGP Approaches

A small number of approaches to the automatic detection and construction of features have been presented that, to some degree, can be reused in the field of GGP. We describe these approaches briefly and only to the extent useful for feature detection and construction. As a result, the following descriptions may be inaccurate in special cases that we do not discuss; they are, however, sufficient and consistent enough to convey the main ideas of each approach. For a more detailed description of the systems we recommend the indicated literature, which we find not only useful in the context of this work but also interesting in its own right.

6.4.1. Zenith

A general scheme for automatically constructing features was applied in the semi-general game playing agent Zenith [Faw93]. A feature in Zenith consists of a first-order logic formula, and its output is the number of solutions to the formula in a state. Zenith operates on a set of features that is constantly extended by features derived from its elements. It starts with a set containing only the goal condition of the game and, per round, extends the set by the results of the following four basic transformations applied to each element of the set.

Decomposition A conjunction, an arithmetic comparison or a calculation is split into its operands, or a negation is removed.

Goal Regression Using the domain theory, the preconditions are derived that are necessary for the feature to hold in the next turn.

Abstraction A variable is considered existentially quantified over the set of solutions to the formula, thus the number of solutions does not depend on the instances of the variable. Alternatively, the least constraining term is removed.

Specialization Disjuncts are removed, a recursive call is replaced by its base case, or a variable is instantiated to values such that there is a solution to the formula.

After each round, the evaluation function is constructed as a linear combination of all elements in the feature set that are not too expensive to compute. Feature weights are then learned, and the features that contribute the lowest accuracy to the evaluation (determined on a test set) are successively removed from the function until the evaluation no longer exceeds a specified time limit.
The advantage of this feature construction approach is that features are constructed by directed search in a bounded feature space. The derived features correspond directly to the goal condition and as such seem promising for use in an evaluation function.
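The following skeleton shows the overall generate-and-prune cycle only; the transformation, cost, weight-learning and accuracy functions are placeholders for Zenith's first-order machinery and are not part of the original system's interface.

def zenith_round(features, transforms, cost, max_cost,
                 learn_weights, accuracy, test_set, eval_time, max_eval_time):
    # 1. Derive new candidate features from every feature in the set.
    candidates = list(features)
    for f in features:
        for transform in transforms:   # decomposition, goal regression, abstraction, specialization
            candidates.extend(transform(f))

    # 2. Keep only features that are not too expensive to compute.
    candidates = [f for f in candidates if cost(f) <= max_cost]

    # 3. Learn weights for the linear combination and drop the least accurate
    #    features until the evaluation fits into the time limit.
    weights = learn_weights(candidates, test_set)
    while candidates and eval_time(candidates) > max_eval_time:
        worst = min(candidates, key=lambda f: accuracy(f, weights, test_set))
        candidates.remove(worst)
        weights = learn_weights(candidates, test_set)
    return candidates, weights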

Each feature is justified by representing a trade-off between cost and evaluation function quality. However, since feature weights are relearned and feature accuracies are redetermined in every round, selecting the features for the evaluation function is expensive.

6.4.2. Metagamer

Metagamer [Pel93] is another semi-general game playing agent, specialized in playing "symmetric chess-like games". All such games are alternating two-player games played with pieces on a structure similar to a chess board. For this reason, the majority of features for evaluating a state in a reasonable way is already known and can hence be predefined. In Metagamer, four different feature types are used:

Mobility to determine the current and overall number of moves a piece has available

Threats to determine the number of captures that are principally possible or useful

Goal Distance to determine the distance of pieces to specific locations or the distance of a player to the goal of capturing a set of enemy pieces

Material to describe the value of a piece as a function of its basic mobility, its possibility to promote and other predefined properties

These feature types are represented by so-called advisors that capture a set of instances of the feature type. For example, mobility would be a feature type that consists of an overall mobility independent of the state, a current mobility depending on the state (with, e.g., other pieces possibly obstructing movement), a value indicating the average number of moves needed to reach other cells on the board, and so on. The outputs of these advisors are then linearly combined to form the evaluation function.
Although designed for a class of games, Metagamer relies heavily on standard features. These features depend entirely on the existence of pieces and can be applied because they are known to be meaningful in the class of chess-like games. The choice of which advisors (feature types) to use and what weights to assign to them, however, is mostly made by the author. As a result, Metagamer can be considered a standard agent whose repertoire of playable games has been extended to those games playable with a specific, predefined, piece-centered feature set. While such ex-ante feature definitions generally do not carry over to GGP, they show that there are features that may prove valuable as soon as some properties of symmetric chess-like games, such as pieces placed on boards, hold.

7. Discussion

7.1. Summary

The goal of this thesis was the creation of an evaluation function such that

• construction of the state evaluation function is fast

• evaluation of the state evaluation function is fast

• the evaluation function provides good results

In the light of these criteria, the main contributions of our thesis are:

Construction of an Aggregation Function The aggregation function proposed by us is based on neural networks. It can be derived directly from the goal rules of a game and can thus be constructed much faster than traditional linear combinations of features. Its evaluation is much faster than that of probabilistic agents as it does not depend on the depth of the game tree and provides consistent results. Finally, it is general enough to be able to capture the perfect evaluation function of a game and allows for training. However, unlike most trainable functions, it does not require prior training and can be instead initialized with a domain theory. As a side effect, an interesting solution for the integration of logic and learning systems is put into practice.

Improved Aggregation Function The first algorithm for initializing the evaluation function (C-IL2P) is not well suited for complex domains due to limited machine precision. We presented a modified way of initializing the neural network so as to preserve precision and, consequently, the strict monotonicity property of the evaluation function.

Rule-Derived Feature Acquisition The feature acquisition method proposed derives features from rules. In contrast to standard feature acquisition mechanisms, the acquisition time is faster by orders of magnitude since the set of candidate features is restricted to those that can be derived from expressions in the goal rules and there is no necessity for rating or weighting a feature in simulations. Still, the acquisition is comprehensive as it interprets underlying expressions in different ways and covers all major features used in current GGP agents.

Acquisition of Admissible Distance Features The distance feature introduced first suffers from a number of disadvantages, namely its dependence on syntactic circumstances and its disregard of the move pattern of the object involved.

We proposed a new way to address this problem by constructing and evaluating fluent graphs. Despite their costly construction, they are useful in our game player and, more importantly, are a first step towards distances between expressions. Such distances would allow for a unified treatment of features and could potentially even solve a complete class of games altogether.

Comprehensive Evaluation The comprehensive evaluation supports the above claims and shows the strength of our deterministic, knowledge-based agent when compared on an equal basis against other agents. At the same time, it shows the stark contrast to the competition settings, where better hardware significantly improves the playing strength of probabilistic agents. The results indicate that Nexplayer is currently one of the most efficient GGP agents in terms of playing strength per CPU cycle.

Furthermore, we believe we have provided some insights into other areas that deserve attention, such as

• initialization of neural networks based on a domain theory (as opposed to plain learning approaches),

• meaning of features,

• design of aggregation functions to preserve the meaning of features and

• possible ways of combining deterministic and probabilistic evaluation functions.

Most of these subjects also constitute interesting areas for further research.

7.2. Future Work

One of the most important problems for evaluation functions is the piecemeal approach to feature acquisition. While each feature itself has a meaning, it is next to impossible to compare the outputs of two features and draw conclusions as to which feature represents a higher probability that its underlying expression becomes true or false. We took a first step in this thesis by explicitly showing that features represent evolution patterns of expressions and by reducing features to a probability that the expression holds in the terminal state. Still, we are only at the beginning: we had to rely on what can at best be described as an informed guess in order to estimate the probabilities of an expression holding in the terminal state, and we strongly suspect that an evolution graph of expressions would already help to estimate their stability. The evolution graph would be similar to a fluent graph, but edges would be labeled with transition probabilities. Such a graph would itself represent a reduced game with a few fluents, and the corresponding transition probabilities could possibly be approximated by statistically aggregating data from random playouts.

In a next step, the probabilistic data should be evaluated by an appropriate aggregation function. Even though Fluxplayer's fuzzy evaluation function does this and Nexplayer's is even adaptive, neither quite seems to fit the problem. Specifically, it is unclear why the probability of an expression holding in a terminal state should be evaluated by a fuzzified goal condition and why this provides good guidance when traversing the game tree. For now, fuzzy evaluation seems convenient (due to fuzzy logic and probability theory operating on the unit interval) rather than specifically plausible.

A strongly related issue is how probabilistic and deterministic evaluation functions combine. Following our view on features, a Monte Carlo playout is a mere sampling of a feature derived from the goal rule. The transition probabilities are sampled due to a lack of other viable approaches. This means that if there is another way to obtain the probabilities without sampling, Monte Carlo playouts could be rendered obsolete. On the other hand, deterministic functions may require playout data to obtain reliable information on the transition probabilities of expressions. Both points should be investigated, and we are confident that both approaches are possible and lead to a new understanding of evaluation functions and classes of games.

From a practical perspective we think it is mandatory to parallelize Nexplayer. Given the efficiency of Nexplayer, a simple master-slave model with a core program submitting state evaluations to slaves could speed up Nexplayer linearly and should make a noticeable difference.

Also, despite the current GGP competition settings that effectively discourage learning, we believe that there is a future for learning. Given that achieving mastery in a game takes a human years if not decades, it seems naive to believe that a computer could do this in minutes, at least for now. Since our evaluation function construction demonstrates how to initialize a neural network without taking the risk of bad initialization or non-convergence, we have a new tool for answering the question whether a learning evaluation function may converge towards the perfect evaluation function. In comparison to other approaches, the odds certainly seem much higher since our network is not grown haphazardly but strictly corresponds to a propositional domain theory that can be used for supervision and tracing of the learning process by humans. In particular, we are inspired by the findings on bootstrapping possibilities [VSUB09]. Still, we hesitate to apply learning since the experience of the last years has pushed us towards the belief that there is far more written in the game rules than meets the eye and that our problem is that we do not know how to extract this knowledge.

On a side note, we found that the search algorithms MaxN and paranoid MiniMax are much less suited for games with more than two players due to the unpredictability of the opponents' actions. Nexplayer seems to overestimate opponents and consequently plays too cautiously, silently assuming a coalition of its opponents. We believe this is a problem that another search algorithm could solve, and our favorite candidate is UCT itself due to its ability to approximate mixed-strategy Nash equilibria.

7.3. Publications

Various parts of this work were published in the following peer-reviewed workshops and conferences.

• The use of an aggregation function based on neural networks was first published in "Neural Networks for State Evaluation in General Game Playing", European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2009 [MT09]. The conference is considered one of the leading venues on machine learning and knowledge discovery.

• The follow-up work on the resolution problem was discussed in “Neural Networks for High-Resolution State Evaluation in General Game Playing”, International Workshop on General Intelligence in Game-Playing (GIGA), 2011 [Mic11b]. The workshop is held together with the International Joint Conference on Artificial Intelligence and is the main international venue specifically dedicated to General Game Playing.

• The detection of rule-derived features was discussed in "Heuristic Interpretation of Predicate Logic Expressions in General Game Playing", International Workshop on General Intelligence in Game-Playing (GIGA), 2011 [Mic11a].

• The detection of admissible distance features was first discussed in "Distance Features for General Game Playing", International Workshop on General Intelligence in Game-Playing (GIGA), 2011 [MS11] and published as "Distance Features for General Game Playing", International Conference on Agents and Artificial Intelligence (ICAART), 2012 [MS12].

A. Appendix

A.1. Evaluation Setup

A.1.1. Environment

For evaluation we ran the agent(s) on the following machine:

• AMD Fusion A8-3850 (4x2.9GHz)

• 8 GB RAM

As operating system we used Ubuntu 11.10 (64-bit). We used ECLiPSe Prolog version 6.0 #189 [ECL12] for running the core of Nexplayer.

A.1.2. Games

For evaluation we selected only games that

• feature two roles, or three roles where the third role has only a minor impact on the game

• cannot be searched exhaustively by Nexplayer

For evaluation we divided all games into categories that fit the subject of evaluation. Each category represents a set of games where specific evaluation function properties are required to provide good results. However, these properties are not sufficient, and consequently the categories are not mutually exclusive. The main purpose of these categories is to provide evidence that the corresponding evaluation function property is useful and that implementing it improves playing strength. The categories are:

Structural Games are games with non-movable board elements and a complex goal function consisting of several conjunctions and disjunctions on a number of fluents. Required evaluation function properties are a fuzzy evaluation, a high resolution, persistence properties and, to a lesser extent, solution cardinality features.

Distance Games are games with movable elements and without captures. Required evaluation function properties are distance features.

Piece Games are games with movable elements and captures. Required evaluation func- tion features are mobility, piece value and distance.

Other Games are other interesting games that do not fit in the above categories.

A.1.3. List of Games

All games referred to in this work can be found on the site of the GGP server [Sch12]. For evaluation, we specifically used the games with the following names:

Structural Games

• bidding-tictactoe
• conn4
• connect4
• connectfour
• connect5
• pentago_2008
• tictactoe_3d_2player
• tictactoex9

Distance Games

• chinesecheckers2
• crisscross
• ghostmaze2p
• hallway
• pacman3p
• racer
• nim4
• Runners

Other Games

• CephalopodMicro
• othello-comp2007
• quarto
• zhadu

Piece Games

• breakthrough
• breakthroughsuicide_v2
• capture_the_king
• checkers-newgoals
• checkers-suicide-cylinder-mustjump
• checkers-mustjump-torus
• knightazons
• skirmish

A.1.4. Opponents

As opponents we used the following agents:

Fluxplayer Version as of January 31, 2012

Cadiaplayer tagged as version 2.0.1 of June 08, 2011, downloaded March 15, 2012 from http://cadia.ru.is/wiki/public:cadiaplayer:main#cadiaplayer_source

Centurio tagged as version 2.1 of October 2010, downloaded March 15, 2012 from http://www.ggp-potsdam.de/wiki/Releases

Each agent was used in its default settings and was run on a single CPU core.

A.2. Other Improvements

The following are further improvements that were used but not discussed in this work.

A.2.1. Delta Cache

Nexplayer also caches intermediate results of the evaluation function. These are only recalculated if the state under inspection differs with respect to the intermediate result. In other words, for each state evaluation, the evaluation function first determines the delta between the current and the last state and then recalculates only those parts of the evaluation function that operate on the fluents occurring in the delta. Due to the overhead, we apply the delta cache only if we can guarantee that the delta between two states affects on average less than half the leaves of the evaluation function.
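A simplified sketch of the idea; the split of the evaluation function into leaves with known fluent dependencies is assumed, and the class below is illustrative rather than the actual Prolog implementation.

class DeltaCache:
    # Caches the results of evaluation-function leaves and recomputes only
    # those whose fluents occur in the delta between two successive states.
    def __init__(self, leaves):
        # leaves: dict mapping a leaf id to (set of fluents it depends on, function of state)
        self.leaves = leaves
        self.last_state = None
        self.values = {}

    def evaluate(self, state):
        state = frozenset(state)
        delta = None if self.last_state is None else state ^ self.last_state
        for leaf_id, (deps, leaf_fn) in self.leaves.items():
            if delta is None or deps & delta:
                self.values[leaf_id] = leaf_fn(state)   # recompute only affected leaves
        self.last_state = state
        return dict(self.values)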

A.2.2. Exclusion of Distances

In order to prevent false positives in the detection of features, we applied two enhancements for distance features.

Distances for Complete Sets of Fluents

A common source of false-positive distances are distances on a board. For instance, in some Connect Four variants disks drop to the bottom, so it is probable that a rule

next(cell(X, Y2, red)) :-
    true(cell(X, Y1, red)),
    succ(Y1, Y2),
    does(red, drop(X)).

exists. This would trigger our distance feature detection mechanism, which we avoid by checking whether all possible instances that the fluent could evolve to are part of the goal condition. If this is confirmed, we skip the application of the distance feature.
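The check itself can be sketched as follows; the argument position and domain are assumed to come from the domain analysis, and the tuple representation of fluents is again only illustrative.

def skip_distance_feature(fluent, ordered_pos, domain, goal_fluents):
    # Skip the distance feature if every instance the fluent can evolve to
    # along its ordered argument already occurs in the goal condition.
    reachable = {fluent[:ordered_pos] + (value,) + fluent[ordered_pos + 1:] for value in domain}
    return reachable <= set(goal_fluents)

# Example: a disk in column 3 can only evolve to the cells of that column.
goal = {('cell', 3, y, 'red') for y in range(1, 7)}
print(skip_distance_feature(('cell', 3, 1, 'red'), 2, range(1, 7), goal))   # True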

Negative Distances

Negative distances pose a problem since they are incorrectly interpreted by our features. The standard distance feature for a role r correlates with the lower bound of moves needed to bring the distance to 0. The negation of such a feature promotes a high lower bound, that is, it favors states where a high number of moves is needed to reach the goal. However, there are cases where this is not desired. An example is the game Runners. Here the player moves in a one-dimensional grid and wins if he walks from his starting cell 50 to cell 0. He loses, however, if he reaches cell -1 or -2. The distance feature on cell 0 thus works as an attractor, while the negated distances on cells -1 and -2 push the agent away. Consequently, the agent avoids coming close to cell 0 due to the perceived risk of reaching cells -1 and -2. The problem is that the negated distance feature asks the question "How long does it take at least to reach the expression?" while it should ask "How long can reaching the

expression be avoided?". Put more formally, it favors a high lower bound on the number of moves while we would be interested in a high upper bound. Both interpretations may coincide for forced moves; for unforced moves, however, the upper bound is infinite. To express the latter we use mobility as a proxy: the higher the mobility of a role, the lower the probability that a move is forced. Consequently, we substitute a negated distance feature occurring in the winning condition of a role r by the mobility of r if the underlying expression can be evolved by r. We assume that an expression can be evolved by a role if there is a move that affects the expression such that it does not hold anymore, but a similar expression holds in the next state.

A.2.3. Order of Feature Detection

The detection rules for the features presented are sensitive to where they are placed. For example, detection of the solution cardinality of expressions must only be enabled after ground instantiation, as otherwise all non-ground expressions would be identified as cardinality features. On the other hand, solution cardinality can also be applied to non-ground fluents; in this case, however, we do not want the fluent to be ground-instantiated, since this bloats the evaluation function and generates no additional information.
Figure A.1 summarizes the control flow of identifying an expression. The detection of standard expressions is represented by the white oval nodes; feature detection rules are depicted as gray rectangles. The control flow graph is divided into two parts: the left part shows the standard procedure for expanding expressions, the right side the complementing feature detection. Two features are shown side by side to emphasize that features can be identified for expression truth as well as for expression falsity.

Figure A.1.: Feature Detection Graph

B. Bibliography

[Bad09] Sebastian Bader. Neural-Symbolic Integration. PhD thesis, Dresden University of Technology, October 2009.

[BH05] Sebastian Bader and Pascal Hitzler. Dimensions of neural-symbolic integration - a structured survey. In We Will Show Them: Essays in Honour of Dov Gabbay, pages 167–194. College Publications, 2005.

[Bis95] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA, 1995.

[Bur02] Michael Buro. The evolution of strong Othello programs. In Ryohei Nakatsu and Jun'ichi Hoshino, editors, Proceedings of the IFIP International Workshop on Entertainment Computing (IWEC '02), volume 240 of IFIP Conference Proceedings, pages 81–88. Kluwer, 2002.

[CJhH02] Murray Campbell, A. Joseph Hoane Jr., and Feng-hsiung Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57–83, 2002.

[Clu07] James Clune. Heuristic evaluation functions for general game playing. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), pages 1134–1139. AAAI Press, 2007.

[dCLd96] A.S. d'Avila Garcez, G. Zaverucha, and L.A.V. de Carvalho. Logical inference and inductive learning in artificial neural networks. In ECAI 96 Workshop on Neural Networks and Structured Knowledge, 1996.

[dG02] A.S. d'Avila Garcez, K.B. Broda, and D.M. Gabbay. Neural-Symbolic Learning Systems. Springer, 2002.

[ECL12] The ECLiPSe Constraint Programming System. http://eclipse-clp.org, 2012. [Online; accessed April 13, 2012].

[Faw93] Tom E. Fawcett. Feature Discovery for Problem Solving Systems. PhD thesis, University of Massachusetts, Amherst, May 1993.

[FB08] Hilmar Finnsson and Yngvi Björnsson. Simulation-based approach to general game playing. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), pages 259–264. AAAI Press, 2008.

[FB10] Hilmar Finnsson and Yngvi Björnsson. Learning simulation control in general game-playing agents. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), pages 954–959. AAAI Press, 2010.

[GLP05] Michael R. Genesereth, Nathaniel Love, and Barney Pell. General game playing: Overview of the AAAI competition. AI Magazine, 26(2):62–72, 2005.

[GS07] Sylvain Gelly and David Silver. Combining online and offline knowledge in UCT. In Zoubin Ghahramani, editor, Proceedings of the 24th International Conference on Machine Learning (ICML 2007), volume 227 of ACM International Conference Proceeding Series, pages 273–280. ACM, 2007.

[Gün08] Martin Günther. Automatic Feature Construction for General Game Playing. Master's thesis, Dresden University of Technology, September 2008.

[GWMT06] Sylvain Gelly, Yizao Wang, Rémi Munos, and Olivier Teytaud. Modification of UCT with Patterns in Monte-Carlo Go. Research Report RR-6062, INRIA, 2006.

[HK94] Steffen Hölldobler and Yvonne Kalinke. Towards a massively parallel computational model for logic programming. In ECAI'94 Workshop on Combining Symbolic and Connectionist Processing, pages 68–77. ECCAI, 1994.

[HT10] Sebastian Haufe and Michael Thielscher. Pushing the envelope: General game players prove theorems. In J. Li, editor, Proceedings of the Australasian Joint Conference on Artificial Intelligence, volume 6464 of LNCS, pages 1–10, Adelaide, December 2010. Springer.

[JS91] John Stanback and Mike McGann. GNU Chess Heuristics. http://alumni.imsa.edu/~stendahl/comp/txt/gnuchess.txt, 1991. [Online; accessed March 15, 2012].

[Kai07a] David M. Kaiser. Automatic feature extraction for autonomous general game playing agents. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, AAMAS ’07, pages 93:1– 93:7, New York, NY, USA, 2007. ACM.

[Kai07b] David M. Kaiser. The design and implementation of a successful general game playing agent. In Florida AI Research Society Conference, pages 110– 115. AAAI Press, 2007.

[Kai07c] David M. Kaiser. The structure of games. PhD thesis, Florida International University, 2007.

[KDS06] Gregory Kuhlmann, Kurt Dresner, and Peter Stone. Automatic Heuristic Construction in a Complete General Game Player. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), pages 1457–1462, Boston, Massachusetts, USA, July 2006. AAAI Press.

[KE11] Peter Kissmann and Stefan Edelkamp. Gamer, a general game playing agent. KI, 25(1):49–52, 2011.

[KM75] Donald E. Knuth and Ronald W. Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.

[KPKP90] John F. Kolen and Jordan B. Pollack. Back propagation is sensitive to initial conditions. In Complex Systems, pages 860–867. Morgan Kaufmann, 1990.

[KS06] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, ECML, volume 4212 of Lecture Notes in Computer Science, pages 282–293. Springer, 2006.

[Kuh10] Gregory Kuhlmann. Automated Domain Analysis and Transfer Learning in General Game Playing. PhD thesis, University of Texas, 2010.

[LHH+08] Nathaniel Love, Timothy Hinrichs, David Haley, Eric Schkufza, and Michael Genesereth. General game playing: Game description language specification. Technical Report March 4, Stanford University, Stanford, CA, 2008. The most recent version should be available at http://games.stanford.edu/.

[Llo87] John W. Lloyd. Foundations of Logic Programming, 2nd Edition. Springer, 1987.

[MC11] Jean Méhat and Tristan Cazenave. Tree parallelization of Ary on a cluster. In Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA'11), pages 39–43, 2011.

[Mic11a] Daniel Michulke. Heuristic interpretation of predicate logic expressions in general game playing. In Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA’11), pages 61–68, 2011.

[Mic11b] Daniel Michulke. Neural networks for high-resolution state evaluation in general game playing. In Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA’11), pages 31–37, 2011.

[MS11] Daniel Michulke and Stephan Schiffel. Distance features for general game playing. In Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA’11), pages 7–14, 2011.

[MS12] Daniel Michulke and Stephan Schiffel. Distance features for general game playing. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART-12), 2012.

[MSWS11] Maximilian Möller, Marius Thomas Schneider, Martin Wegner, and Torsten Schaub. Centurio, a general game player: Parallel, Java- and ASP-based. KI, 25(1):17–24, 2011.

[MT09] Daniel Michulke and Michael Thielscher. Neural networks for state evaluation in general game playing. In ECML PKDD '09: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, pages 95–110, Berlin, Heidelberg, 2009. Springer-Verlag.

[Pel93] Barney Pell. Strategy generation and evaluation for meta-game playing. PhD thesis, University of Cambridge, 1993.

[SBB+05] Jonathan Schaeffer, Yngvi Björnsson, Neil Burch, Akihiro Kishimoto, Martin Müller, Robert Lake, Paul Lu, and Steve Sutphen. Solving checkers. In Leslie Pack Kaelbling and Alessandro Saffiotti, editors, Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 292–297. Professional Book Center, 2005.

[Sch89] J. Schaeffer. The history heuristic and alpha-beta search enhancements in practice. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(11):1203–1212, 1989.

[Sch11] Stephan Schiffel. Knowledge-Based General Game Playing. PhD thesis, Dresden University of Technology, June 2011.

[Sch12] Stephan Schiffel. GGP Server. http://ggpserver.general-game-playing.de, 2012. [Online; accessed March 15, 2012].

[Sha50] Claude E. Shannon. Programming a computer for playing chess. Philosophical Magazine, 41(4):256–275, 1950.

[Ski53] B.F. Skinner. Science and human behavior. Free Press paperback. Macmillan, 1953.

[ST07] Stephan Schiffel and Michael Thielscher. Fluxplayer: A successful general game player. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), pages 1191–1196, Vancouver, July 2007. AAAI Press.

[ST10] Stephan Schiffel and Michael Thielscher. A multiagent semantics for the game description language. In J. Filipe, A. Fred, and B. Sharp, editors, Agents and Artificial Intelligence, volume 67 of Communications in Computer and Information Science, pages 44–55. Springer, 2010.

[Tes95] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.

[TF97] G. Thimm and E. Fiesler. High-order and multilayer perceptron initialization. IEEE Transactions on Neural Networks, 8(2):249–259, March 1997.

[Thi10] Michael Thielscher. A general game description language for incomplete information games. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, pages 994–999, Atlanta, July 2010. AAAI Press.

[TS94] Geoffrey G. Towell and Jude W. Shavlik. Knowledge-based artificial neural networks. Artificial Intelligence, 70(1-2):119–165, 1994.

[TSN90] Geoffrey G. Towell, Jude W. Shavlik, and Michiel O. Noordewier. Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the 8th National Conference on Artificial Intelligence (AAAI-90), volume 2, pages 861–866, 1990.

[TV10] Michael Thielscher and Sebastian Voigt. A temporal proof system for general game playing. In Maria Fox and David Poole, editors, Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), pages 1000–1005, Atlanta, July 2010. AAAI Press.

[VSUB09] Joel Veness, David Silver, William T. B. Uther, and Alan Blair. Bootstrapping from game tree search. In Yoshua Bengio, Dale Schuurmans, John D. Lafferty, Christopher K. I. Williams, and Aron Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1937–1945. Curran Associates, Inc., 2009.

[Wau09] Kevin Waugh. Faster state manipulation in general games using generated code. In Proceedings of the IJCAI-09 Workshop on General Game Playing (GIGA’09), 2009.

[Zob70] Albert L. Zobrist. A hashing method with applications for game playing, 1970. Technical Report 88, Computer Sciences Department, University of Wisconsin.
