Knowledge-Based General Game Playing
Total Page:16
File Type:pdf, Size:1020Kb
Knowledge-Based General Game Playing Dissertation zur Erlangung des akademischen Grades Doktor rerum naturalium (Dr. rer. nat.) vorgelegt an der Technischen Universitat¨ Dresden Fakultat¨ Informatik eingereicht von Dipl.-Inf. Stephan Schiffel geboren am 15. 09. 1980 in Sebnitz Gutachter: Prof. Michael Thielscher, University of New South Wales Prof. Yngvi Bj¨ornsson, H´ask´olinn´ıReykjav´ık Datum der Verteidigung: 29.07.2011 Abstract General Game Playing (GGP) is concerned with the development of systems that can play well an arbitrary game solely by being given the rules of the game. This problem is considerably harder than traditional artificial intelligence (AI) game playing. Writing a player for a particular game allows to focus on the design of elaborate strategies and libraries that are specific to this game. Systems able to play arbitrary, previously unknown games cannot be given game-specific knowledge. They rather need to be endowed with high-level cognitive abilities such as general strategic thinking and abstract reasoning. This makes General Game Playing a good example of a challenge problem, which encompasses a variety of AI research areas including knowledge representation and reasoning, heuristic search, planning, and learning. Two fundamentally different approaches to GGP exist today, we call them knowledge-free and knowledge-based approach. The knowledge-free approach considers the game rules as a black box used for generating legal moves and successor states of the game. Players using this approach run Monte-Carlo simulations of the game to estimate the value of states of the game. On the other hand, the knowledge-based approach considers the components of a state (e. g., positions of pieces or tokens on a board) and, based on this information, evaluates the potential of a state to lead to a winning state for the player. This requires to automatically derive knowledge about the game that can be used for a meaningful evaluation. In this thesis, we present a knowledge-based approach to GGP that is im- plemented in our general game playing system Fluxplayer. Fluxplayer has participated in all the international general game playing competitions since 2005. It won the competition in 2006 and has scored top ranks since then. Flux- player uses heuristic search with an automatically generated state evaluation function and applies several techniques for constructing search heuristics by the automated analysis of the game. Our main contributions are • the automatic construction of heuristic state evaluation functions, • the automated discovery of game structures, • an automated system for proving properties of games, and • detecting symmetries in general games and exploiting them. iii Acknowledgments Over the last few years, many people helped to make this thesis a reality. I want to thank my adviser Michael Thielscher for introducing me to this wonderful topic. I am grateful for his constant reminder to publish the results and his invaluable support at writing papers and proof-reading them, even when they were only finished at the last minute. The people at TU-Dresden and especially my colleagues of the Computational Logic group made the last years a very enjoyable time. There was always someone to discuss ideas or problems and give helpful feedback for papers and presentations. Special thanks go to Conrad for many insightful discussions, some of which were even related to this work. Daniel who was always there for discussing new ideas at length and writing papers at night, and Sebastian whose ideas and suggestions on earlier versions of the thesis considerably helped to improve it. Finally, I am indebted to my wife Ute for bearing with me all this time. Thank you for urging me to finish this thesis and for struggling through all those unintuitive formulas while proof-reading. v Contents Contents vii 1. Introduction1 1.1. The GGP Problem..........................2 1.2. Contributions.............................3 2. Preliminaries5 2.1. Games.................................5 2.1.1. Deterministic Actions....................7 2.1.2. Complete Information....................7 2.1.3. Finiteness...........................7 2.2. Game Description Language (GDL)................8 2.2.1. General GDL Syntax.....................8 2.2.2. GDL Keywords........................9 2.2.3. GDL Restrictions....................... 11 2.3. Semantics of the GDL........................ 13 2.4. Fuzzy Logic.............................. 15 2.5. Answer Set Programs........................ 17 2.6. Summary............................... 18 3. Components of Fluxplayer 21 3.1. Overview............................... 21 3.2. Communication............................ 22 3.3. Reasoning............................... 24 3.3.1. Objective of Reasoning.................... 25 3.3.2. Implementation........................ 25 3.4. Strategy................................ 28 3.5. Summary............................... 28 4. Game Tree Search 29 4.1. Game Tree Search In General.................... 29 4.2. Heuristic Search........................... 31 4.2.1. Turn-taking N-Player Games................ 32 4.2.2. Turn-taking, Constant-sum, Two-player Games...... 33 4.2.3. General N-player Games................... 34 4.3. Monte-Carlo Tree Search (MCTS)................. 36 vii viii Contents 4.4. Search Method of Fluxplayer.................... 36 4.5. Summary............................... 38 5. Generating State Evaluation Functions 41 5.1. The Basic State Evaluation Function................ 42 5.1.1. Evaluation of Formulas................... 43 5.1.2. The Value of Parameter p .................. 46 5.1.3. The Choice of T-norm and T-Conorm........... 46 5.1.4. Theoretical Properties of the Fuzzy Evaluation...... 49 5.1.5. State Evaluation Function.................. 49 5.2. Identifying Structures in the Game................. 51 5.2.1. Domains of Relations and Functions............ 52 5.2.2. Static Structures....................... 54 5.2.3. Dynamic Structures..................... 55 5.3. Using Identified Structures for the Heuristics........... 59 5.3.1. Order Relations........................ 60 5.3.2. Quantities........................... 61 5.3.3. Game Boards......................... 63 5.4. Evaluation............................... 64 5.5. Summary............................... 68 6. Distance Estimates for Fluents and States 69 6.1. Motivating Example......................... 69 6.2. Distance Estimates Derived from Fluent Graphs......... 70 6.2.1. Fluent Graphs........................ 70 6.2.2. Distance Estimates...................... 72 6.3. Constructing Fluent Graphs from Rules.............. 74 6.3.1. Constructing φ ........................ 76 6.3.2. Selecting Preconditions for the Fluent Graph....... 80 6.3.3. Overview of the Complete Method............. 81 6.4. Examples............................... 81 6.4.1. Tic-Tac-Toe.......................... 81 6.4.2. Breakthrough......................... 83 6.5. Evaluation............................... 84 6.6. Future Work............................. 87 6.7. Summary............................... 88 7. Proving Properties of Games 89 7.1. Proving Game Properties Using Answer Set Programming.... 90 7.2. The General Proof Method and its Correctness.......... 92 7.3. An Automated Theorem Prover................... 96 7.4. Optimizations............................. 97 7.4.1. Reducing the Size of Generated Answer Set Programs.. 98 Contents ix 7.4.2. Improved Domain Calculation................ 100 7.4.3. Order of Proofs........................ 104 7.4.4. Reducing the Number of Properties to Prove....... 105 7.5. Experimental Results......................... 105 7.6. Summary and Outlook........................ 107 8. Symmetry Detection 109 8.1. Games and Symmetries....................... 109 8.2. Rule Graphs.............................. 111 8.3. Theoretic Results........................... 117 8.4. Exploiting Symmetries........................ 119 8.5. Discussion............................... 123 9. Related Work 125 9.1. Search Algorithms.......................... 125 9.1.1. Single-Player Games..................... 125 9.1.2. Multi-player Games with Heuristic Search......... 127 9.1.3. Multi-player Games with Monte-Carlo Search....... 128 9.2. Heuristics............................... 128 9.2.1. Heuristics in Other GGP Systems............. 129 9.2.2. Feature Construction and Evaluation............ 133 9.2.3. Heuristics in Automated Planning............. 135 9.3. Summary............................... 135 10.Discussion 137 10.1. Contributions............................. 137 10.2. Future Work............................. 139 10.3. Publications.............................. 140 A. The Rules of Breakthrough 141 Bibliography 145 1. Introduction Computer systems are used nowadays in many different environments, and new applications are developed every day. In more and more of them computers are not just executing a fixed sequence of tasks but have to take decisions based on their perceptions and a goal they have to achieve. Examples for such systems: • auto pilots in air planes, • automated parking systems in cars, but also • automated trading systems in financial markets. All these systems have in common that they have an overall goal to achieve. For example, flying or driving to a destination without crashing or making high profits with minimum risk. Furthermore, the environment of these systems and the actors in it behave according to certain rules. Some of these rules might change. For example, if the autopilot is deployed on a new model