Learning to Play the Game of Go
James Foulds
October 17, 2006

Abstract

The problem of creating a successful artificial intelligence game playing program for the game of Go represents an important milestone in the history of computer science, and provides an interesting domain for the development of both new and existing problem-solving methods. In particular, the problem of Go can be used as a benchmark for machine learning techniques.

Most commercial Go playing programs use rule-based expert systems, relying heavily on manually entered domain knowledge. Due to the complexity of strategy possible in the game, these programs can only play at an amateur level of skill. A more recent approach is to apply machine learning to the problem. Machine learning-based Go playing systems are currently weaker than the rule-based programs, but this is still an active area of research.

This project compares the performance of an extensive set of supervised machine learning algorithms in the context of learning from a set of features generated from the common fate graph – a graph representation of a Go playing board. The method is applied to a collection of life-and-death problems and to 9 × 9 games, using a variety of learning algorithms. A comparative study is performed to determine the effectiveness of each learning algorithm in this context.

Contents

1 Introduction
2 Background
   2.1 Go
      2.1.1 Description of the Game
      2.1.2 The History of Go
      2.1.3 Elementary Strategy
      2.1.4 Player Rankings and Handicaps
      2.1.5 Tsumego
   2.2 Artificial Intelligence
   2.3 Supervised Learning
   2.4 Artificial Intelligence for Game Playing
   2.5 Artificial Intelligence for Go
      2.5.1 The Problem with Minimax
      2.5.2 Other Challenges
      2.5.3 Board Evaluation Functions vs Move Evaluation Functions
      2.5.4 Current Go Programs – the State of the Art
      2.5.5 Pattern Matching
      2.5.6 Localized Goal-Directed Search
      2.5.7 Monte Carlo Methods
      2.5.8 Machine Learning Approaches
3 A Supervised Learning Approach to Go
   3.1 Common Fate Graph Representation
   3.2 Relative Subgraph Features
   3.3 Tsumego Problems
   3.4 Playing Go
4 An Implementation of the Relative Subgraph Feature Extraction Algorithm
   4.1 Implementation Issues
   4.2 Building Common Fate Graphs
   4.3 Extracting Relative Subgraph Features
   4.4 Enumerating Subgraphs
5 Tsumego Experiment
   5.1 Experimental Setup
   5.2 Initial Parameter Tuning
   5.3 Experimental Results
   5.4 Analysis of Results
   5.5 Further Parameter Tuning
      5.5.1 Support Vector Machine using an RBF Kernel
      5.5.2 Support Vector Machine using a Polynomial Kernel
6 9 × 9 Go Experiment
   6.1 Experimental Setup
      6.1.1 Training Sets
      6.1.2 Program Architecture
      6.1.3 Game Setup
      6.1.4 Evaluation
   6.2 Experimental Results
   6.3 Observations of Game Play
7 Conclusions and Future Work
A Description of Machine Learning Algorithms
B Dictionary of Go Terms
C Design of the System

List of Figures

1 Example board position, and corresponding CFG
2 Example relative subgraph feature vector
3 Parameter selection for Random Forest
4 Parameter selection for Bagging
5 Parameter selection for AdaBoost.M1 with Decision Stump as the base classifier
6 Parameter selection for AdaBoost.M1 with J48 (C4.5) as the base classifier
7 Parameter selection for Nearest Neighbour classifier
8 Parameter selection for Voted Perceptron
9 Parameter tuning for the support vector machine with the RBF kernel, 3D View
10 Parameter tuning for the support vector machine with the RBF kernel, Side View 1
11 Parameter tuning for the support vector machine with the RBF kernel, Side View 2
12 Parameter tuning for the support vector machine using a polynomial kernel

List of Tables

1 Results of tsumego experiment
2 Result of McNemar’s test for all pairs of algorithms
3 Number of the program’s stones surviving at the end of the game. Median over 25 games for each classifier.
4 Mean number of the program’s stones on the board throughout the game. Median over 25 games for each classifier.
5 Overall game score – Wally’s score minus the machine learning player’s score. Median over 25 games for each classifier.
6 Total eyes created, summed over all 25 games for each classifier
7 Number of games where groups with two eyes were formed

List of Algorithms

1 Transform naïve graph representation into common fate graph
2 Count subgraph features

1 Introduction

The board game ‘Go’, also known as Wéiqí or Baduk, is one of the oldest known games still played in its original form [32]. Ironically, despite its age and simplicity, Go is the last remaining major board game where the best artificial intelligence players perform only at the level of a weak amateur [29].

This report investigates the use of supervised machine learning techniques to learn a move evaluation function for Go, and presents a comparative study of the performance of various learning algorithms when applied to a particular approach to Go playing.

The focus of this report is to extend the work done by Graepel et al. [21], who introduce a graph representation of a Go board position and apply supervised machine learning techniques to a set of features generated from this representation.
This report first provides an overview of the game of Go, including the history and rules of the game and a survey of existing approaches to artificial intelligence Go playing. Relevant material from [21] is summarized for convenience. The results of a comparative study of supervised learning methods are then presented. The study evaluates the performance of a set of machine learning algorithms when learning a move evaluation function for Go, using the methods described in [21]. The performance of each algorithm is measured in terms of its ability to choose moves that solve life-and-death problems, and its score against an automated benchmark opponent in games of Go on a 9 × 9 playing board.

2 Background

In this section, the nature of the game of Go is described, including the rules of the game, the history, and the rudiments of game-playing strategy. Go puzzles (tsumego) are used as a benchmark for supervised learning algorithms in this project, so they are described briefly here. The methodology of supervised learning is then summarized, and a brief review of artificial intelligence approaches for Go is presented.

2.1 Go

Go is an ancient oriental strategy board game for two players. It can be considered very elegant, in the sense that its rules are extremely simple to state, yet give rise to incredibly complex strategic possibilities.

The game is interesting from the point of view of artificial intelligence, as it represents a task where skilled humans are currently far superior to machines. The creation of an artificial intelligence Go-playing machine capable of beating human experts would mark an important milestone in the history of computer science, as Go is the last remaining popular strategy board game for which this has not been achieved.

2.1.1 Description of the Game

Go is classified as a deterministic, zero-sum, perfect-information, two-player strategy game [11].
This means that the outcome of the game is entirely determined by the moves made by the players, and a victory for one player necessarily equates to a loss for his or her opponent. Complete information about the current state of the game is available to both players at all times.

The players, labeled Black and White, take turns placing stones of their own colour on a 19 × 19 board called the goban (beginners may practice with smaller 9 × 9 or 13 × 13 boards). Stones are placed on the intersections of the grid lines, and cannot be moved once placed, only removed from the board when they are captured. Black always plays first. Players may also pass.

There is an additional rule, called ko, which disallows the placing of a stone that repeats a previous board position. In some variants of the rules, only the last board position before the current move must not be repeated, but in others no previous board position encountered in the current game may be repeated. The latter rule has been dubbed super ko.

Transitively adjacent sets of stones of the same colour are called groups. Synonyms include strings, blocks and worms. Each empty point adjacent to a group is called a liberty. After a player has placed a stone, any of her opponent’s groups with no liberties are considered to be captured, and are hence removed from the board. If any of her own groups have no liberties after the removal of captured opponent groups, these are removed also.

The game ends when both players pass consecutively. Before calculating the score the players agree to remove dead groups from the board. These are groups that would inevitably be captured. If the status of a group is disputed, this is resolved by further play. In the Chinese version of the rules, each player’s score is calculated as the number of points of her colour on the board, plus the number of empty points enclosed by her colour.
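The group, liberty and capture mechanics described above amount to a flood fill over same-coloured adjacent stones. The following is a minimal illustrative sketch, not code from this project: the board representation (a dict mapping (row, col) to 'B' or 'W', with empty points absent) and all function names are my own assumptions, and legality checks such as the ko rule are omitted.

```python
# Sketch of group/liberty/capture logic. A board is a dict mapping
# (row, col) -> 'B' or 'W'; empty intersections are simply absent.
BOARD_SIZE = 9

def neighbours(p):
    """The up-to-four orthogonally adjacent points, clipped to the board."""
    r, c = p
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < BOARD_SIZE and 0 <= c + dc < BOARD_SIZE]

def group_and_liberties(board, p):
    """Flood-fill the group containing p; return (group points, liberty points)."""
    colour = board[p]
    group, liberties, frontier = {p}, set(), [p]
    while frontier:
        q = frontier.pop()
        for n in neighbours(q):
            if n not in board:
                liberties.add(n)            # adjacent empty point: a liberty
            elif board[n] == colour and n not in group:
                group.add(n)                # same colour: part of the group
                frontier.append(n)
    return group, liberties

def play(board, p, colour):
    """Place a stone, remove captured opponent groups, then (as in the rules
    variant described in the text) remove the player's own libertyless groups."""
    board[p] = colour
    opponent = 'W' if colour == 'B' else 'B'
    for n in neighbours(p):
        if board.get(n) == opponent:
            group, libs = group_and_liberties(board, n)
            if not libs:
                for q in group:
                    del board[q]
    group, libs = group_and_liberties(board, p)
    if not libs:                            # own group left without liberties
        for q in group:
            del board[q]
    return board
```

For example, with a black stone at the corner point (0, 0) and a white stone at (0, 1), White playing at (1, 0) removes the black stone's last liberty and captures it.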
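The super ko rule can be enforced by remembering every board position seen so far in the game and rejecting a move that recreates one. The sketch below is illustrative only (class and method names are my own); it stores full frozen snapshots for clarity, whereas real Go programs typically use incremental Zobrist hashing for speed.

```python
# Sketch of superko enforcement: keep a set of all positions seen in the
# current game; a move whose resulting position is already in the set is
# illegal under the superko rule described in the text.
def snapshot(board):
    """An immutable, hashable view of a board dict of (row, col) -> colour."""
    return frozenset(board.items())

class KoTracker:
    def __init__(self):
        self.history = set()

    def record(self, board):
        """Call after every completed move (including captures)."""
        self.history.add(snapshot(board))

    def violates_superko(self, board_after_move):
        """True if the candidate position repeats any earlier position."""
        return snapshot(board_after_move) in self.history
```

Under the simple-ko variant described in the text, the set would hold only the single previous position rather than the whole history.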
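The Chinese scoring rule above (stones on the board plus empty points enclosed by a single colour) can likewise be sketched as a flood fill over empty regions. Again this is an illustrative sketch under the same assumed board representation, with function names of my own choosing; it assumes dead groups have already been removed, as the text describes.

```python
# Sketch of Chinese (area) scoring: each player scores one point per stone on
# the board, plus one point per empty intersection in any empty region whose
# border consists solely of her colour. Assumes dead groups already removed.
BOARD_SIZE = 9

def neighbours(p):
    r, c = p
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < BOARD_SIZE and 0 <= c + dc < BOARD_SIZE]

def chinese_score(board):
    """Return {'B': score, 'W': score} for a board dict of (row, col) -> colour."""
    scores = {'B': 0, 'W': 0}
    for colour in board.values():
        scores[colour] += 1                  # one point per stone on the board
    empty = {(r, c) for r in range(BOARD_SIZE)
             for c in range(BOARD_SIZE)} - set(board)
    while empty:
        p = empty.pop()                      # seed a new empty region
        region, frontier, borders = {p}, [p], set()
        while frontier:
            q = frontier.pop()
            for n in neighbours(q):
                if n in board:
                    borders.add(board[n])    # colours touching this region
                elif n not in region:
                    region.add(n)
                    frontier.append(n)
        empty -= region
        if len(borders) == 1:                # enclosed by exactly one colour
            scores[borders.pop()] += len(region)
    return scores
```

On an empty 9 × 9 board the single empty region touches neither colour, so no territory is awarded; a lone black stone, by contrast, encloses every remaining point.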