Search Pathology in the Game of Go
Markus Enzenberger
Department of Computing Science, University of Alberta, Edmonton AB, Canada
[email protected]

Abstract. It is known theoretically that deeper minimax searches are not always beneficial in game tree search. Search pathology occurs in models that use randomly generated heuristic evaluation functions, but usually does not happen in practice. Experimental data showing search pathology in real game-playing programs has been rare. This article examines the evaluation function used in a program that plays the game of Go and shows that search pathology can be observed. While the decision quality increases on average with deeper searches, this is no longer true when comparing odd and even search depths.

1 Introduction

1.1 Search Pathology

Since the work by Nau [1] and Beal [2] it has been known that minimax search in game trees can degrade the quality of the backed-up heuristic evaluation. This effect was shown for randomly generated evaluation functions and is usually not observed in real game-playing programs. Further studies suggested that the absence of this pathology can be explained by the similarity of values in sibling nodes [3, 4], early terminal positions [5], or an optimistic evaluation function [6].

Recent work by Luštrek, Bratko, and Gams [7] provides an alternative explanation: the pathology disappears if a model with a real-valued evaluation function is used instead of a win/loss function, and the error of the evaluation is not prohibitively large. Interestingly, the simulations using the real-valued model still reveal a pathology: while the win/loss error now decreases on average with the search depth, it shows an odd-even effect. For example, it is larger at depth two than at depth one. The effect seems to become weaker for larger search depths. A similar observation was reported by Nau in [8]: the percentage of wins using minimax in a certain class of games was higher for odd depths than for neighboring even depths.

The common factor in all these studies is that they use a randomly generated noisy evaluation function applied to small artificial game trees. In real games, the situation is more complex. Game trees can have non-uniform branching factors, and the values of nodes are not independent. Depending on the game, some positions will be easy to evaluate statically and some will be difficult. Search will mainly help to reduce the error if positions with a lower evaluation error can be reached within the depth of the search. So it can be expected that some games are more susceptible to search pathology than others. Experimental data produced by real game-playing programs has been rare so far.
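The style of model used in these studies can be made concrete with a short simulation. The following Python sketch is only an illustration, not the setup of [1, 2, 7]; the binary branching factor, the tree depth, the leaf win probability, and the evaluation error are assumptions. It builds a uniform game tree with independent random win/loss leaves, derives a static evaluation by flipping each node's true value with a fixed error probability, and measures how often the value minimaxed back from the search horizon disagrees with the true root value.

```python
# Minimal sketch of a Nau/Beal-style pathology model (all parameters are
# illustrative assumptions, not taken from the cited papers).
import random

BRANCHING = 2        # uniform branching factor
TREE_DEPTH = 10      # total depth of the artificial game tree
LEAF_WIN_PROB = 0.4  # probability that a leaf is a win for the root player
EVAL_ERROR = 0.2     # probability that the static evaluation flips the true value
MAX_SEARCH_DEPTH = 6
TRIALS = 1000

class Node:
    def __init__(self, depth, maximizing):
        self.maximizing = maximizing
        if depth == 0:
            self.children = []
            self.true_value = 1 if random.random() < LEAF_WIN_PROB else 0
        else:
            self.children = [Node(depth - 1, not maximizing)
                             for _ in range(BRANCHING)]
            values = [c.true_value for c in self.children]
            self.true_value = max(values) if maximizing else min(values)
        # noisy static evaluation: the true value, flipped with probability EVAL_ERROR
        self.static_eval = self.true_value ^ (random.random() < EVAL_ERROR)

def backed_up_value(node, depth):
    """Minimax the noisy static evaluation from the search horizon `depth`."""
    if depth == 0 or not node.children:
        return node.static_eval
    values = [backed_up_value(c, depth - 1) for c in node.children]
    return max(values) if node.maximizing else min(values)

error_counts = {d: 0 for d in range(1, MAX_SEARCH_DEPTH + 1)}
for _ in range(TRIALS):
    root = Node(TREE_DEPTH, True)
    for d in error_counts:
        if backed_up_value(root, d) != root.true_value:
            error_counts[d] += 1

for d, errors in sorted(error_counts.items()):
    print(f"search depth {d}: backed-up error rate {errors / TRIALS:.3f}")
```

Depending on the chosen parameters, the backed-up error rate in such a model can grow with the search depth, which is exactly the pathology discussed above, and it can also alternate between odd and even depths in the manner reported in [7, 8].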
1.2 Computer Go

The game of Go has been a difficult field for computer programs over the last decades. Even the top programs are still far from reaching human master level. Misjudging the status of a group of stones introduces an error in the evaluation, which can both over- and underestimate the real value of a position by a large amount. Many Go programs do not use global search at all. They rely on local searches, move evaluation, patterns, and rule-based systems for selecting a move. A new approach, which has recently become popular on 9×9 boards, avoids the difficulty of finding a good evaluation function by using Monte-Carlo simulations to estimate the value of a position [9]. However, there is no straightforward way to combine the averaging backup operator of the Monte-Carlo simulations with the minimax backup operator of game tree search.

Apart from the endgame, Go game trees have a nearly uniform branching factor, because most empty points are legal moves for both sides. On a 9×9 board, the initial branching factor is 81; it decreases roughly by one for every move played.

1.3 NeuroGo

NeuroGo [10] is a neural-network-based Go program that learns a real-valued position evaluation function from playing games against itself. The architecture of the network is described in detail in [11]. It uses temporal difference learning for both the prediction of local connectivity and the prediction of the local reward. The evaluation function predicts the expected final score of the game. NeuroGo has participated in several Computer Go tournaments; its best result so far was winning a silver medal in the 9×9 Go tournament of the 8th Computer Olympiad in 2003. Compared with the version described in [11], the tournament version uses an extended set of input features; most notably, it includes an influence-based static heuristic life-and-death estimator similar to [12], but only for cases without inner prisoner stones.

The search algorithm used by NeuroGo is a negamax implementation of the alpha-beta search algorithm with standard move ordering techniques and iterative deepening. All legal moves are generated, apart from points where the absolute local output value of the network is above a threshold of 0.9 independent of the color to move; those points are considered to be safe territory.

2 Experimental Setup

2.1 Goal

The goal of the experiment was to examine statistical properties of NeuroGo's evaluation function and to study the performance of NeuroGo depending on the search depth. The error of the evaluation function is unknown, but it can be assumed that a higher error will cause fewer wins and a lower average score. For the experiments, a fixed maximum depth was chosen and the complete search tree up to this depth was searched. A small amount of noise in the range [0, 0.2] was added to the leaf nodes to increase the number of different games.
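For concreteness, the following sketch shows a fixed-depth negamax alpha-beta search with the two details just described: points whose absolute local network output exceeds 0.9 are excluded from move generation, and a small uniform noise term in [0, 0.2] is added to the leaf evaluations. The Position interface and all names are placeholders for illustration; this is a sketch of the described setup, not NeuroGo's implementation.

```python
# Minimal sketch of a fixed-depth negamax alpha-beta search with noisy leaf
# evaluations, in the spirit of the setup described above. The Position
# interface, parameter values, and names are illustrative assumptions.
import random
from typing import Iterable, List, Protocol, Tuple

INF = float("inf")
SAFE_THRESHOLD = 0.9   # |local network output| above this: point treated as safe territory
LEAF_NOISE = 0.2       # leaf evaluations are perturbed by noise in [0, 0.2]

class Position(Protocol):
    def legal_moves(self) -> Iterable[int]: ...
    def local_value(self, move: int) -> float: ...    # local network output for a point
    def play(self, move: int) -> "Position": ...      # successor position
    def evaluate(self) -> float: ...                  # expected score for the side to move

def generated_moves(pos: Position) -> List[int]:
    """All legal moves except points the network already considers safe territory."""
    return [m for m in pos.legal_moves() if abs(pos.local_value(m)) <= SAFE_THRESHOLD]

def negamax(pos: Position, depth: int, alpha: float = -INF, beta: float = INF) -> float:
    moves = generated_moves(pos)
    if depth == 0 or not moves:
        # leaf node: static evaluation plus a small random perturbation to vary games
        return pos.evaluate() + random.uniform(0.0, LEAF_NOISE)
    best = -INF
    for move in moves:                     # a real search would order moves here
        value = -negamax(pos.play(move), depth - 1, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:                  # beta cutoff
            break
    return best

def best_move(pos: Position, depth: int) -> Tuple[float, int]:
    """Search every root move to the fixed maximum depth and return (value, move)."""
    return max((-negamax(pos.play(m), depth - 1), m) for m in generated_moves(pos))
```

To run an actual search, a concrete Position class supplying the board representation, legal moves, and the network outputs would have to be plugged in; NeuroGo additionally uses move ordering and iterative deepening, which are omitted here.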
2.2 Game Playing

Games were played against opponent programs supporting the Go Text Protocol [13] on a 9×9 board with Chinese rules and a komi of 6.5. The tool TwoGtp, included in the GoGui [14] package, supports automated game play and statistical analysis of the results. Since some of the opponent programs play highly deterministically, a set of 21 balanced four-move opening positions, which are included in GoGui version 0.8, was used. A total of 200 games was played against each opponent program for each search depth between one and four. The color to play for NeuroGo was alternated every second game. Duplicate games were excluded from the analysis. Since some of the programs cannot score games or frequently get the final score wrong, GNU Go 3.6 [15] was used for determining the final score. The standard error of the mean score and the percentage of wins was determined by TwoGtp from the variance of the results.

After the games were played, the tool GtpStatistics, which is also included in the GoGui package, was used to collect information about the evaluation function. GtpStatistics iterates over all positions in a set of games, sends a number of configurable commands to the Go program, and statistically evaluates the responses of commands that return a numerical value.

The experiment was performed on a computer with an Athlon XP 2800+ CPU and 512 MB memory. On this hardware, NeuroGo can evaluate about 300 positions per second on average and finish a full 4-ply search in less than a minute.

2.3 Opponent Programs

The following Go programs were used as opponents:

– Aya, version 5.53, by Hiroshi Yamashita [16]. Aya won the gold medal in the 9×9 Go tournament at the 8th Computer Olympiad in 2003 and the silver medal at the 10th Computer Olympiad in 2005. It uses selective global search in its 9×9 version.
– Crazy Stone, version 0001-19, by Rémi Coulom [17]. Crazy Stone uses the average outcome of Monte-Carlo simulations for evaluating a position. This evaluation is combined with global minimax search using a new approach that addresses the difficulties of using minimax and average backup operations in Monte-Carlo Go programs.
– Explorer, version 6.6.x/Nov 3 2005, by Martin Müller [18]. Explorer is a veteran of the Computer Go scene. It won the 19×19 Go tournament at the first Computer Olympiad in 1989. It uses pattern databases, local searches, move evaluation, and other techniques, but no global search.
– GNU Go, version 3.6, by the GNU Go team [15]. GNU Go won the 19×19 tournament at the 8th Computer Olympiad in 2003 with no lost games. It uses pattern databases, local searches, influence functions, move generators, and other techniques, but no global search.

To study the performance gain of deeper search when playing against itself, NeuroGo was used as an additional opponent, playing with a fixed search depth of one.

3 Results

3.1 Statistical Properties

Some characteristic statistical properties of the evaluation function were studied for all positions in all games played. Even though the distribution of evaluations, true values, and errors in the search trees is not known, it is interesting to compare these properties with the assumptions made in the mathematical models for studying search pathology. Positions in which NeuroGo was to move correspond to the root nodes of the searches, but positions in which the opponents were to move were also included. The evaluation function of NeuroGo from Black's perspective was used unless mentioned otherwise; positive numbers are good for Black. NeuroGo automatically adds the komi to the estimated score. The move numbers do not include the four opening moves that were enforced during the games; move number one corresponds to the first position after the opening moves.
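The summary quantities used here, the mean and its standard error, reduce to simple aggregation over the recorded values. The sketch below shows one hypothetical way to compute them per move number from (move number, evaluation) records; the record format is an assumption and is not the output format of TwoGtp or GtpStatistics.

```python
# Minimal sketch of per-move-number aggregation of evaluation values.
# The input format (move number, evaluation from Black's perspective,
# komi already included) is an illustrative assumption.
from collections import defaultdict
from math import sqrt

def summarize(records):
    """records: iterable of (move_number, evaluation) pairs.

    Returns {move_number: (count, mean, standard error of the mean)}.
    """
    by_move = defaultdict(list)
    for move_number, evaluation in records:
        by_move[move_number].append(evaluation)
    summary = {}
    for move_number, values in sorted(by_move.items()):
        n = len(values)
        mean = sum(values) / n
        # sample variance; the standard error of the mean is sqrt(var / n)
        var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
        summary[move_number] = (n, mean, sqrt(var / n))
    return summary

if __name__ == "__main__":
    # toy data: (move number, evaluation from Black's perspective)
    data = [(1, 3.0), (1, 2.5), (2, -1.0), (2, 0.5), (2, 1.0)]
    for move, (n, mean, sem) in summarize(data).items():
        print(f"move {move}: n={n} mean={mean:+.2f} sem={sem:.2f}")
```

The same routine applies to the per-game results (final scores and wins) when the grouping key is the opponent program and search depth rather than the move number.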