Search Pathology in the Game of Go

Markus Enzenberger

Department of Computing Science, University of Alberta, Edmonton, AB, Canada
[email protected]

Abstract. It is known theoretically that deeper minimax searches are not always beneficial in game tree search. Search pathology occurs in models that use randomly generated heuristic evaluation functions, but usually does not happen in practice. Experimental data showing search pathology in real game-playing programs has been rare. This article examines the evaluation function used in a program that plays the game of Go and shows that search pathology can be observed. While the decision quality increases on average with deeper searches, this is no longer true when comparing odd and even search depths.

1 Introduction

1.1 Search Pathology

Since the work by Nau [1] and Beal [2] it has been known that minimax search in game trees can degrade the quality of the backed-up heuristic evaluation. This effect was shown for randomly generated evaluation functions and is usually not observed in real game-playing programs. Further studies suggested that the absence of this pathology can be explained by the similarity of values in sibling nodes [3, 4], early terminal positions [5], or an optimistic evaluation function [6].

Recent work by Luštrek, Bratko, and Gams [7] provides an alternative explanation: the pathology disappears if a model with a real-valued evaluation function is used instead of a win/loss function, and the error of the evaluation is not prohibitively large. Interestingly, the simulations using the real-valued model still reveal a pathology: while the win/loss error now decreases on average with the search depth, it shows an odd-even effect. For example, it is larger at depth two than at depth one. The effect seems to become weaker for larger search depths. A similar observation was reported by Nau in [8]: the percentage of wins using minimax in a certain class of games was higher for odd depths than for neighboring even depths.

The common factor in all these studies is that they use a randomly generated noisy evaluation function, which is applied to small artificial game trees. In real games, the situation is more complex. Game trees can have non-uniform branching factors, and the values of nodes are not independent. Depending on the game, some positions will be easy to evaluate statically and some will be difficult. Search will mainly help to reduce the error if positions with a lower evaluation error can be reached within the depth of the search. It can therefore be expected that some games are more susceptible to search pathology than others. Experimental data produced by real game-playing programs has been rare so far.
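As an illustration of the kind of model used in these studies (this is a sketch, not code from the article), the following builds small artificial game trees with random win/loss leaf values, perturbs a win/loss evaluation with a fixed error rate, and measures how often the backed-up root value disagrees with the true value at different search depths. All names, parameters, and the specific tree shape are illustrative assumptions; whether the error actually grows with depth depends on the chosen parameters.

```python
import random

def make_tree(depth, branching):
    """Build a uniform game tree; leaves get random win/loss values (+1/-1)."""
    if depth == 0:
        return {"value": random.choice([1, -1]), "children": []}
    children = [make_tree(depth - 1, branching) for _ in range(branching)]
    # True value by negamax: a position is won if some child is lost for the opponent.
    value = max(-c["value"] for c in children)
    return {"value": value, "children": children}

def noisy_eval(node, error_rate):
    """Static heuristic: the true win/loss value, flipped with probability error_rate."""
    v = node["value"]
    return -v if random.random() < error_rate else v

def negamax(node, depth, error_rate):
    """Back up the noisy evaluation from a depth-limited search."""
    if depth == 0 or not node["children"]:
        return noisy_eval(node, error_rate)
    return max(-negamax(c, depth - 1, error_rate) for c in node["children"])

def decision_error(trials=2000, height=8, branching=2, error_rate=0.2):
    """Fraction of roots whose backed-up value is wrong, for each search depth."""
    errors = {d: 0 for d in range(1, height)}
    for _ in range(trials):
        root = make_tree(height, branching)
        for d in errors:
            if negamax(root, d, error_rate) != root["value"]:
                errors[d] += 1
    return {d: e / trials for d, e in errors.items()}

if __name__ == "__main__":
    random.seed(0)
    for depth, err in decision_error().items():
        print(f"depth {depth}: error rate {err:.3f}")
```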

1.2 The Game of Go

The game of Go has been a difficult field for computer programs over the last decades. Even the top programs are still far from reaching human master level. Misjudging the status of a group of stones introduces an error in the evaluation, which can both over- and underestimate the real value of a position by a large amount. Many Go programs do not use global search at all. They rely on local searches, move evaluation, patterns, and rule-based systems for selecting a move. A new approach, which has recently become popular on 9×9 boards, avoids the difficulty of finding a good evaluation function by using Monte Carlo simulations to estimate the value of a position [9]. However, there is no straightforward way to combine the averaging backup operator of the Monte Carlo simulation with the minimax backup operator of the game tree search. Apart from the endgame, Go game trees have a nearly uniform branching factor, because most empty points are legal moves for both sides. On a 9×9 board, the initial branching factor is 81; it decreases roughly by one for every move played.
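As a rough, hypothetical sketch of the Monte Carlo approach mentioned above (not the implementation of any particular program), a position can be scored by averaging the outcomes of uniformly random playouts. The callbacks legal_moves, play, and score are assumed interfaces to a game implementation.

```python
import random

def random_playout(position, legal_moves, play, score, max_moves=200):
    """Play uniformly random legal moves to the end and return the final score."""
    pos = position
    for _ in range(max_moves):
        moves = legal_moves(pos)
        if not moves:
            break
        pos = play(pos, random.choice(moves))
    return score(pos)

def monte_carlo_value(position, legal_moves, play, score, simulations=100):
    """Estimate a position's value as the average outcome of random playouts."""
    total = sum(random_playout(position, legal_moves, play, score)
                for _ in range(simulations))
    return total / simulations
```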

1.3 NeuroGo

NeuroGo [10] is a neural network based Go program that learns a real-valued position evaluation function from playing games against itself. The architecture of the network is described in detail in [11]. It uses temporal difference learning for both the prediction of local connectivity and the prediction of the local reward. The evaluation function predicts the expected final score of the game. NeuroGo has participated in several Computer Go tournaments; its best result so far was winning a silver medal in the 9×9 Go tournament of the 8th Computer Olympiad in 2003. Compared with the version described in [11], the tournament version uses an extended set of input features; most notably it includes an influence-based static heuristic life-and-death estimator similar to [12], but only for cases without inner prisoner stones. The search algorithm used by NeuroGo is a negamax implementation of the alpha-beta search algorithm with standard move ordering techniques and iterative deepening. All legal moves are generated, apart from points where the absolute local output value of the network is above a threshold of 0.9, independent of the color to move; those points are considered to be safe territory.

2 Experimental Setup

2.1 Goal

The goal of the experiment was to examine statistical properties of NeuroGo’s evaluation function and to study the performance of NeuroGo depending on the search depth. The error of the evaluation function is unknown, but it can be assumed that a higher error will cause fewer wins and a lower average score. For the experiments, a fixed maximum depth was chosen and the complete search tree up to this depth was searched. A small amount of noise in the range of [0 ... 0.2] was added to the leaf nodes to increase the number of different games.
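The following is a minimal sketch, not NeuroGo's actual code, of a fixed-depth negamax alpha-beta search in which a small random value in [0, 0.2] is added to the leaf evaluations, as in the experimental setup. The callbacks evaluate, legal_moves, and play are assumptions; the real program additionally uses move ordering, iterative deepening, and the safe-territory move filter described in Section 1.3.

```python
import random

def alphabeta(position, depth, alpha, beta, evaluate, legal_moves, play, noise=0.2):
    """Fixed-depth negamax alpha-beta; a small random value is added to leaf
    evaluations to increase the number of different games."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        # Evaluation from the point of view of the player to move,
        # perturbed by noise drawn from [0, noise].
        return evaluate(position) + random.uniform(0.0, noise)
    best = -float("inf")
    for move in moves:
        value = -alphabeta(play(position, move), depth - 1,
                           -beta, -alpha, evaluate, legal_moves, play, noise)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # beta cutoff
    return best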

2.2 Game Playing

Games were played against opponent programs supporting the Go Text Protocol [13] on a 9×9 board with Chinese rules and a komi of 6.5. The tool TwoGtp, included in the GoGui [14] package, allows automatic game play and statistical analysis of the results. Since some of the opponent programs play highly deterministically, a set of 21 balanced four-move opening positions, which are included in GoGui version 0.8, was used. A total of 200 games was played against each opponent program and for each search depth between one and four. The color to play for NeuroGo was alternated every second game. Duplicate games were excluded from the analysis. Since some of the programs cannot score games or frequently get the final score wrong, GNU Go 3.6 [15] was used for determining the final score. The standard error of the mean score and of the percentage of wins was determined by TwoGtp from the variance of the results.

After the games were played, the tool GtpStatistics, which is also included in the GoGui package, was used to collect information about the evaluation function. GtpStatistics iterates over all positions in a set of games, sends a number of configurable commands to the Go program, and statistically evaluates the responses of commands that return a numerical value. The experiment was performed on a computer with an Athlon XP 2800+ CPU and 512 MB memory. On this hardware, NeuroGo can evaluate about 300 positions per second on average and finish a full 4-ply search in less than a minute.
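As a sketch of how such a summary can be derived from per-game results (the exact formulas used by TwoGtp are not given in this article, so the details below are assumptions), the mean score, its standard error, and the percentage of wins can be computed as follows:

```python
import math

def summarize(scores):
    """Mean score, its standard error, and the win percentage for a list of
    game results given as scores from NeuroGo's point of view."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    std_error = math.sqrt(variance / n)
    wins = sum(1 for s in scores if s > 0)
    win_pct = 100.0 * wins / n
    # Standard error of the win percentage from the binomial variance.
    win_pct_error = 100.0 * math.sqrt(win_pct / 100 * (1 - win_pct / 100) / n)
    return mean, std_error, win_pct, win_pct_error

# Example with three hypothetical game scores (positive = NeuroGo wins).
print(summarize([6.5, -2.5, 12.5]))
```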

2.3 Opponent Programs

The following Go programs were used as opponents:

– Aya, version 5.53, by Hiroshi Yamashita [16]. Aya won the gold medal in the 9×9 Go tournament at the 9th Computer Olympiad in 2004 and the silver medal at the 10th Computer Olympiad in 2005. It uses selective global search in its 9×9 version.
– Crazy Stone, version 0001-19, by Rémi Coulom [17]. Crazy Stone uses the average outcome of Monte Carlo simulations for evaluating a position. This evaluation is combined with global minimax search using a new approach that addresses the difficulties of using minimax and average backup operations in Monte Carlo Go programs.
– Explorer, version 6.6.x/Nov 3 2005, by Martin Müller [18]. Explorer is a veteran of the Computer Go scene. It won the 19×19 Go tournament at the first Computer Olympiad in 1989. It uses pattern databases, local searches, move evaluation, and other techniques, but no global search.
– GNU Go, version 3.6, by the GNU Go team [15]. GNU Go won the 19×19 Go tournament at the 10th Computer Olympiad in 2005 with no lost games. It uses pattern databases, local searches, influence functions, move generators, and other techniques, but no global search.

To study the performance gain of deeper search when playing against itself, NeuroGo was used as an additional opponent playing with a fixed search depth of one.

3 Results

3.1 Statistical Properties

Some characteristic statistical properties of the evaluation function were studied for all positions in all games played. Even though the distribution of evaluations, true values, and errors in the search trees is not known, it is interesting to compare these properties with the assumptions made in the mathematical models used for studying search pathology. For positions in which NeuroGo was to move, the positions correspond to the root nodes of the searches, but positions in which the opponents were to move were also included.

The evaluation function of NeuroGo from Black's perspective was used unless mentioned otherwise; positive numbers are good for Black. NeuroGo automatically adds the komi to the estimated score. The move numbers do not include the four opening moves that were enforced during the games; move number one corresponds to the first position after the opening moves. Black is to move at odd move numbers; White at even numbers.

Figure 1 shows a typical development of the evaluation function during a game. It contains stable periods, periods with changes over a long sequence of moves, and a period with fast changes. There are also unstable periods where the evaluation oscillates over a sequence of moves.

Figure 2 shows the distributions of the difference between the evaluation and the final score at move numbers 10, 11, 30, and 31. This is an approximation of the error in the evaluation function; the real error could only be known if the programs played optimally. The distributions show that the error can be described by Gaussian noise only early in the game. Later, there is a peak of positions with an error around zero; the height of the peak increases with the move number. The tails of the distribution extend to both large positive and large negative errors. Apart from a slight asymmetry, the distributions look similar for positions with Black and with White to move.

Figure 3 shows the mean absolute difference to the final score depending on the move number. It decreases nearly linearly from 22 at move one to 13 at move 40. Apart from the first few moves, there is no significant dependency on the player to move.

The mean absolute evaluation is shown in Figure 4. From around move number 5 the absolute evaluation increases almost linearly with the move number; the slope becomes slightly smaller after about move number 20. The variance increases with the move number.

Figure 5 shows the average evaluation for the player to move. The evaluation function is on average optimistic for the player to move. The effect is small in the beginning, reaches a maximum of up to four points between move numbers 25 and 30, and then decreases again slightly towards the end of the game. It is larger in the games against other opponents than in the games with NeuroGo playing against itself. There is also a smaller effect, which favours positions with White to move and causes the zig-zag shape of the curve. This effect is about two points in the beginning and drops to less than one point after move number 15. The second effect is unexpected, because Black has the advantage of the first move and should be favoured in early positions, so the komi overcompensates for the advantage of playing Black.

The difference of the evaluation between subsequent moves is shown in Figure 6. The distributions have a peak at zero, with a tail towards negative differences for positions with Black to move and a tail towards positive differences for positions with White to move. These histograms show that there is a large number of positions where the evaluation function is relatively stable between two moves. However, when it is not stable, it generally favours the player to move and the jump in the evaluation can be large. Manual inspection of a part of the games played showed that these positions are mostly difficult positions with large groups involved in tactical fights with unclear outcomes.
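A possible way to collect the quantities shown in Figures 2 and 3 is sketched below. The assumed data layout (per-game evaluation sequences from Black's perspective together with the final score) is an illustration, not the format used by GtpStatistics.

```python
from collections import defaultdict

def error_statistics(games):
    """Collect the difference between evaluation and final score per move number.

    `games` is assumed to be a list of (evaluations, final_score) pairs, where
    evaluations[i] is the evaluation after move i+1, from Black's perspective."""
    diffs = defaultdict(list)
    for evaluations, final_score in games:
        for move_number, evaluation in enumerate(evaluations, start=1):
            diffs[move_number].append(evaluation - final_score)
    # Mean absolute difference per move number, as plotted in Figure 3.
    mean_abs = {m: sum(abs(d) for d in ds) / len(ds) for m, ds in diffs.items()}
    return diffs, mean_abs
```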

Fig. 1. Evaluation in a game NeuroGo against Aya. (Evaluation plotted against the move number.)

Fig. 2. Difference of evaluation and final score. (Histograms of positions at move numbers 10, 11, 30, and 31.)

Fig. 3. Mean absolute difference of evaluation and final score. (Mean absolute error plotted against the move number.)

Fig. 4. Mean absolute evaluation. The bars show the variance. (Plotted against the move number.)

Fig. 5. Mean evaluation for the player to move. (Curves for NeuroGo vs. NeuroGo and NeuroGo vs. other opponents, plotted against the move number.)

Fig. 6. Difference of evaluation between subsequent moves. (Histograms of positions for the differences between moves 10-9, 11-10, 30-29, and 31-30.)

3.2 Performance

The percentage of wins and average score of the played games are shown in Table 1 and Figure 7. The results against Aya, Crazy Stone, and GNU Go show a pathology. Search depth two performs significantly worse than search depth one. The average score drops from -2.7 to -30.9 against Aya, from 5.8 to -13.4 against Crazy Stone, and from -5.7 to -14.1 against GNU Go. Search depth four performs worse than search depth three, but the effect is weaker here. The average score drops from 12.3 to 11.0 against Aya, from 18.1 to 14.8 against Crazy Stone, and from 10.9 to 6.0 against GNU Go. In the games against Explorer, the number of NeuroGo’s wins is high and the pathology is either absent or hidden by the statistical error. Interestingly, the pathology cannot be observed in the games of NeuroGo against itself; here the average score increases monotonically with the search depth from 0.0 to 26.7. A separate analysis of the games in which NeuroGo played Black or White showed the same effects against all opponents within the error margins.

Table 1. Game playing results.

Opponent      Depth   Wins/%         Score           Games   Length   Time/s
Aya           1       46.1 (±5.7)    −2.7 (±2.9)      76      64.2      0.1
Aya           2       21.3 (±4.7)    −30.9 (±4.2)     75      79.3      0.7
Aya           3       65.8 (±5.4)    12.3 (±3.2)      76      62.9      8.3
Aya           4       70.0 (±5.5)    11.0 (±3.3)      70      71.8     42.8
Crazy Stone   1       70.1 (±3.3)    5.8 (±1.8)      194      64.6      0.1
Crazy Stone   2       39.8 (±3.5)    −13.4 (±2.5)    191      76.1      0.6
Crazy Stone   3       83.8 (±2.6)    18.1 (±1.5)     197      67.4      7.1
Crazy Stone   4       75.0 (±3.1)    14.8 (±2.2)     200      79.8     34.1
Explorer      1       81.6 (±4.2)    25.4 (±3.4)      87      55.4      0.1
Explorer      2       93.3 (±2.8)    21.8 (±2.4)      76      55.4      0.9
Explorer      3       94.3 (±2.5)    23.7 (±2.2)      88      50.7     11.0
Explorer      4       96.3 (±2.1)    32.5 (±2.7)      81      55.9     64.2
GNU Go        1       39.2 (±5.0)    −5.7 (±2.5)      97      59.6      0.1
GNU Go        2       33.3 (±5.1)    −14.1 (±3.4)     84      67.7      0.7
GNU Go        3       67.4 (±5.1)    10.9 (±2.7)      86      55.3      8.9
GNU Go        4       54.8 (±5.4)    6.0 (±3.0)       84      64.7     46.5
NeuroGo       1       50.4 (±4.7)    0.0 (±1.7)      113      50.3      0.1
NeuroGo       2       70.5 (±4.5)    6.5 (±1.9)      105      60.3      0.7
NeuroGo       3       75.9 (±4.1)    9.5 (±1.2)      108      51.9      9.6
NeuroGo       4       93.2 (±2.5)    26.7 (±2.1)     103      59.5     47.4

Fig. 7. Average score and percentage of wins. (Both plotted against the search depth, with one curve per opponent: Aya, Crazy Stone, Explorer, GNU Go, NeuroGo.)

4 Conclusion

The evaluation function of a medium-strength Go program was examined statistically. The evaluation function was found to be optimistic for the player to move, especially in difficult positions with an unclear outcome. The mean difference between the evaluation and the final score decreased linearly with the move number and showed no significant dependence on the player to move. The distribution of this difference resembles a Gaussian only for early positions.

It was shown that search pathology occurred, and that it had a significant impact on the playing strength of the program at low search depths. Game playing experiments against a fixed-depth version of the same Go program are not sufficient to observe this effect. Go programs that use global search should therefore experimentally determine the playing strength against different opponents at different search depths and take measures if needed. Such a measure could be to simply avoid even search depths and use iterative deepening with only odd depths, as sketched below. It could also be worthwhile to investigate backup rules other than minimax, as described in [19].
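A minimal sketch of this suggested measure, assuming a hypothetical search function that returns the best move for a given fixed depth:

```python
def search_odd_depths(position, max_depth, search):
    """Iterative deepening restricted to odd depths, as one possible way to
    side-step the odd-even pathology discussed above."""
    best_move = None
    for depth in range(1, max_depth + 1, 2):  # depths 1, 3, 5, ...
        best_move = search(position, depth)
    return best_move
```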

References

1. Nau, D.: Quality of decision versus depth of search on game trees. PhD thesis, Duke University (1979)
2. Beal, D.: An analysis of minimax. In Clarke, M., ed.: Advances in Computer Chess 2, Edinburgh University Press (1980) 103–109
3. Beal, D.: Benefits of minimax search. In Clarke, M., ed.: Advances in Computer Chess 3, Pergamon Press (1982) 17–24
4. Bratko, I., Gams, M.: Error analysis of the minimax principle. In Clarke, M., ed.: Advances in Computer Chess 3, Pergamon Press, Oxford, UK (1982) 1–15
5. Pearl, J.: Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA (1984)
6. Schrüfer, G.: Presence and absence of pathology on game trees. In Beal, D., ed.: Advances in Computer Chess 4, Pergamon Press, Oxford, UK (1986) 101–112
7. Luštrek, M., Bratko, I., Gams, M.: Why minimax works: An alternative explanation. In: Proceedings of the IJCAI Conference (2005) 212–217
8. Nau, D.: Experiments on alternatives to minimax. International Journal of Parallel Programming 15(2) (1986) 163–183
9. Bouzy, B., Helmstetter, B.: Monte Carlo Go developments. In van den Herik, H.J., Iida, H., Heinz, E.A., eds.: Advances in Computer Games 10, Kluwer (2003) 159–174
10. Enzenberger, M.: NeuroGo. http://www.markus-enzenberger.de/neurogo.html (2006)
11. Enzenberger, M.: Evaluation in Go by a neural network using soft segmentation. In van den Herik, H.J., Iida, H., Heinz, E.A., eds.: Advances in Computer Games 10, Kluwer (2003) 97–108
12. Chen, K., Chen, Z.: Static analysis of life and death in the game of Go. Information Sciences 121(1-2) (1999) 113–134
13. Farnebäck, G.: Go Text Protocol. http://www.lysator.liu.se/~gunnar/gtp/ (2006)
14. Enzenberger, M.: GoGui documentation. http://www.markus-enzenberger.de/compgo/gogui/doc/ (2006)
15. The GNU Go team: GNU Go. http://www.gnu.org/software/gnugo/gnugo.html (2006)
16. Yamashita, H.: Hiroshi's computer shogi and Go. http://www32.ocn.ne.jp/~yss/ (2006)
17. Coulom, R.: Crazy Stone. http://remi.coulom.free.fr/CrazyStone/ (2006)
18. Müller, M.: Explorer. http://www.cs.ualberta.ca/~mmueller/cgo/explorer.html (2006)
19. Junghanns, A.: Are there practical alternatives to alpha-beta? ICCA Journal 21(1) (1998) 14–32