Bayesian Learning of Generalized Board Positions for Improved Move Prediction in Computer Go

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence

Martin Michalowski and Mark Boddy and Mike Neilsen
Adventium Labs, 111 3rd Ave S, Suite 100, Minneapolis MN 55401 USA
{fi[email protected]}

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Computer Go presents a challenging problem for machine learning agents. With the number of possible board states estimated to be larger than the number of hydrogen atoms in the universe, learning effective policies or board evaluation functions is extremely difficult. In this paper we describe Cortigo, a system that efficiently and autonomously learns useful generalizations for large state-space classification problems such as Go. Cortigo uses a hierarchical generative model loosely related to the human visual cortex to recognize Go board positions well enough to suggest promising next moves. We begin by briefly describing and providing motivation for research in the computer Go domain. We describe Cortigo’s ability to learn predictive models based on large subsets of the Go board and demonstrate how using Cortigo’s learned models as additive knowledge in a state-of-the-art computer Go player (Fuego) significantly improves its playing strength.

1 Introduction

This paper describes our work on Cortigo, a system that uses learning in hierarchical Bayesian networks for move prediction in the game of Go. The extremely large number of possible states on a Go board means that learning a policy based on exactly matching specific board positions is unlikely to be helpful. An exhaustive policy is infeasible: the number of possible states is estimated to be greater than the number of hydrogen atoms in the universe (Hsu 2007). A partial policy can be much smaller, but empirical evidence demonstrates that even very large databases of Go positions are unlikely to provide exact matches past the first few moves of the game. Cortigo learns approximate board positions. Unlike previous work in the area (Wolf 2000; Silver et al. 2007; Dabney and McGovern 2007) that uses explicit game features such as liberty counts or the presence or absence of walls or ladders, Cortigo has no explicit model of board features beyond which points are occupied by stones of what color.

Our hypothesis was that using hierarchical Bayesian networks would permit the system to learn implicit features, which would then result in good move prediction for board positions that had not appeared in the training set. This has in fact been the case. Presented with a training set of partial board positions (rectangles of varying sizes, oriented around one of the corners of the board), Cortigo learns a set of distributions which can be shown to provide effective move prediction for both human and computer gameplay.

It is important to note that by prediction, we do not mean that there is a single best move that Cortigo will select for a given board position. Even the very best human players will disagree on the “best” move for a given position. However, it turns out that effective move prediction, in the sense that the top few moves predicted by the system include the move that was taken in the actual game, results in a system that can provide effective guidance, in the form of a move-ranking heuristic for an Upper Confidence Intervals for Trees (UCT)-based Go player.

In this paper, we present results demonstrating that adding Cortigo to the set of heuristics already used by the computer Go player Fuego results in a significant improvement in performance. We also show that learning these distributions is a process that converges based on small numbers of samples, though both predictive accuracy and convergence suffer as the number of stones on the board increases.

2 Computer Go

The game of Go has a very simple set of rules, but is nonetheless very challenging for both human and computer players. Go differs from most other games for which computer players have been implemented in that the best human players are still significantly better than the best computer players. In contrast, for games such as chess, backgammon, and poker, computer players are acknowledged to be as strong as or stronger than the best human players.

Go is played on a 19-by-19 board, with players taking turns placing either a black or a white stone on an unoccupied point.[1] Consequently, the number of possible moves is very large for the first few moves of the game. The number of “good” moves in the opening game is presumed to be significantly smaller, but is not known. While it is possible to construct an opening book for Go, there is no guarantee that other good opening moves are not waiting to be discovered. Games can theoretically proceed until nearly the entire board is covered with stones, but in practice tend to run roughly 100 moves, with a wide variance. Considered as a game-tree search, then, the game of Go is a huge problem: the tree depth is on the order of 100 and the branching factor is on the order of the size of the board for at least the first few moves. The number of states is immense and there are no obvious means to factor or partition them, beyond the obvious rotational and reflective symmetries.

[1] For full rules see: http://en.wikipedia.org/wiki/Go_(game)

Recent approaches to computer Go have either treated it as a stochastic search problem, using a sampling-based heuristic search algorithm known as Upper Confidence Intervals for Trees (UCT), or as a learning problem, or as some combination of the two (Bouzy and Cazenave 2001). Learning for computer Go has included attempting to learn policies via reinforcement learning (Silver et al. 2007), or learning classifiers for board positions, such that each board comes with a suggested next move (Stern, Herbrich, and Graepel 2006). At this point, the strongest players are all variants on or refinements of UCT. One of the strengths of the UCT approach, and one of the reasons for the wide variety of competitive UCT-based computer Go players (e.g., Fuego, Crazy Stone, MoGo, Zen, etc.), is that the UCT algorithm presents several opportunities for heuristic approaches and enhancements, including but not limited to the selection of nodes for expansion and biasing the Monte Carlo search used to evaluate those nodes.

Figure 1: Multilayer pyramidal Bayes net, from (Dean 2006)

3 Cortigo

Cortigo provides a ranking on possible next moves for a given board position. Cortigo does this using move prediction: mapping from a given board position to a prediction of the next move that would be made by an expert Go player. The moves are ordered in decreasing likelihood that that move would be taken given the current board position, given the distributions that have been learned in a previously-trained hierarchical Bayes net, described below.

In many ways, this is an easy domain for this kind of classification problem. The description of the current board position is simple in structure and small in size, consisting only of the current state of each point on the board. For a board position appearing in any given game, the output of the classifier is unambiguous: the next move taken. Finally, this is a domain where very extensive training and testing data is available. Go has been played for hundreds of years in very close to its current form, and many of those games have been recorded in detail.

If the problem were that straightforward, the easy solution would be to buy a large game database,[2] and match the current board position to those appearing in saved games, resulting in a selection of suggested moves based on a long history of expert play. […] of an exact match on board position in even a large database are very low.

Consequently, we treat this as a generalization problem: given a limited set of training samples consisting of a quadrant board position and the next move in that quadrant, learn a function such that the move(s) suggested based on a position in a game not previously seen will include the move that was actually taken in that game. This problem is made more difficult by the fact that it is not obvious how to characterize two board positions as being “close” to one another. Removing a single stone from the board, or moving a stone on the board a single point in any direction, can have drastic effects on the strength of the resulting positions for each player.

As mentioned earlier, previous work on computer Go players has sometimes relied on defining specific features thought to be relevant to this classification problem. In contrast, Cortigo starts with no assumptions about specific relevant features above the level of individual points on the board. Cortigo deals with the problem of approximate matches on the board position and multiple possible moves for any given position by learning distributions, encoded as a set of weights within a hierarchical Bayesian network. As shown in Figure 1, this network has a very specific structure, known as a Pyramidal Bayes Net (PBN), and is part of a more general approach known as Hierarchical Generative Models (Dean 2005; 2006). Intended to be loosely analogous to the visual cortex, these structures have been shown by Dean to provide some of the same benefits in terms of preserving recognition of objects and structures through translation and scaling changes, and in the presence of noisy […]
