NOTE Communicated by Terrence Sejnowski

Neurogammon Wins Computer Olympiad

Gerald Tesauro
IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA

Neurogammon 1.0 is a program which uses multilayer neural networks to make move decisions and doubling decisions. The networks learned to play backgammon by backpropagation training on expert data sets. At the recently held First Computer Olympiad in London, Neurogammon won the backgammon competition with a perfect record of five wins and no losses, thereby becoming the first learning program ever to win a tournament.

Neural Computation 1, 321-323 (1989) © 1989 Massachusetts Institute of Technology

Neural network learning procedures are being widely investigated for many classes of practical applications. Board games such as chess, go, and backgammon provide a fertile testing ground because performance measures are clear and well defined. Furthermore, expert-level play can be of tremendous complexity. Learning programs have been studied in games environments for many years, but heretofore have not reached significant levels of performance.

Neurogammon 1.0 represents the culmination of previous research in backgammon learning networks (Tesauro and Sejnowski 1989; Tesauro 1988; Tesauro 1989) in the form of a fully functioning program. Neurogammon contains one network which makes doubling cube decisions and a set of six networks which make move decisions in different phases of the game. Each network has a standard fully-connected feed-forward architecture with a single hidden layer, and was trained by the well-known backpropagation algorithm (Rumelhart et al. 1986). The move-making networks were trained on a set of positions from 400 games in which the author played both sides. A "comparison paradigm," described in Tesauro (1989), was used to teach the networks that the move selected by the expert should score higher than each of the other possible legal moves. The doubling network was trained on a separate set of about 3000 positions which were classified according to a crude nine-point ranking scale of doubling strength. The training of each network proceeded until maximum generalization performance was obtained, as measured by performance on a set of test positions not used in training.

The resulting program appears to play at a substantially higher level than conventional backgammon programs. At the Computer Olympiad in London, held on August 9-15, 1989, and organized by David Levy, Neurogammon competed against five other opponents: three commercial programs (Video Gammon/USA, Mephisto Backgammon/W. Germany, and Saitek Backgammon/) and two non-commercial programs (Backbrain/Sweden and AI Backgammon/USA). Hans Berliner’s BKG program was not entered in the competition. In matches to 11 points, Neurogammon defeated Video Gammon by 12-7, Mephisto by 12-5, Saitek by 12-9, Backbrain by 11-4, and AI Backgammon by 16-1, to take the gold medal in the backgammon competition. Also, in unofficial matches to 15 points against two other commercial programs, Fidelity Backgammon Challenger and Sun Microsystems’ Gammontool, Neurogammon won by scores of 16-3 and 15-8 respectively. There were also a number of unofficial matches against intermediate-level humans at the Olympiad. Neurogammon won three of these and lost one. Finally, in an official challenge match on the last day of the Olympiad, Neurogammon put up a good fight but lost to a human expert, Ossi Weiner of West Germany, by a score of 2-7. Weiner said that he was surprised at how much the program plays like a human, how rarely it makes mistakes, and that he had to play extremely carefully in order to beat it.

In summary, Neurogammon’s victory at the Computer Olympiad demonstrates, along with similar recent advances in fields such as speech recognition (Lippmann 1989) and optical character recognition (LeCun et al. in press), that neural networks can be practical learning devices for tackling hard computational tasks. It also suggests that machine learning procedures of this type might be useful in other games.
However, much work remains to be done, both in extracting additional information from the data sets within the existing approach and in developing new approaches, such as unsupervised learning based on outcome, which would supplement what can be achieved with supervised learning from expert data.
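
The comparison paradigm described in the text can be illustrated with a small sketch. It is not Neurogammon's code: the pairwise logistic loss, the three-feature move encoding, the network sizes, and all function names (`init_net`, `forward`, `comparison_step`, `pick_move`, `toy_positions`, `demo`) are assumptions made for illustration, and the "expert" is a synthetic rule rather than recorded games. Under those assumptions, the sketch trains a single-hidden-layer scoring network by backpropagation so that the expert's move scores higher than each alternative, and then checks it on positions not used in training.

```python
import math
import random


def init_net(n_in, n_hidden, rng):
    """Scoring network: n_in inputs -> n_hidden tanh units -> one linear score."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }


def forward(net, x):
    """Return (score, hidden activations) for one move's feature vector x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    return sum(w * hj for w, hj in zip(net["w2"], h)), h


def pair_loss(net, expert, other):
    """Logistic loss; small when the expert's move outscores the other move."""
    margin = forward(net, expert)[0] - forward(net, other)[0]
    return math.log1p(math.exp(-margin))


def comparison_step(net, expert, other, lr):
    """One backpropagation step pushing score(expert) above score(other)."""
    s_e, h_e = forward(net, expert)
    s_o, h_o = forward(net, other)
    p = 1.0 / (1.0 + math.exp(s_e - s_o))  # d(loss)/d(s_o) = -d(loss)/d(s_e)
    # Compute both hidden-layer gradients from the same weights before updating.
    grads = []
    for x, h, g in ((expert, h_e, -p), (other, h_o, p)):
        d_hidden = [g * w * (1.0 - hj * hj) for w, hj in zip(net["w2"], h)]
        grads.append((x, h, g, d_hidden))
    for x, h, g, d_hidden in grads:
        for j, dj in enumerate(d_hidden):
            net["w2"][j] -= lr * g * h[j]
            net["b1"][j] -= lr * dj
            for i, xi in enumerate(x):
                net["w1"][j][i] -= lr * dj * xi


def pick_move(net, candidates):
    """Index of the highest-scoring candidate move."""
    return max(range(len(candidates)), key=lambda k: forward(net, candidates[k])[0])


def toy_positions(n, rng, n_moves=4, n_features=3):
    """Hypothetical data: the 'expert' prefers the move maximizing x[0] - x[1]."""
    positions = []
    for _ in range(n):
        moves = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_moves)]
        expert = max(range(n_moves), key=lambda k: moves[k][0] - moves[k][1])
        positions.append((moves, expert))
    return positions


def mean_loss(net, positions):
    """Average pairwise loss over every (expert move, other move) pair."""
    losses = [pair_loss(net, moves[e], m)
              for moves, e in positions for k, m in enumerate(moves) if k != e]
    return sum(losses) / len(losses)


def demo(seed=0, epochs=50, lr=0.1):
    rng = random.Random(seed)
    train, test = toy_positions(40, rng), toy_positions(20, rng)
    net = init_net(3, 4, rng)
    before = mean_loss(net, train)
    for _ in range(epochs):
        for moves, e in train:
            for k, m in enumerate(moves):
                if k != e:
                    comparison_step(net, moves[e], m, lr)
    after = mean_loss(net, train)
    # Held-out accuracy: how often the top-scoring move is the expert's choice.
    accuracy = sum(pick_move(net, moves) == e for moves, e in test) / len(test)
    return before, after, accuracy


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the mean training loss before and after training, together with the fraction of held-out positions where the top-scoring move matches the synthetic expert's choice. Scoring each move independently and comparing pairs is only one simple way to realize the idea; Tesauro (1989) describes the scheme actually used.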

References

Tesauro, G., and Sejnowski, T.J. 1989. A parallel network that learns to play backgammon. Artificial Intelligence 39, 357-390.

Tesauro, G. 1988. Neural network defeats creator in backgammon match. Tech. Rep. no. CCSR-88-6, Center for Complex Systems Research, University of Illinois at Urbana-Champaign.

Tesauro, G. 1989. Connectionist learning of expert preferences by comparison training. In D. Touretzky, (Ed.), Advances in Neural Information Processing Systems, 99-106. Morgan Kaufmann Publishers.

Rumelhart, D.E., et al. 1986. Learning representations by backpropagating errors. Nature 323, 533-536.

Lippmann, R.P. 1989. Review of neural networks for speech recognition. Neural Comp. 1, 1-38.

LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., and Jackel, L.D. (in press). Backpropagation applied to handwritten zip code recognition. Neural Computation.

Received 30 August 1989; accepted 30 August 1989. Downloaded from http://direct.mit.edu/neco/article-pdf/1/3/321/811855/neco.1989.1.3.321.pdf by guest on 24 September 2021