Imperial College London Arxiv:1509.01549V1 [Cs.AI] 4 Sep
Total Page:16
File Type:pdf, Size:1020Kb
Imperial College London Department of Computing Giraffe: Using Deep Reinforcement Learning to Play Chess by Matthew Lai Submitted in partial fulfilment of the requirements for the MSc Degree in arXiv:1509.01549v1 [cs.AI] 4 Sep 2015 Advanced Computing of Imperial College London September 2015 Abstract This report presents Giraffe, a chess engine that uses self-play to discover all its domain-specific knowledge, with minimal hand-crafted knowledge given by the pro- grammer. Unlike previous attempts using machine learning only to perform parameter- tuning on hand-crafted evaluation functions, Giraffe’s learning system also performs automatic feature extraction and pattern recognition. The trained evaluation function performs comparably to the evaluation functions of state-of-the-art chess engines - all of which containing thousands of lines of carefully hand-crafted pattern recognizers, tuned over many years by both computer chess experts and human chess masters. Giraffe is the most successful attempt thus far at using end-to-end machine learning to play chess. We also investigated the possibility of using probability thresholds instead of depth to shape search trees. Depth-based searches form the backbone of virtually all chess engines in existence today, and is an algorithm that has become well-established over the past half century. Preliminary comparisons between a basic implementation of probability-based search and a basic implementation of depth-based search showed that our new probability-based approach performs moderately better than the established approach. There are also evidences suggesting that many successful ad-hoc add-ons to depth-based searches are generalized by switching to a probability-based search. We believe the probability-based search to be a more fundamentally correct way to perform minimax. Finally, we designed another machine learning system to shape search trees within the probability-based search framework. Given any position, this system estimates the probability of each of the moves being the best move without looking ahead. The system is highly effective - the actual best move is within the top 3 ranked moves 70% of the time, out of an average of approximately 35 legal moves from each position. This also resulted in a significant increase in playing strength. With the move evaluator guiding a probability-based search using the learned eval- uator, Giraffe plays at approximately the level of an FIDE International Master (top 2.2% of tournament chess players with an official rating)12. 1F´ed´erationInternationale des Echecs,´ or the World Chess Federation, is the international organisation that governs all major international chess competitions. 2Please see AppendixA for a description of the Elo rating system. 2 Acknowledgements This project would not have been possible without the guidance, encouragement, insights, and inspiration from my supervisor Professor Duncan Gillies. I would also like to thank the computer chess community for their support. In particular - Gerd Isenberg, for maintaining and writing most of the Chess Programming Wiki, the most comprehensive reference for everything to do with computer chess, and Graham Banks, for testing Giraffe against a wide variety of opponents. I am also thankful for the hundreds of people who played thousands of games against Giraffe on the Internet Chess Club, including Grandmaster Tahir Vakhidov, Interna- tional Master Alejandro Bofill Mas, IM Alejandro Montalvo, IM Alex Domont, IM Ul- rich Schulze, IM William Hartston, FIDE Master 'renium', 'solsjenitsyn', 'arnav311004', 'wmm72', and 'DoctorNick'. Observing their games on ICC allowed me to discover many potential improvements for Giraffe. This project would have been much less successful without the Imperial College High Performance Computing service, which provided the immense computing power required for this project. And my family and friends, for all the fish. 3 Contents 1 Introduction 8 1.1 Computers and Chess . .8 2 Background and Related Work9 2.1 Conventional Chess Engines . .9 2.1.1 Minimax . .9 2.1.2 α-β Pruning . 11 2.1.3 Evaluation . 12 2.2 Machine Learning in Zero-Sum Games . 13 2.2.1 Mate Solvers . 13 2.2.2 Learning Based on Hand-Selected Features . 14 2.2.3 Temporal-Difference Learning . 14 2.3 Deep Learning . 15 2.4 Project Overview and Contributions . 16 3 Supporting Work 16 4 Neural Network-Based Evaluation 17 4.1 Feature Representation . 17 4.2 Network Architecture . 21 4.3 Training Set Generation . 21 4.4 Network Initialization . 22 4.5 TD-Leaf . 23 4.6 Results and Evaluation . 24 5 Probabilistic Search 26 5.1 The Search Problem Redefined . 26 5.2 Conventional Depth-Limited Search . 26 5.3 Probability-Limited Search . 27 5.4 Depth-Limited Search vs Probability-Limited Search . 28 4 6 Neural Network-Based Probability Estimation 29 6.1 Feature Representation . 30 6.2 Network Architecture . 31 6.3 Training Positions Generation . 31 6.4 Network Training . 31 6.5 Results and Evaluation . 31 7 Conclusion 32 7.1 Future Work . 33 7.1.1 Neural Network-Based Time Management . 33 7.1.2 More Fine-Grained Caching, and Model Compression . 34 7.1.3 Evaluation Error Estimation . 34 7.1.4 Similarity Pruning . 34 Appendices 36 Appendix A The Elo Rating System 36 5 List of Tables 1 Example position representation . 20 2 TD-Leaf Example . 23 3 STS Test Results (0.1s/position) . 25 4 Move prediction results (all move types) . 32 5 Move prediction results (quiet moves) . 32 6 List of Figures 1 Example to illustrate effects of different feature representations . 18 2 Feature representation example . 20 3 Network architecture . 21 4 Training log . 25 5 Example of nodes searched by depth-limited search vs probability-limited search . 28 6 Example of search tree with very different branching factors in different branches . 29 7 1 Introduction 1.1 Computers and Chess Chess is a game that requires so much creativity and sophisticated reasoning that it was once thought of as something no computers will ever be able to do. It was frequently listed alongside activities like poetry writing and painting, as examples of tasks that can only be performed by humans. While writing poetry has remained very difficult for computers to this day, we have had much more success building chess-playing computers. In 1997, IBM's Deep Blue defeated the reigning World Chess Champion, Garry Kasparov, under standard tournament rules, for the first time in the history of chess. In the ensuing two decades, both computer hardware and AI research advanced the state-of-art chess-playing computers to the point where even the best humans today have no realistic chance of defeating a modern chess engine running on a smartphone. However, it is interesting to note that the way computers play chess is very different from how humans play. While both humans and computers search ahead to predict how the game will go on, humans are much more selective in which branches of the game tree to explore. Computers, on the other hand, rely on brute force to explore as many continuations as possible, even ones that will be immediately thrown out by any skilled human. In a sense, the way humans play chess is much more computationally efficient - using Garry Kasparov vs Deep Blue as an example, Kasparov could not have been searching more than 3-5 positions per second, while Deep Blue, a supercomputer with 480 custom "chess processors", searched about 200 million positions per second [1][2] to play at approximately equal strength (Deep Blue won the 6-game match with 2 wins, 3 draws, and 1 loss). How can a human searching 3-5 positions per second be as strong as a computer searching 200 million positions per second? And is it possible to build even stronger chess computers than what we have today, by making them more computationally efficient? Those are the questions this project investigates. There have been many attempts at making chess engines more selective without overlooking impor- tant continuations, but it has proven to be a very difficult problem. It is difficult to come up with reliable rules to decide which branches to explore further and which branches to prune. Humans do this by applying very complex rules with many exceptions. People learn those rules through playing and observing thousands of games, and no one has been able to convert that knowledge into concrete rules for computers that work in all situations. Instead of the conventional method of teaching computers how to play chess by giving them hard- coded rules, this project is an attempt to use machine learning to figure out how to play chess through self-play, and have them derive their own rules from the games. Using multiple deep artificial neural networks trained in a temporal-difference reinforcement learn- ing framework, we use machine learning to assist the engine in making decisions in a few places - • Statically evaluating positions - estimating how good a position is without looking further • Deciding which branches are most "interesting" in any given position, and should be searched further, as well as which branches to discard • Ordering moves - determining which moves to search before others, which significantly affects efficiency of searches 8 Using artificial neural networks as a substitute for "intuition", we hope to create a machine that can play chess more efficiently. 2 Background and Related Work Thanks to the complex yet well-defined nature of the game, chess is an extensively-studied prob- lem in artificial intelligence. Thousands of papers have been written on all aspects of computer chess since the 1950s, and at least hundreds of chess engines have been written by researchers, professionals, and amateurs over the same years [3]. The chess problem is as follows - given a chess position, we want to find a move, out of all legal moves for the position, that maximizes our chance of winning the game.