Accelerating AlphaZero's Performance for Chess
16 pages, PDF, 1020 KB
Recommended publications
- The 12th Top Chess Engine Championship
TCEC12: the 12th Top Chess Engine Championship (article, accepted version). Haworth, G. and Hernandez, N. (2019). ICGA Journal, 41(1), pp. 24-30. ISSN 1389-6911. doi: https://doi.org/10.3233/ICG-190090. Available at http://centaur.reading.ac.uk/76985/. Publisher: The International Computer Games Association.
Guy Haworth and Nelson Hernandez, Reading, UK and Maryland, USA. After the successes of TCEC Season 11 (Haworth and Hernandez, 2018a; TCEC, 2018), the Top Chess Engine Championship moved straight on to Season 12, starting April 18th 2018 with the same, if somewhat evolved, divisional structure. Five divisions, each of eight engines, played two or more 'DRR' double round robin phases, with promotions and relegations following. Classic tempi gradually lengthened, and the Premier division's top two engines played a 100-game match to determine the Grand Champion. The strategy for the selection of mandated openings was finessed from division to division.
- Game Changer
Matthew Sadler and Natasha Regan, Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI. New In Chess, 2019.
Contents: Explanation of symbols (6); Foreword by Garry Kasparov (7); Introduction by Demis Hassabis (11); Preface (16); Introduction (19).
Part I, AlphaZero's history (23): Chapter 1, A quick tour of computer chess competition (24); Chapter 2, ZeroZeroZero (33); Chapter 3, Demis Hassabis, DeepMind and AI (54).
Part II, Inside the box (67): Chapter 4, How AlphaZero thinks (68); Chapter 5, AlphaZero's style – meeting in the middle (87).
Part III, Themes in AlphaZero's play (131): Chapter 6, Introduction to our selected AlphaZero themes (132); Chapter 7, Piece mobility: outposts (137); Chapter 8, Piece mobility: activity (168); Chapter 9, Attacking the king: the march of the rook's pawn (208); Chapter 10, Attacking the king: colour complexes (235); Chapter 11, Attacking the king: sacrifices for time, space and damage (276); Chapter 12, Attacking the king: opposite-side castling (299); Chapter 13, Attacking the king: defence (321).
Part IV, AlphaZero's ...
- Monte-Carlo Tree Search as Regularized Policy Optimization
Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou and Rémi Munos.
Abstract: The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
AlphaZero employs an alternative handcrafted heuristic to achieve super-human performance on board games (Silver et al., 2016). The recent MCTS-based MuZero (Schrittwieser et al., 2019) has also led to state-of-the-art results on the Atari benchmarks (Bellemare et al., 2013). Our main contribution is connecting MCTS algorithms, in particular the highly successful AlphaZero, with MPO, a state-of-the-art model-free policy-optimization algorithm (Abdolmaleki et al., 2018). Specifically, we show that the empirical visit distribution of actions in AlphaZero's search procedure approximates the solution of a regularized policy-optimization objective. With this insight, our second contribution is a modified version of AlphaZero that brings significant performance gains over the original algorithm, especially in cases where AlphaZero has been observed to ...
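To make the abstract's claim concrete, here is a small numerical sketch of the regularized step it describes: maximizing q·π minus a KL penalty towards the network prior over the probability simplex, whose maximizer takes the form λ·prior/(α − q) for a normalizing constant α. The function name, the λ value, and the toy priors and Q-values below are illustrative choices, not material from the paper.

```python
import numpy as np

def regularized_policy(q, prior, lam, iters=60):
    """Numerically solve  max_pi  q . pi - lam * KL(prior || pi)  on the simplex.

    The maximizer has the form pi(a) = lam * prior(a) / (alpha - q(a)) for a
    normalizing constant alpha, found here by bisection. Assumes the prior is
    strictly positive (e.g. a network softmax). Toy values only.
    """
    q, prior = np.asarray(q, float), np.asarray(prior, float)
    lo = np.max(q + lam * prior)      # at this alpha the probabilities sum to >= 1
    hi = np.max(q) + lam              # at this alpha the probabilities sum to <= 1
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        if (lam * prior / (alpha - q)).sum() > 1.0:
            lo = alpha                # alpha too small, probabilities too large
        else:
            hi = alpha
    pi = lam * prior / (hi - q)
    return pi / pi.sum()              # renormalize away the residual bisection error

# Toy example: four actions with a network prior and search Q-values
print(regularized_policy(q=[0.1, 0.5, 0.0, -0.2], prior=[0.4, 0.3, 0.2, 0.1], lam=0.3))
```

With a large λ the result stays close to the prior; with a small λ it concentrates on the highest-Q action, which is the trade-off the regularized objective encodes.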
- The TCEC20 Computer Chess Superfinal: A Perspective
GM Matthew Sadler, London, UK.
1. The TCEC20 Premier Division. Just like last year, Season 20 DivP gathered a host of familiar faces and, once again, one of those faces had a facelift! The venerable KOMODO engine had been given an NNUE beauty boost and there was a feeling that KOMODO DRAGON might prove a threat to LEELA's and STOCKFISH's customary tickets to the Superfinal. There was also an updated STOOFVLEES to enjoy, while ETHEREAL came storming back to DivP after a convincing victory in League 1. ROFCHADE just pipped NNUE-powered IGEL on tie-break for the second DivP spot. In the end, KOMODO turned out to have indeed received a significant boost from NNUE, pipping ALLIESTEIN for third place. However, both LEELA and STOCKFISH seemed to have made another quantum leap forwards, dominating the field and ending joint first on 38/56, 5.5 points clear of KOMODO DRAGON! Both LEELA and STOCKFISH scored an astonishing 24 wins, with LEELA winning 24 of its 28 White games and STOCKFISH 22! Both winners lost 4 games, all as Black, with LEELA notably tripping up against bottom-marker ROFCHADE and STOCKFISH being beautifully squeezed by STOOFVLEES. The high winning percentage this time is explained by the new high-bias opening book designed for the Premier Division. I had slightly mixed feelings about it, as LEELA and STOCKFISH winning virtually all their White games felt like going a bit too far the other way: you want them to have to work a bit harder for their Superfinal places! However, it was once again a very entertaining tournament with plenty of good games.
- Distributional Differences Between Human and Computer Play at Chess
Human and Computer Preferences at Chess. Kenneth W. Regan and Tamal Biswas (Department of CSE, University at Buffalo, Amherst, NY 14260, USA) and Jason Zhou (The Nichols School, Buffalo, NY 14216, USA). Multidisciplinary Workshop on Advances in Preference Handling: Papers from the AAAI-14 Workshop.
Abstract: Distributional analysis of large data-sets of chess games played by humans and those played by computers shows the following differences in preferences and performance: (1) the average error per move scales uniformly higher the more advantage is enjoyed by either side, with the effect much sharper for humans than computers; (2) for almost any degree of advantage or disadvantage, a human player has a significant 2–3% lower scoring expectation if it is his/her turn to move than when the opponent is to move; the effect is nearly absent for computers; (3) humans prefer to drive games into positions with fewer reasonable options and earlier resolutions, even when playing ...
In our case the third parties are computer chess programs analyzing the position and the played move, and the error is the difference in analyzed value from its preferred move when the two differ. We have run the computer analysis to sufficient depth, estimated to have strength at least equal to the top human players in our samples, and significantly greater than the depth used in previous studies. We have replicated our main human data set of 726,120 positions from tournaments played in 2010-2012 on each of four different programs: Komodo 6, Stockfish DD (or 5), Houdini 4, and Rybka 3. The first three finished 1-2-3 in the most recent Thoresen Chess Engine Competition, while Rybka 3 (to version 4.1) was the top program from 2008 to 2011.
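The error measure described above (the gap between the engine's preferred-move value and the played move's value) is easy to sketch. The snippet below buckets per-move error by the mover's advantage; the `engine.best` / `engine.evaluate` interface and the bucket width are placeholders for illustration, not the authors' actual analysis pipeline.

```python
from collections import defaultdict

def move_errors(positions, engine):
    """Average error per move, bucketed by the size of the mover's advantage.

    `positions` is assumed to be an iterable of (fen, played_move) pairs and
    `engine` a placeholder object exposing best(fen) -> (move, score) and
    evaluate(fen, move) -> score, both in pawn units from the mover's point
    of view; neither interface is taken from the paper.
    """
    buckets = defaultdict(list)
    for fen, played in positions:
        best_move, best_score = engine.best(fen)
        if played == best_move:
            error = 0.0                                  # played the engine's preferred move
        else:
            error = max(0.0, best_score - engine.evaluate(fen, played))
        bucket = round(best_score * 2) / 2               # 0.5-pawn wide advantage buckets
        buckets[bucket].append(error)
    return {adv: sum(errs) / len(errs) for adv, errs in sorted(buckets.items())}
```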
- Move Similarity Analysis in Chess Programs
D. Dailey, A. Hair, M. Watkins.
Abstract: In June 2011, the International Computer Games Association (ICGA) disqualified Vasik Rajlich and his Rybka chess program for plagiarism and breaking their rules on originality in their events from 2006-10. One primary basis for this came from a painstaking code comparison, using the source code of Fruit and the object code of Rybka, which found the selection of evaluation features in the programs to be almost the same, much more than expected by chance. In his brief defense, Rajlich indicated his opinion that move similarity testing was a superior method of detecting misappropriated entries. Later commentary by both Rajlich and his defenders reiterated the same, and indeed the ICGA Rules themselves specify move similarity as an example reason for why the tournament director would have warrant to request a source code examination. We report on data obtained from move-similarity testing. The principal dataset here consists of over 8000 positions and nearly 100 independent engines. We comment on such issues as: the robustness of the methods (upon modifying the experimental conditions), whether strong engines tend to play more similarly than weak ones, and the observed Fruit/Rybka move-similarity data.
1. History and background on derivative programs in computer chess. Computer chess has seen a number of derivative programs over the years. One of the first was the incident in the 1989 World Microcomputer Chess Championship (WMCCC), in which Quickstep was disqualified due to the program being "a copy of the program Mephisto Almeria" in all important areas.
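A minimal version of the move-matching idea behind such testing is sketched below: the fraction of shared test positions on which two engines choose the same move. The engine names and moves are made up, and the published study of course uses far more positions, engines, and statistical controls.

```python
import itertools

def pairwise_agreement(moves_by_engine):
    """Fraction of test positions on which each pair of engines picks the same move.

    `moves_by_engine` maps an engine name to its list of chosen moves over a
    shared set of positions (one move per position, in the same order for all).
    """
    results = {}
    for a, b in itertools.combinations(sorted(moves_by_engine), 2):
        pairs = list(zip(moves_by_engine[a], moves_by_engine[b]))
        matches = sum(1 for ma, mb in pairs if ma == mb)
        results[(a, b)] = matches / len(pairs)
    return results

# Toy example with three hypothetical engines over five positions
print(pairwise_agreement({
    "EngineA": ["e2e4", "g1f3", "d2d4", "c2c4", "f1c4"],
    "EngineB": ["e2e4", "g1f3", "d2d4", "b1c3", "f1c4"],
    "EngineC": ["d2d4", "g1f3", "c2c4", "b1c3", "f1b5"],
}))
```

Unusually high pairwise agreement, relative to what independent engines of similar strength typically show, is what such testing treats as a signal worth investigating further.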
- Efficiently Mastering the Game of NoGo with Deep Reinforcement Learning Supported by Domain Knowledge
Electronics (journal article). Yifan Gao (College of Medicine and Biological Information Engineering, Northeastern University, Liaoning 110819, China) and Lezhou Wu (College of Information Science and Engineering, Northeastern University, Liaoning 110819, China); these authors contributed equally to this work.
Abstract: Computer games have been regarded as an important field of artificial intelligence (AI) for a long time. The AlphaZero structure has been successful in the game of Go, beating the top professional human players and becoming the baseline method in computer games. However, the AlphaZero training process requires tremendous computing resources, imposing additional difficulties for the AlphaZero-based AI. In this paper, we propose NoGoZero+ to improve the AlphaZero process and apply it to a game similar to Go, NoGo. NoGoZero+ employs several innovative features to improve training speed and performance, and most improvement strategies can be transferred to other nonspecific areas. This paper compares it with the original AlphaZero process, and results show that NoGoZero+ increases the training speed to about six times that of the original AlphaZero process. Moreover, in the experiment, our agent beat the original AlphaZero agent with a score of 81:19 after only being trained on 20,000 self-play games' data (small in quantity compared with the 120,000 self-play games' data consumed by the original AlphaZero). The NoGo game program based on NoGoZero+ was the runner-up in the 2020 China Computer Game Championship (CCGC) with limited resources, defeating many AlphaZero-based programs.
- ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero
Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton and C. Lawrence Zitnick.
Abstract: The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the understanding of and usability of these promising approaches by the research community. Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. ELF OpenGo is the first open-source Go AI to convincingly demonstrate superhuman performance, with a perfect (20:0) record against global top professionals. We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training ...
However, these advances in playing ability come at significant computational expense. A single training run requires millions of self-play games and days of training on thousands of TPUs, which is an unattainable level of compute for the majority of the research community. When combined with the unavailability of code and models, the result is that the approach is very difficult, if not impossible, to reproduce, study, improve upon, and extend. In this paper, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero (Silver et al., 2018) algorithm for the game of Go. We then apply ELF OpenGo toward the following three additional contributions. First, we train a superhuman model for ELF OpenGo. After running our AlphaZero-style training software on 2,000 GPUs for 9 days, our 20-block model has achieved superhuman performance that is arguably comparable to the 20-block models described in Silver et al. (2017) and Silver et al. (2018).
- Some Chess-Specific Improvements for Perturbation-Based Saliency Maps
Jessica Fritz and Johannes Fürnkranz, Johannes Kepler University, Linz, Austria.
Abstract: State-of-the-art game playing programs do not provide an explanation for their move choice beyond a numerical score for the best and alternative moves. However, human players want to understand the reasons why a certain move is considered to be the best. Saliency maps are a useful general technique for this purpose, because they allow visualizing relevant aspects of a given input to the produced output. While such saliency maps are commonly used in the field of image classification, their adaptation to chess engines like Stockfish or Leela has not yet seen much work. This paper takes one such approach, Specific and Relevant Feature Attribution (SARFA), which has previously been proposed for a variety of game settings, and analyzes it specifically for the game of chess. In particular, we investigate differences with respect to its use with different chess engines, to different types of positions (tactical vs. positional as well as middle-game vs. endgame), and point out some of the approach's down-sides. Ultimately, we also suggest and evaluate a few improvements of the basic algorithm that are able to address some of the found shortcomings.
Index Terms: Explainable AI, Chess, Machine learning. (Fig. 1: SARFA Example.)
... neural networks [19], there is also great demand for explainable AI models in other areas of AI, including game playing. An approach that has recently been proposed specifically for games and related agent architectures is specific and relevant feature attribution (SARFA). It has been shown that it ...
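As a rough illustration of the perturbation idea, though not of SARFA's exact scoring, which additionally weighs how specifically a perturbation affects the chosen move, the sketch below removes each piece in turn and records the change in engine evaluation, using the python-chess library (assumed available); the engine path, search depth, and mate-score constant are illustrative assumptions.

```python
import chess
import chess.engine

def perturbation_saliency(fen, engine_path, depth=12):
    """Crude perturbation-based saliency map for a chess position.

    For every occupied square (kings excluded), the piece is removed and the
    position is re-evaluated; the absolute change in evaluation is taken as
    that square's saliency. A simplified stand-in for SARFA, not the method itself.
    """
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        limit = chess.engine.Limit(depth=depth)
        base = engine.analyse(board, limit)["score"].relative.score(mate_score=10000)
        saliency = {}
        for square, piece in board.piece_map().items():
            if piece.piece_type == chess.KING:
                continue                      # removing a king gives an illegal position
            perturbed = board.copy()
            perturbed.remove_piece_at(square)
            if not perturbed.is_valid():
                continue                      # skip perturbations that break legality
            score = engine.analyse(perturbed, limit)["score"].relative.score(mate_score=10000)
            saliency[chess.square_name(square)] = abs(base - score)
    return saliency
```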
- AlphaZero Paper
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis.
The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games.
... programmers, combined with a high-performance alpha-beta search that expands a vast search tree by using a large number of clever heuristics and domain-specific adaptations. In (10) we describe these augmentations, focusing on the 2016 Top Chess Engine Championship (TCEC) season 9 world champion Stockfish (11); other strong chess programs, including Deep Blue, use very similar architectures (1, 12). In terms of game tree complexity, shogi is a substantially harder game than chess (13, 14): it is played on a larger board with a wider variety of pieces; any captured opponent piece switches sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as the 2017 Computer Shogi Association (CSA) world champion Elmo, have only recently defeated human champions (15). These programs use an algorithm similar to those used by computer chess programs, again based on a highly optimized alpha-beta search engine with many ...
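In contrast to the alpha-beta engines described above, AlphaZero selects moves with a PUCT-style tree search guided by a neural network's priors and value estimates. A minimal sketch of one such selection step is given below; the c_puct constant and the per-action statistics layout are illustrative, not the paper's exact formulation.

```python
import math

def select_action(stats, c_puct=1.5):
    """One PUCT-style selection step, of the kind used in AlphaZero-like MCTS.

    `stats` maps each action to a dict with visit count N, mean value Q, and
    network prior P for that action at the current node.
    """
    total_visits = sum(s["N"] for s in stats.values())

    def puct(s):
        # Exploration term favours actions with high prior and few visits
        exploration = c_puct * s["P"] * math.sqrt(total_visits) / (1 + s["N"])
        return s["Q"] + exploration

    return max(stats, key=lambda a: puct(stats[a]))

# Toy example: three candidate moves at some node of the search tree
node = {
    "e2e4": {"N": 10, "Q": 0.15, "P": 0.5},
    "d2d4": {"N": 4,  "Q": 0.20, "P": 0.3},
    "g1f3": {"N": 0,  "Q": 0.00, "P": 0.2},
}
print(select_action(node))
```

Repeating this step from the root down to a leaf, expanding the leaf with the network, and backing up the value is the core loop the self-play training in the paper relies on.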
- Komodo Chess 10
Komodo Chess 10, 64-bit multiprocessor version. Komodo has proved its chess prowess in computer chess; what is more, at tournament time controls Komodo is #1 in the majority of rating lists. Komodo Chess 10 offers all the training and playing functions you know from Fritz, including direct access to ChessBase Web Apps such as the Live Database, Playchess and the ChessBase video portal, and comes with a six-month premium ChessBase Account.
You are holding the very Fritz that has fascinated the entire chess world for years. Whether in the 2002 match in Bahrain against world champion Vladimir Kramnik, in 2003 in New York against Garry Kasparov, or in 2006 at the Bundeskunsthalle in Bonn in the 4:2 victory over world champion Vladimir Kramnik: millions of chess fans followed the world-class games of Fritz, "the most popular German chess program" (Der Spiegel), in these "man versus machine" duels. But don't worry: Fritz can also take it easy. It adapts automatically to any playing strength and helps during the game with move and position explanations, a coach function, danger warnings and opening statistics. For club players Fritz has long been an indispensable partner: automatic game analysis, many training modules, handicap functions, a database with more than a million games, and countless professional features that Garry Kasparov too has used for years: "I regularly analyse with Fritz". Have you ever trained with a grandmaster? Learning from masters means learning masterfully. Fritz 11 offers 13 (!) hours of video lessons for every playing strength: with Grandmaster Dr. ...
- Introduction to Deep Reinforcement Learning
Parallel & Scalable Machine Learning: Introduction to Machine Learning Algorithms. Dr. Jenia Jitsev, Head of Cross-Sectional Team Deep Learning (CST-DL), Scientific Lead Helmholtz AI Local (HAICU Local), Institute for Advanced Simulation (IAS), Juelich Supercomputing Center (JSC). Lecture 9: Introduction to Deep Reinforcement Learning, Feb 19th, 2020, JSC, Germany.
Machine Learning: Forms of Learning
● Supervised learning: correct responses Y for input data X are given; Y is the "teacher" signal, the correct "outcomes" or "labels" for the data X.
- Usually: estimate an unknown f: X→Y, y = f(x; W).
- Classical frameworks: classification, regression.
● Unsupervised learning: only data X is given.
- Find "hidden" structure and patterns in the data; in general, estimate an unknown probability density p(X), i.e. find a model that underlies / generates X.
- Broad class of latent ("hidden") variable models.
- Classical frameworks: clustering, dimensionality reduction (e.g. PCA).
● Reinforcement learning: data X, including a (sparse) reward r(X).
- Discover actions a that maximize total future reward R.
- Active learning: the experience X depends on the choice of a.
- Estimate p(a|X), p(r|X), V(X), Q(X,a), i.e. future-reward predictors (see the sketch after this list).
● For all of these holds: define a loss L(D, W) and optimize it by tuning the free parameters W.
Deep Neural Networks: Forms of Learning
● Supervised learning: correct responses Y for input data X are given.
- Find an unknown f: y = f(x; W) or density p(Y|X; W) for data (x, y).
- Deep CNNs for visual object recognition (e.g. Inception, ResNet, ...).
● Unsupervised learning: only data X is given.
- General setting: estimate an unknown density p(X; W); find a model that underlies / generates X.
- Broad class of latent ("hidden") variable models.
- Variational Autoencoder (VAE): data generation and inference.
- Generative Adversarial Networks (GANs): data generation.
- Autoregressive models: PixelRNN, PixelCNN, ...
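As a minimal illustration of the "future reward predictor" Q(X, a) mentioned in the reinforcement-learning bullets above, here is a tabular Q-learning sketch; the environment interface (reset/step/actions) and the hyperparameters are assumptions made for the example, not material from the lecture.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn the future-reward predictor Q(state, action).

    `env` is assumed to expose reset() -> state, step(action) -> (state,
    reward, done), and a list env.actions; this interface is a placeholder.
    """
    Q = defaultdict(float)                      # Q[(state, action)], default 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:       # epsilon-greedy exploration
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # Move Q towards the bootstrapped target r + gamma * max_a' Q(s', a')
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```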