Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm Arxiv:1712.01815V1 [Cs.AI] 5 Dec 2017
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Elements of DSAI: Game Tree Search, Learning Architectures
Introduction Games Game Search Evaluation Fns AlphaGo/Zero Summary Quiz References Elements of DSAI AlphaGo Part 2: Game Tree Search, Learning Architectures Search & Learn: A Recipe for AI Action Decisions J¨orgHoffmann Winter Term 2019/20 Hoffmann Elements of DSAI Game Tree Search, Learning Architectures 1/49 Introduction Games Game Search Evaluation Fns AlphaGo/Zero Summary Quiz References Competitive Agents? Quote AI Introduction: \Single agent vs. multi-agent: One agent or several? Competitive or collaborative?" ! Single agent! Several koalas, several gorillas trying to beat these up. BUT there is only a single acting entity { one player decides which moves to take (who gets into the boat). Hoffmann Elements of DSAI Game Tree Search, Learning Architectures 3/49 Introduction Games Game Search Evaluation Fns AlphaGo/Zero Summary Quiz References Competitive Agents! Quote AI Introduction: \Single agent vs. multi-agent: One agent or several? Competitive or collaborative?" ! Multi-agent competitive! TWO players deciding which moves to take. Conflicting interests. Hoffmann Elements of DSAI Game Tree Search, Learning Architectures 4/49 Introduction Games Game Search Evaluation Fns AlphaGo/Zero Summary Quiz References Agenda: Game Search, AlphaGo Architecture Games: What is that? ! Game categories, game solutions. Game Search: How to solve a game? ! Searching the game tree. Evaluation Functions: How to evaluate a game position? ! Heuristic functions for games. AlphaGo: How does it work? ! Overview of AlphaGo architecture, and changes in Alpha(Go) Zero. Hoffmann Elements of DSAI Game Tree Search, Learning Architectures 5/49 Introduction Games Game Search Evaluation Fns AlphaGo/Zero Summary Quiz References Positioning in the DSAI Phase Model Hoffmann Elements of DSAI Game Tree Search, Learning Architectures 6/49 Introduction Games Game Search Evaluation Fns AlphaGo/Zero Summary Quiz References Which Games? ! No chance element. -
Development of Games for Users with Visual Impairment Czech Technical University in Prague Faculty of Electrical Engineering
Development of games for users with visual impairment Czech Technical University in Prague Faculty of Electrical Engineering Dina Chernova January 2017 Acknowledgement I would first like to thank Bc. Honza Had´aˇcekfor his valuable advice. I am also very grateful to my supervisor Ing. Daniel Nov´ak,Ph.D. and to all participants that were involved in testing of my application for their precious time. I must express my profound gratitude to my loved ones for their support and continuous encouragement throughout my years of study. This accomplishment would not have been possible without them. Thank you. 5 Declaration I declare that I have developed this thesis on my own and that I have stated all the information sources in accordance with the methodological guideline of adhering to ethical principles during the preparation of university theses. In Prague 09.01.2017 Author 6 Abstract This bachelor thesis deals with analysis and implementation of mobile application that allows visually impaired people to play chess on their smart phones. The application con- trol is performed using special gestures and text{to{speech engine as a sound accompanier. For human against computer game mode I have used currently the best game engine called Stockfish. The application is developed under Android mobile platform. Keywords: chess; visually impaired; Android; Bakal´aˇrsk´apr´acese zab´yv´aanal´yzoua implementac´ımobiln´ıaplikace, kter´aumoˇzˇnuje zrakovˇepostiˇzen´ymlidem hr´atˇsachy na sv´emsmartphonu. Ovl´ad´an´ıaplikace se prov´ad´ı pomoc´ıspeci´aln´ıch gest a text{to{speech enginu pro zvukov´edoprov´azen´ı.V reˇzimu ˇclovˇek versus poˇc´ıtaˇcjsem pouˇzilasouˇcasnˇenejlepˇs´ıhern´ıengine Stockfish. -
Game Changer
Matthew Sadler and Natasha Regan Game Changer AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI New In Chess 2019 Contents Explanation of symbols 6 Foreword by Garry Kasparov �������������������������������������������������������������������������������� 7 Introduction by Demis Hassabis 11 Preface 16 Introduction ������������������������������������������������������������������������������������������������������������ 19 Part I AlphaZero’s history . 23 Chapter 1 A quick tour of computer chess competition 24 Chapter 2 ZeroZeroZero ������������������������������������������������������������������������������ 33 Chapter 3 Demis Hassabis, DeepMind and AI 54 Part II Inside the box . 67 Chapter 4 How AlphaZero thinks 68 Chapter 5 AlphaZero’s style – meeting in the middle 87 Part III Themes in AlphaZero’s play . 131 Chapter 6 Introduction to our selected AlphaZero themes 132 Chapter 7 Piece mobility: outposts 137 Chapter 8 Piece mobility: activity 168 Chapter 9 Attacking the king: the march of the rook’s pawn 208 Chapter 10 Attacking the king: colour complexes 235 Chapter 11 Attacking the king: sacrifices for time, space and damage 276 Chapter 12 Attacking the king: opposite-side castling 299 Chapter 13 Attacking the king: defence 321 Part IV AlphaZero’s -
White Knight Review Chess E-Magazine January/February - 2012 Table of Contents
Chess E-Magazine Interactive E-Magazine Volume 3 • Issue 1 January/February 2012 Chess Gambits Chess Gambits The Immortal Game Canada and Chess Anderssen- Vs. -Kieseritzky Bill Wall’s Top 10 Chess software programs C Seraphim Press White Knight Review Chess E-Magazine January/February - 2012 Table of Contents Editorial~ “My Move” 4 contents Feature~ Chess and Canada 5 Article~ Bill Wall’s Top 10 Software Programs 9 INTERACTIVE CONTENT ________________ Feature~ The Incomparable Kasparov 10 • Click on title in Table of Contents Article~ Chess Variants 17 to move directly to Unorthodox Chess Variations page. • Click on “White Feature~ Proof Games 21 Knight Review” on the top of each page to return to ARTICLE~ The Immortal Game 22 Table of Contents. Anderssen Vrs. Kieseritzky • Click on red type to continue to next page ARTICLE~ News Around the World 24 • Click on ads to go to their websites BOOK REVIEW~ Kasparov on Kasparov Pt. 1 25 • Click on email to Pt.One, 1973-1985 open up email program Feature~ Chess Gambits 26 • Click up URLs to go to websites. ANNOTATED GAME~ Bareev Vs. Kasparov 30 COMMENTARY~ “Ask Bill” 31 White Knight Review January/February 2012 White Knight Review January/February 2012 Feature My Move Editorial - Jerry Wall [email protected] Well it has been over a year now since we started this publication. It is not easy putting together a 32 page magazine on chess White Knight every couple of months but it certainly has been rewarding (maybe not so Review much financially but then that really never was Chess E-Magazine the goal). -
Reinforcement Learning (1) Key Concepts & Algorithms
Winter 2020 CSC 594 Topics in AI: Advanced Deep Learning 5. Deep Reinforcement Learning (1) Key Concepts & Algorithms (Most content adapted from OpenAI ‘Spinning Up’) 1 Noriko Tomuro Reinforcement Learning (RL) • Reinforcement Learning (RL) is a type of Machine Learning where an agent learns to achieve a goal by interacting with the environment -- trial and error. • RL is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. • The purpose of RL is to learn an optimal policy that maximizes the return for the sequences of agent’s actions (i.e., optimal policy). https://en.wikipedia.org/wiki/Reinforcement_learning 2 • RL have recently enjoyed a wide variety of successes, e.g. – Robot controlling in simulation as well as in the real world Strategy games such as • AlphaGO (by Google DeepMind) • Atari games https://en.wikipedia.org/wiki/Reinforcement_learning 3 Deep Reinforcement Learning (DRL) • A policy is essentially a function, which maps the agent’s each action to the expected return or reward. Deep Reinforcement Learning (DRL) uses deep neural networks for the function (and other components). https://spinningup.openai.com/en/latest/spinningup/rl_intro.html 4 5 Some Key Concepts and Terminology 1. States and Observations – A state s is a complete description of the state of the world. For now, we can think of states belonging in the environment. – An observation o is a partial description of a state, which may omit information. – A state could be fully or partially observable to the agent. If partial, the agent forms an internal state (or state estimation). https://spinningup.openai.com/en/latest/spinningup/rl_intro.html 6 • States and Observations (cont.) – In deep RL, we almost always represent states and observations by a real-valued vector, matrix, or higher-order tensor. -
Discrete Sequential Prediction of Continu
Under review as a conference paper at ICLR 2018 DISCRETE SEQUENTIAL PREDICTION OF CONTINU- OUS ACTIONS FOR DEEP RL Anonymous authors Paper under double-blind review ABSTRACT It has long been assumed that high dimensional continuous control problems cannot be solved effectively by discretizing individual dimensions of the action space due to the exponentially large number of bins over which policies would have to be learned. In this paper, we draw inspiration from the recent success of sequence-to-sequence models for structured prediction problems to develop policies over discretized spaces. Central to this method is the realization that com- plex functions over high dimensional spaces can be modeled by neural networks that predict one dimension at a time. Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions. With this parameterization, it is possible to both leverage the compositional structure of action spaces during learning, as well as compute maxima over action spaces (approximately). On a simple example task we demonstrate empirically that our method can perform global search, which effectively gets around the local optimization issues that plague DDPG. We apply the technique to off-policy (Q-learning) methods and show that our method can achieve the state-of-the-art for off-policy methods on several continuous control tasks. 1 INTRODUCTION Reinforcement learning has long been considered as a general framework applicable to a broad range of problems. However, the approaches used to tackle discrete and continuous action spaces have been fundamentally different. In discrete domains, algorithms such as Q-learning leverage backups through Bellman equations and dynamic programming to solve problems effectively. -
ELF: an Extensive, Lightweight and Flexible Research Platform for Real-Time Strategy Games
ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games Yuandong Tian1 Qucheng Gong1 Wenling Shang2 Yuxin Wu1 C. Lawrence Zitnick1 1Facebook AI Research 2Oculus 1fyuandong, qucheng, yuxinwu, [email protected] [email protected] Abstract In this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environ- ments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a minia- ture version of StarCraft, captures key game dynamics and runs at 40K frame- per-second (FPS) per core on a laptop. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end- to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, changes in game parameters, and can host existing C/C++-based game environ- ments like ALE [4]. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU [17] and Batch Normalization [11] cou- pled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than 70% of the time in the full game of Mini-RTS. Strong per- formance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open sourced at https://github.com/facebookresearch/ELF. 1 Introduction Game environments are commonly used for research in Reinforcement Learning (RL), i.e. -
OLIVAW: Mastering Othello with Neither Humans Nor a Penny Antonio Norelli, Alessandro Panconesi Dept
1 OLIVAW: Mastering Othello with neither Humans nor a Penny Antonio Norelli, Alessandro Panconesi Dept. of Computer Science, Universita` La Sapienza, Rome, Italy Abstract—We introduce OLIVAW, an AI Othello player adopt- of companies. Another aspect of the same problem is the ing the design principles of the famous AlphaGo series. The main amount of training needed. AlphaGo Zero required 4.9 million motivation behind OLIVAW was to attain exceptional competence games played during self-play. While to attain the level of in a non-trivial board game at a tiny fraction of the cost of its illustrious predecessors. In this paper, we show how the grandmaster for games like Starcraft II and Dota 2 the training AlphaGo Zero’s paradigm can be successfully applied to the required 200 years and more than 10,000 years of gameplay, popular game of Othello using only commodity hardware and respectively [7], [8]. free cloud services. While being simpler than Chess or Go, Thus one of the major problems to emerge in the wake Othello maintains a considerable search space and difficulty of these breakthroughs is whether comparable results can be in evaluating board positions. To achieve this result, OLIVAW implements some improvements inspired by recent works to attained at a much lower cost– computational and financial– accelerate the standard AlphaGo Zero learning process. The and with just commodity hardware. In this paper we take main modification implies doubling the positions collected per a small step in this direction, by showing that AlphaGo game during the training phase, by including also positions not Zero’s successful paradigm can be replicated for the game played but largely explored by the agent. -
(2021), 2814-2819 Research Article Can Chess Ever Be Solved Na
Turkish Journal of Computer and Mathematics Education Vol.12 No.2 (2021), 2814-2819 Research Article Can Chess Ever Be Solved Naveen Kumar1, Bhayaa Sharma2 1,2Department of Mathematics, University Institute of Sciences, Chandigarh University, Gharuan, Mohali, Punjab-140413, India [email protected], [email protected] Article History: Received: 11 January 2021; Accepted: 27 February 2021; Published online: 5 April 2021 Abstract: Data Science and Artificial Intelligence have been all over the world lately,in almost every possible field be it finance,education,entertainment,healthcare,astronomy, astrology, and many more sports is no exception. With so much data, statistics, and analysis available in this particular field, when everything is being recorded it has become easier for team selectors, broadcasters, audience, sponsors, most importantly for players themselves to prepare against various opponents. Even the analysis has improved over the period of time with the evolvement of AI, not only analysis one can even predict the things with the insights available. This is not even restricted to this,nowadays players are trained in such a manner that they are capable of taking the most feasible and rational decisions in any given situation. Chess is one of those sports that depend on calculations, algorithms, analysis, decisions etc. Being said that whenever the analysis is involved, we have always improvised on the techniques. Algorithms are somethingwhich can be solved with the help of various software, does that imply that chess can be fully solved,in simple words does that mean that if both the players play the best moves respectively then the game must end in a draw or does that mean that white wins having the first move advantage. -
Chess Openings
Chess Openings PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Tue, 10 Jun 2014 09:50:30 UTC Contents Articles Overview 1 Chess opening 1 e4 Openings 25 King's Pawn Game 25 Open Game 29 Semi-Open Game 32 e4 Openings – King's Knight Openings 36 King's Knight Opening 36 Ruy Lopez 38 Ruy Lopez, Exchange Variation 57 Italian Game 60 Hungarian Defense 63 Two Knights Defense 65 Fried Liver Attack 71 Giuoco Piano 73 Evans Gambit 78 Italian Gambit 82 Irish Gambit 83 Jerome Gambit 85 Blackburne Shilling Gambit 88 Scotch Game 90 Ponziani Opening 96 Inverted Hungarian Opening 102 Konstantinopolsky Opening 104 Three Knights Opening 105 Four Knights Game 107 Halloween Gambit 111 Philidor Defence 115 Elephant Gambit 119 Damiano Defence 122 Greco Defence 125 Gunderam Defense 127 Latvian Gambit 129 Rousseau Gambit 133 Petrov's Defence 136 e4 Openings – Sicilian Defence 140 Sicilian Defence 140 Sicilian Defence, Alapin Variation 159 Sicilian Defence, Dragon Variation 163 Sicilian Defence, Accelerated Dragon 169 Sicilian, Dragon, Yugoslav attack, 9.Bc4 172 Sicilian Defence, Najdorf Variation 175 Sicilian Defence, Scheveningen Variation 181 Chekhover Sicilian 185 Wing Gambit 187 Smith-Morra Gambit 189 e4 Openings – Other variations 192 Bishop's Opening 192 Portuguese Opening 198 King's Gambit 200 Fischer Defense 206 Falkbeer Countergambit 208 Rice Gambit 210 Center Game 212 Danish Gambit 214 Lopez Opening 218 Napoleon Opening 219 Parham Attack 221 Vienna Game 224 Frankenstein-Dracula Variation 228 Alapin's Opening 231 French Defence 232 Caro-Kann Defence 245 Pirc Defence 256 Pirc Defence, Austrian Attack 261 Balogh Defense 263 Scandinavian Defense 265 Nimzowitsch Defence 269 Alekhine's Defence 271 Modern Defense 279 Monkey's Bum 282 Owen's Defence 285 St. -
Improved Policy Networks for Computer Go
Improved Policy Networks for Computer Go Tristan Cazenave Universite´ Paris-Dauphine, PSL Research University, CNRS, LAMSADE, PARIS, FRANCE Abstract. Golois uses residual policy networks to play Go. Two improvements to these residual policy networks are proposed and tested. The first one is to use three output planes. The second one is to add Spatial Batch Normalization. 1 Introduction Deep Learning for the game of Go with convolutional neural networks has been ad- dressed by [2]. It has been further improved using larger networks [7, 10]. AlphaGo [9] combines Monte Carlo Tree Search with a policy and a value network. Residual Networks improve the training of very deep networks [4]. These networks can gain accuracy from considerably increased depth. On the ImageNet dataset a 152 layers networks achieves 3.57% error. It won the 1st place on the ILSVRC 2015 classifi- cation task. The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification training is faster and enables deeper networks. Residual networks were recently successfully adapted to computer Go [1]. As a follow up to this paper, we propose improvements to residual networks for computer Go. The second section details different proposed improvements to policy networks for computer Go, the third section gives experimental results, and the last section con- cludes. 2 Proposed Improvements We propose two improvements for policy networks. The first improvement is to use multiple output planes as in DarkForest. The second improvement is to use Spatial Batch Normalization. 2.1 Multiple Output Planes In DarkForest [10] training with multiple output planes containing the next three moves to play has been shown to improve the level of play of a usual policy network with 13 layers. -
Grid Computing for Artificial Intelligence
0 10 Grid Computing for Artificial Intelligence Yuya Dan Matsuyama University Japan 1. Introduction This chapter is concerned in grid computing for artificial intelligence systems. In general, grid computing enables us to process a huge amount of data in any fields of discipline. Scientific computing, calculation of the accurate value of π, search for Mersenne prime numbers, analysis of protein molecular structure, weather dynamics, data analysis, business data mining, simulation are examples of application for grid computing. As is well known that supercomputers have very high performance in calculation, however, it is quite expensive to use them for a long time. On the other hand, grid computing using idle resources on the Internet may be running on the reasonable cost. Shogi is a traditional game involving two players in Japan as similar as chess, which is more complicated than chess in rule of play. Figure 1 shows the set of Shogi and initial state of 40 pieces on 9 x 9 board. According to game theory, both chess and Shogi are two-person zero-sum game with perfect information. They are not only a popular game but also a target in the research of artificial intelligence. Fig. 1. Picture of Shogi set including 40 pieces ona9x9board. There are the corresponding king, rook, knight, pawn, and other pieces. Six kinds of pieces can change their role like pawns in chess when they reach the opposite territory. 202 Advances in Grid Computing The information systems for Shogi need to process astronomical combinations of possible positions to move. It is described by Dan (4) that the grid systems for Shogi is effective in reduction of computational complexity with collaboration from computational resources on the Internet.