Balanced Asymmetry in General Game Playing
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
- AAAI 2008 Workshop Reports
University of Pennsylvania ScholarlyCommons, Technical Reports (CIS), Department of Computer & Information Science, May 2009. Mark Dredze, University of Pennsylvania. Posted at https://repository.upenn.edu/cis_reports/904

Recommended citation: Sarabjot Singh Anand, Razvan Bunescu, Vitor Carvalho, Jan Chomicki, Vincent Conitzer, Michael T. Cox, Virginia Dignum, Zachary Dodds, Mark Dredze, David Furcy, Evgeniy Gabrilovich, Mehmet H. Göker, Hans Guesgen, Haym Hirsh, Dietmar Jannach, Ulrich Junker. (2009, April). AAAI 2008 Workshop Reports. AI Magazine, 30(1), 108-118. Copyright AAAI 2009.

Abstract: AAAI was pleased to present the AAAI-08 Workshop Program, held Sunday and Monday, July 13-14, in Chicago, Illinois, USA. The program included the following 15 workshops: Advancements in POMDP Solvers; AI Education Workshop Colloquium; Coordination, Organizations, Institutions, and Norms in Agent Systems; Enhanced Messaging; Human Implications of Human-Robot Interaction; Intelligent Techniques for Web Personalization and Recommender Systems; Metareasoning: Thinking about Thinking; Multidisciplinary Workshop on Advances in Preference Handling; Search in Artificial Intelligence and Robotics; Spatial and Temporal Reasoning; Trading Agent Design and Analysis; Transfer Learning for Complex Tasks; What Went Wrong and Why: Lessons from AI Research and Applications; and Wikipedia and Artificial Intelligence: An Evolving Synergy.
- Game Changer
Matthew Sadler and Natasha Regan, Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI. New In Chess, 2019.

Contents: Explanation of symbols (6); Foreword by Garry Kasparov (7); Introduction by Demis Hassabis (11); Preface (16); Introduction (19). Part I, AlphaZero's history (23): Chapter 1, A quick tour of computer chess competition (24); Chapter 2, ZeroZeroZero (33); Chapter 3, Demis Hassabis, DeepMind and AI (54). Part II, Inside the box (67): Chapter 4, How AlphaZero thinks (68); Chapter 5, AlphaZero's style – meeting in the middle (87). Part III, Themes in AlphaZero's play (131): Chapter 6, Introduction to our selected AlphaZero themes (132); Chapter 7, Piece mobility: outposts (137); Chapter 8, Piece mobility: activity (168); Chapter 9, Attacking the king: the march of the rook's pawn (208); Chapter 10, Attacking the king: colour complexes (235); Chapter 11, Attacking the king: sacrifices for time, space and damage (276); Chapter 12, Attacking the king: opposite-side castling (299); Chapter 13, Attacking the king: defence (321). Part IV, AlphaZero's
- Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
Julien Scholz, Cornelius Weber, Muhammad Burhan Hafez and Stefan Wermter. Knowledge Technology, Department of Informatics, University of Hamburg, Germany.

Abstract: Using a model of the environment, reinforcement learning agents can plan their future moves and achieve super-human performance in board games like Chess, Shogi, and Go, while remaining relatively sample-efficient. As demonstrated by the MuZero algorithm, the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance. Notably, MuZero uses internal state representations derived from real environment states for its predictions. In this paper, we bind the model's predicted internal state representation to the environment state via two additional terms: a reconstruction model loss and a simpler consistency loss, both of which work independently and unsupervised, acting as constraints to stabilize the learning process. Our experiments show that this new integration of reconstruction model loss and simpler consistency loss provides a significant performance increase in OpenAI Gym environments.

From the introduction: …the design decisions made for MuZero, namely, how unconstrained its learning process is. After explaining all necessary fundamentals, we propose two individual augmentations to the algorithm that work fully unsupervised in an attempt to improve its performance and make it more reliable. Afterward, we evaluate our proposals by benchmarking the regular MuZero algorithm against the newly augmented creation. II. Background, A. MuZero Algorithm: The MuZero algorithm [1] is a model-based reinforcement learning algorithm that builds upon the success of its prede-
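The two auxiliary terms are simple to state. Below is a minimal PyTorch sketch of how such losses could be wired up; the layer sizes, module structure and names are illustrative assumptions, not the authors' code:

    import torch
    import torch.nn as nn

    obs_dim, state_dim, action_dim = 8, 32, 4

    # Toy stand-ins for MuZero's representation and dynamics networks, plus a decoder.
    representation = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
    dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
    decoder = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))

    def auxiliary_losses(obs_t, action_t, obs_t1):
        """obs_t, obs_t1: float tensors (batch, obs_dim); action_t: one-hot (batch, action_dim)."""
        s_t = representation(obs_t)                               # encode the real observation
        s_t1_pred = dynamics(torch.cat([s_t, action_t], dim=-1))  # predicted next internal state
        # Reconstruction loss: decoding the predicted state should recover the next observation.
        recon_loss = nn.functional.mse_loss(decoder(s_t1_pred), obs_t1)
        # Consistency loss: the predicted state should match the encoding of the real next
        # observation; detach() stops gradients so the target stays fixed.
        consistency_loss = nn.functional.mse_loss(s_t1_pred, representation(obs_t1).detach())
        return recon_loss, consistency_loss

Both terms need no labels beyond the observations themselves, which is what makes them self-supervised additions to MuZero's usual reward, value and policy losses.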
The Founder of the Creative Destruction Lab Describes Its Moonshot Mission to Create a Canadian AI Ecosystem
Thought Leader Interview, by Karen Christensen.

Describe what happens at the Creative Destruction Lab (CDL). The CDL is a seed-stage program for massively scalable science-based companies. Some start-ups come from the University of Toronto community, but we now also receive applications from Europe, the U.S. (including Silicon Valley), Israel and Asia. We launched the program in September 2012, and each autumn since, we've admitted a new cohort of start-ups into the program. Most companies that we admit have developed a working prototype or proof of concept. The most common type of founder is a recently graduated PhD in Engineering or Computer Science who has spent several years working on a problem and has invented something at the frontier of their field. The program does not guarantee financing, but the majority… …by companies that went through the Lab. When we finished our fifth year in June 2017, we had exceeded $1.4 billion in equity value created.

What exactly does the Lab provide to entrepreneurs? Start-up founders benefit from a structured, objectives-oriented process that increases their probability of success. The process is orchestrated by the CDL team, while CDL Fellows and Associates generate the objectives. Objective-setting is a cornerstone of the process. Every eight weeks the Fellows and Associates set three objectives for the start-ups to achieve, at the exclusion of everything else. In other words, they define clear goals for an eight-week 'sprint'. Objectives can be business, technology or HR-oriented.
- Survey on Reinforcement Learning for Language Processing
Víctor Uc-Cetina (Universidad Autónoma de Yucatán), Nicolás Navarro-Guerrero (Aarhus University), Anabel Martin-Gonzalez (Universidad Autónoma de Yucatán), Cornelius Weber and Stefan Wermter (Universität Hamburg). February 2021. arXiv:2104.05565v1 [cs.CL], 12 Apr 2021.

Abstract: In recent years some researchers have explored the use of reinforcement learning (RL) algorithms as key components in the solution of various natural language processing tasks. For instance, some of these algorithms leveraging deep neural learning have found their way into conversational systems. This paper reviews the state of the art of RL methods for their possible use for different problems of natural language processing, focusing primarily on conversational systems, mainly due to their growing relevance. We provide detailed descriptions of the problems as well as discussions of why RL is well-suited to solve them. Also, we analyze the advantages and limitations of these methods. Finally, we elaborate on promising research directions in natural language processing that might benefit from reinforcement learning.

Keywords: reinforcement learning, natural language processing, conversational systems, parsing, translation, text generation

1 Introduction: Machine learning algorithms have been very successful in solving problems in the natural language processing (NLP) domain for many years, especially supervised and unsupervised methods. However, this is not the case with reinforcement learning (RL), which is somewhat surprising since in other domains, reinforcement learning methods have experienced an increased level of success with some impressive results, for instance in board games such as AlphaGo Zero [106].
- A Strategic Metagame Player for General Chess-Like Games
From: AAAI Technical Report FS-93-02. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved. Barney Pell, Computer Laboratory, University of Cambridge, Cambridge, CB2 3QG, UK.

Abstract: This paper reviews the concept of Metagame and discusses the implementation of METAGAMER, a program which plays Metagame in the class of symmetric chess-like games, which includes chess, Chinese chess, checkers, draughts, and Shogi. The program takes as input the rules of a specific game and analyses those rules to construct for that game an efficient representation and an evaluation function, both for use with a generic search engine. The strategic analysis performed by the program relates a set of general knowledge sources to the details of the particular game. Among other properties, this analysis determines the relative value of the different pieces in a given game.

…game will transfer also to new games. Such criteria include the use of learning and planning, and the ability to play more than one game.

1.1 Bias when using Existing Games. However, even this generality-oriented research is subject to a potential methodological bias. As the human researchers know at the time of program development which specific game or games the program will be tested on, it is possible that they import the results of their own understanding of the game directly into their program. In this case, it becomes difficult to determine whether the subsequent performance of the program is due to the general theory it implements, or merely to the insightful observations of its developer about the
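To make "rule-derived piece values" concrete, here is a toy Python illustration of the general principle, with value tracked by average mobility computed from the movement rules alone. This is a sketch of the idea, not METAGAMER's actual analysis, and the piece definitions are assumptions:

    BOARD = 8

    def mobility(directions, sliding):
        """Average number of squares a piece can reach, over all board positions."""
        total = 0
        for x in range(BOARD):
            for y in range(BOARD):
                for dx, dy in directions:
                    nx, ny = x, y
                    # A sliding piece repeats its step until it leaves the board.
                    for _ in range(BOARD if sliding else 1):
                        nx, ny = nx + dx, ny + dy
                        if not (0 <= nx < BOARD and 0 <= ny < BOARD):
                            break
                        total += 1
        return total / (BOARD * BOARD)

    # Two familiar movement rules, expressed only as direction sets:
    rook = mobility([(1, 0), (-1, 0), (0, 1), (0, -1)], sliding=True)
    knight = mobility([(1, 2), (2, 1), (-1, 2), (-2, 1),
                       (1, -2), (2, -1), (-1, -2), (-2, -1)], sliding=False)
    print(f"rook {rook:.2f} vs knight {knight:.2f}")  # the rook scores higher, as expected

A program given only the rules of an unfamiliar game can run the same computation for each of its pieces and obtain a first approximation of their relative values.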
- Monte-Carlo Tree Search as Regularized Policy Optimization
Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos.

Abstract: The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.

AlphaZero employs an alternative handcrafted heuristic to achieve super-human performance on board games (Silver et al., 2016). The recent MCTS-based MuZero (Schrittwieser et al., 2019) has also led to state-of-the-art results in the Atari benchmarks (Bellemare et al., 2013). Our main contribution is connecting MCTS algorithms, in particular the highly successful AlphaZero, with MPO, a state-of-the-art model-free policy-optimization algorithm (Abdolmaleki et al., 2018). Specifically, we show that the empirical visit distribution of actions in AlphaZero's search procedure approximates the solution of a regularized policy-optimization objective. With this insight, our second contribution is a modified version of AlphaZero that brings significant performance gains over the original algorithm, especially in cases where AlphaZero has been observed to
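As a sketch of the objective in question (a paraphrase under assumed notation, not quoted from the paper: q is the vector of root Q-value estimates, \pi_\theta the network's prior policy, \mathcal{S} the probability simplex over actions, and \lambda_N a regularization weight that shrinks as the simulation budget N grows), the exact solution the proposed variant acts on is

    \bar{\pi} = \arg\max_{y \in \mathcal{S}} \left[ q^{\top} y \;-\; \lambda_N \, \mathrm{KL}\!\left( \pi_{\theta} \,\middle\|\, y \right) \right]

AlphaZero's empirical visit distribution can then be read as a coarse, budget-dependent approximation to this \bar{\pi}, which is the correspondence the paper makes precise.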
- Lukas Biewald
Lukas Biewald. 823 Capp St., San Francisco, CA 94110; (650) 804 5470; lukas@crowdflower.com

Awards: Netexplorateur of the Year, http://www.netexplorateur.org/ (2010). TechCrunch 50, http://techcrunch50.com (2009). First Place, Caltech Turing Tournament, http://turing.ssel.caltech.edu

Work:
Founder, CEO: CrowdFlower, 2008 – Present. Built leading Labor-on-Demand company from the ground up; designed and coded core technology; built customer base up from nothing to 100 customers, including several Fortune 500 companies; hired 25 employees. Raised two rounds of financing totalling $6.2 million in 2009 in a tough economic climate from well-known angels and VCs, including Trinity, Bessemer, Founders Fund, Auren Hoffman, Aydin Senkut, Manu Kumar, Gary Kremen and Travis Kalanick.
Senior Scientist: Powerset, 2007 – 2008. As the 20th employee, created and led two core teams: metrics and search ranking. Developed the Amazon Mechanical Turk interface, identified as a key technology in Microsoft's $100 million acquisition.
Relevance Engineer, Engineering Manager: Yahoo! Inc., 2005 – 2007. Technical lead for research, development and deployment of new web search ranking functions in five major markets. Chosen as most productive employee in the Applied Science business unit year-end review.
Research Assistant: Stanford AI Lab (sail.stanford.edu), 2003 – 2004. Researched word-sense disambiguation, machine translation and Japanese word segmentation. Published two papers with over 100 citations.

Education: Stanford University, 1998 – 2004. BS in Mathematical and Computational Science with Honors (2003); MS in Computer Science with Concentration in Artificial Intelligence (2004).

Publications: David Vickrey, Lukas Biewald, Marc Teyssier, and Daphne Koller. Word-Sense Disambiguation for Machine Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2005.
- Multi-Agent Reinforcement Learning: A Review of Challenges and Applications
Applied Sciences (review article). Lorenzo Canese, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Re and Sergio Spanò. Department of Electronic Engineering, University of Rome "Tor Vergata", Via del Politecnico 1, 00133 Rome, Italy. These authors contributed equally to this work.

Abstract: In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications, namely nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.

Keywords: machine learning; reinforcement learning; multi-agent; swarm
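To ground the single-agent-to-multi-agent extension such reviews start from, here is a minimal Python sketch of independent Q-learning, the simplest multi-agent baseline: each agent keeps its own Q-table and treats the other agents as part of the environment. The state encoding and environment loop are assumed for illustration; this is not code from the review:

    import random
    from collections import defaultdict

    N_AGENTS, N_ACTIONS, ALPHA, GAMMA, EPS = 2, 3, 0.1, 0.95, 0.1
    q_tables = [defaultdict(lambda: [0.0] * N_ACTIONS) for _ in range(N_AGENTS)]

    def act(agent, state):
        # Epsilon-greedy action selection over the agent's own Q-table.
        if random.random() < EPS:
            return random.randrange(N_ACTIONS)
        values = q_tables[agent][state]
        return values.index(max(values))

    def update(agent, state, action, reward, next_state):
        # Standard one-step Q-learning backup. Because the other agents are also
        # learning, the transition dynamics each agent perceives keep shifting --
        # the nonstationarity problem the review highlights.
        best_next = max(q_tables[agent][next_state])
        q = q_tables[agent][state]
        q[action] += ALPHA * (reward + GAMMA * best_next - q[action])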
- Combining Off and On-Policy Training in Model-Based Reinforcement Learning
Alexandre Borges and Arlindo L. Oliveira, INESC-ID / Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal.

Abstract: The combination of deep learning and Monte Carlo Tree Search (MCTS) has shown to be effective in various domains, such as board and video games. AlphaGo [1] represented a significant step forward in our ability to learn complex board games, and it was rapidly followed by significant advances, such as AlphaGo Zero [2] and AlphaZero [3]. Recently, MuZero [4] demonstrated that it is possible to master both Atari games and board games by directly learning a model of the environment, which is then used with Monte Carlo Tree Search (MCTS) [5] to decide what move to play in each position. During tree search, the algorithm simulates games by exploring several possible moves and then picks the action that corresponds to the most promising trajectory.

…factor and game length. AlphaGo [1] was the first computer program to beat a professional human player in the game of Go by combining deep neural networks with tree search. In the first training phase, the system learned from expert games, but a second training phase enabled the system to improve its performance by self-play using reinforcement learning. AlphaGo Zero [2] learned only through self-play and was able to beat AlphaGo soundly. AlphaZero [3] improved on AlphaGo Zero by generalizing the model to any board game. However, these methods were designed only for board games, specifically for two-player zero-sum games. Recently, MuZero [4] improved on AlphaZero by generalizing it even more, so that it can learn to play single-agent games while,
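The move-exploration step described above is commonly implemented with the PUCT rule used in AlphaZero-style search. A minimal Python sketch, assuming per-action statistics are kept in plain dictionaries (the constant c_puct is illustrative):

    import math

    def select_action(priors, visit_counts, value_sums, c_puct=1.25):
        """Pick the move to explore next: priors come from the policy network,
        visit_counts and value_sums accumulate over simulations."""
        total_visits = sum(visit_counts.values())
        best_action, best_score = None, float("-inf")
        for action, prior in priors.items():
            n = visit_counts.get(action, 0)
            q = value_sums.get(action, 0.0) / n if n > 0 else 0.0  # mean simulated value
            u = c_puct * prior * math.sqrt(total_visits) / (1 + n)  # exploration bonus
            if q + u > best_score:
                best_action, best_score = action, q + u
        return best_action

Early on, the bonus term dominates and the search follows the network's prior; as visit counts grow, the mean value q takes over, so simulations concentrate on the most promising trajectories.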
- Autonomous Horizons: The Way Forward
Autonomous Horizons: The Way Forward is a product of the Office of the US Air Force Chief Scientist (AF/ST). Air University Press, 600 Chennault Circle, Bldg 1405, Maxwell AFB, Alabama.

A vision for Air Force senior leaders of the potential for autonomous systems, and a general framework for the science and technology community to advance the state of the art. Dr. Greg L. Zacharias, Chief Scientist of the United States Air Force 2015–2018. The second volume in a series introduced by Autonomous Horizons: Autonomy in the Air Force – A Path to the Future, Volume 1: Human Autonomy Teaming (AF/ST TR 15-01). March 2019, Air University Press, Curtis E. LeMay Center for Doctrine Development and Education, Maxwell AFB, Alabama.

Chief of Staff, US Air Force: Gen David L. Goldfein. Commander, Air Education and Training Command: Lt Gen Steven L. Kwast. Commander and President, Air University: Lt Gen Anthony J. Cotton. Commander, Curtis E. LeMay Center for Doctrine Development and Education: Maj Gen Michael D. Rothstein.

Library of Congress Cataloging-in-Publication Data. Names: Zacharias, Greg, author. | Air University (U.S.). Press, publisher. | United States. Department of Defense. United States Air Force. Title: Autonomous horizons: the way forward / by Dr. Greg L. Zacharias. Description: First edition. | Maxwell Air Force Base, AL: AU Press, 2019. | "Chief Scientist for the United States Air Force." | "January 2019." | Includes bibliographical references. Identifiers: LCCN 2018061682 | ISBN 9781585662876. Subjects: LCSH: Aeronautics, Military—Research—United States. | United States. Air Force—Automation. | Artificial intelligence—Military applications—United States. | Intelligent control systems. | Autonomic
- Efficiently Mastering the Game of NoGo with Deep Reinforcement Learning Supported by Domain Knowledge
Electronics (article). Yifan Gao (College of Medicine and Biological Information Engineering, Northeastern University, Liaoning 110819, China) and Lezhou Wu (College of Information Science and Engineering, Northeastern University, Liaoning 110819, China). These authors contributed equally to this work.

Abstract: Computer games have been regarded as an important field of artificial intelligence (AI) for a long time. The AlphaZero structure has been successful in the game of Go, beating the top professional human players and becoming the baseline method in computer games. However, the AlphaZero training process requires tremendous computing resources, imposing additional difficulties for the AlphaZero-based AI. In this paper, we propose NoGoZero+ to improve the AlphaZero process and apply it to a game similar to Go, NoGo. NoGoZero+ employs several innovative features to improve training speed and performance, and most improvement strategies can be transferred to other nonspecific areas. This paper compares it with the original AlphaZero process, and results show that NoGoZero+ increases the training speed to about six times that of the original AlphaZero process. Moreover, in the experiment, our agent beat the original AlphaZero agent with a score of 81:19 after only being trained by 20,000 self-play games' data (small in quantity compared with 120,000 self-play games' data consumed by the original AlphaZero). The NoGo game program based on NoGoZero+ was the runner-up in the 2020 China Computer Game Championship (CCGC) with limited resources, defeating many AlphaZero-based programs.