CMPUT 455 Search, Knowledge, and Simulations

Computing Science (CMPUT) 455 Search, Knowledge, and Simulations
James Wright
Department of Computing Science, University of Alberta
[email protected]
Winter 2021

455 Today - Lecture 22
• AlphaGo - overview and early versions
• Coursework
  • Work on Assignment 4
  • Reading: AlphaGo Zero paper

AlphaGo Introduction
• High-level overview
• History of DeepMind and AlphaGo
• AlphaGo components and versions
• Performance measurements
• Games against humans
• Impact, limitations, other applications, future

About DeepMind
• Founded in 2010 as a startup company
• Bought by Google in 2014
• Based in London, UK; Edmonton (from 2017); Montreal; Paris
• Expertise in reinforcement learning, deep learning, and search

DeepMind and AlphaGo
• A DeepMind team developed AlphaGo 2014-17
• Result: massive advance in the playing strength of Go programs
• Before AlphaGo: programs about 3 levels below the best humans
• AlphaGo/Alpha Zero: far surpassed human skill in Go
• Now: AlphaGo is retired
• Now: many other super-strong programs, including open source
  • All are based on AlphaGo and Alpha Zero ideas
Image source: https://www.nature.com

DeepMind and UAlberta
• UAlberta has deep connections
• Faculty who work part-time or on leave at DeepMind
  • Rich Sutton, Michael Bowling, Patrick Pilarski, Csaba Szepesvari (all part time)
• Many of our former students and postdocs work at DeepMind
  • David Silver - UofA PhD, designer of AlphaGo, lead of the DeepMind RL and AlphaGo teams
  • Aja Huang - UofA postdoc, main AlphaGo programmer
  • Many from the computer Poker group
• Some are starting to come back to Canada (DeepMind Edmonton and Montreal)

State of Computer Go before AlphaGo
Summary of previous work, and of our course so far
• Search - MCTS, quite strong
• Simulations - OK, hard to improve
• Knowledge
  • Good for move selection
  • Considered hopeless for position evaluation
• Strength: about 3 handicap levels below the best humans
• Quick progress considered unlikely - see next slide

Online Betting Market - Computer Go Program Beats Human Champion
• Online "play-money" betting market
• Claim: a machine will be the best Go player in the world, sometime before the end of 2020
• Before AlphaGo, chances were considered low, and falling
• The first Nature paper changed that around completely
Image source: http://www.ideosphere.com/fx-bin/Claim?claim=GoCh

AlphaGo Versions and Publications
• Worked on the Go program for about 2 years, mostly kept secret
• DCNN paper published in 2015 (see previous lecture)
• Fall 2015: match AlphaGo Fan vs European champion Fan Hui (2 dan professional)
  • AlphaGo won 5:0 in official games, lost some other games
  • Match kept secret until January 2016
• January 2016: article in Nature describes this version of AlphaGo

AlphaGo Versions and Publications (continued)
• March 2016: match AlphaGo Lee vs top-level human player Lee Sedol, wins 4-1
• January 2017: AlphaGo Master plays 60 games on the internet vs top humans, wins 60-0
• May 2017: match vs world #1 Ke Jie, wins 3-0; AlphaGo retires from match play
• October 2017: AlphaGo Zero article in Nature; learns without human game knowledge
• December 2017/December 2018: Alpha Zero article preprint/final - chess and shogi

AlphaGo Project
• The AlphaGo project was "Big Science"
• Dozens of developers
• World experts in RL, game tree search and MCTS, neural nets, deep learning
• Many millions of dollars in hardware and computing costs
• Huge engineering effort
Image source: http://sports.sina.com.cn/go/2016-12-19/

AlphaGo vs Fan Hui
• Fall 2015 - early AlphaGo version (AlphaGo Fan)
• Fan Hui: trained in China, 2 dan professional
• Has lived in Europe since 2000, French citizen
• European champion 2013-2015
• AlphaGo beat Fan Hui 5:0 in official games
• Fan Hui won some faster, informal games
Image source: https://www.geekwire.com/2016/

AlphaGo Article in Nature
• Nature and Science are the top two scientific journals
• Papers on computing science are rare
• Papers on games are very rare
• Games articles from our department:
  • Checkers Is Solved, Science 2007
  • Heads-up limit hold'em poker is solved, Science 2015
• January 2016: AlphaGo on the title page of Nature
• Describes the version that beat Fan Hui
Image source: https://www.nature.com

AlphaGo Fan Design
As described in the January 2016 article in Nature
• Search: MCTS (pretty standard, parallel MCTS)
  • Modified UCT combines simulation result and value network evaluation
• Simulation (rollout) policy: relatively normal
  • Uses a small, fast network for the policy
• Supervised Learning (SL) policy network
  • DCNN, similar to their 2015 paper
  • Learns move prediction from master games
  • Improved in details, more data

AlphaGo Fan Design (continued)
• New: strong RL policy network
  • Learns by Reinforcement Learning (RL) from self-play
• New: value network
  • Learns from labeled game data created by the strong RL policy
• Main contributions:
  • Reinforcement learning for training the DCNN policy net
  • Value network for state evaluation
  • Large, extremely well-engineered learning and playing system

Rollout Policy Network
• Used for simulations in MCTS
• Designed to be simple and fast
• "Linear softmax of small pattern features"
• Probabilistic move selection as in Coulom's paper, Go4
• Weights trained by RL instead of MM (minorization-maximization)
• Slightly larger patterns/extended features
• 24.2% move prediction rate
• 2µs per move selection (500,000 simulated moves/second)
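As a rough illustration of a "linear softmax of small pattern features" with probabilistic move selection, here is a small Python sketch. The feature extractor `pattern_features` and the weight vector are hypothetical stand-ins, not AlphaGo's actual code; a real rollout policy updates features incrementally to reach speeds like 2µs per move.

```python
import numpy as np

def rollout_move_probs(legal_moves, weights, pattern_features):
    """Linear softmax over binary pattern features (sketch).

    legal_moves: list of candidate moves
    weights: 1-D array, one learned weight per pattern feature
    pattern_features: hypothetical function mapping a move to the
        indices of the binary features that are "on" for that move
    """
    # Score each move as the sum of the weights of its active features
    scores = np.array([weights[pattern_features(m)].sum() for m in legal_moves])
    # Softmax: exponentiate and normalize (subtract max for numerical stability)
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

def sample_rollout_move(rng, legal_moves, weights, pattern_features):
    """Probabilistic move selection: sample in proportion to the softmax."""
    probs = rollout_move_probs(legal_moves, weights, pattern_features)
    return legal_moves[rng.choice(len(legal_moves), p=probs)]

# Usage sketch: rng = np.random.default_rng()
# move = sample_rollout_move(rng, moves, weights, pattern_features)
```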
AlphaGo Fan Deep Network Architecture
• Three "big" networks
  • SL policy network, RL policy network, RL value network
• Same overall design for all three
• 13-layer deep convolutional NN
• 192 filters in the convolutional layers
• 4.8ms evaluation on a GPU
  • About 200 evaluations/second/GPU
  • More than 2000x slower than the simulation policy evaluation
• Can learn and represent much more complex information

Input Features for NN
• Pretty similar input to previous nets
Image source: AlphaGo paper
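A sketch of this network design in PyTorch. The 13 layers and 192 filters follow the slide; the 48 input feature planes and the 5×5 first-layer kernel are taken from the AlphaGo paper. Treat this as an approximation of the layout, not the exact AlphaGo network.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """13-layer convolutional policy network (approximate AlphaGo Fan layout).

    Input: 48 binary feature planes on the 19x19 board.
    Output: probability distribution over the 361 board points.
    """
    def __init__(self, in_planes=48, filters=192):
        super().__init__()
        # Layer 1: 5x5 conv; padding keeps the 19x19 spatial size
        layers = [nn.Conv2d(in_planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        # Layers 2-12: 3x3 convs with 192 filters each
        for _ in range(11):
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        # Layer 13: 1x1 conv reduces to one plane of per-point move scores
        layers.append(nn.Conv2d(filters, 1, kernel_size=1))
        self.trunk = nn.Sequential(*layers)

    def forward(self, x):                      # x: (batch, 48, 19, 19)
        logits = self.trunk(x).flatten(1)      # (batch, 361)
        return torch.softmax(logits, dim=1)    # move probabilities
```

In the paper, the value network reuses essentially the same trunk but ends in a fully connected layer with a single tanh-activated output instead of the per-point softmax.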
Value Network
• Learn a state evaluation function
• Given a Go position
• Computes the probability of winning
• No search, no simulation!
• Learns a mapping
  • From state s
  • To the expected outcome E[z] of the game
• Network architecture mostly the same as the policy nets
• Difference: output layer
  • Value net outputs a single number: the evaluation of s
  • Compare policy nets: output = probability distribution

AlphaGo Fan Learning Pipeline and Play
• Training pipeline
• AlphaGo program
• Performance measurements
• Games against humans

AlphaGo Fan Training Pipeline
Supervised Learning (SL)
• Rollout policy
• SL policy network
Reinforcement Learning
• RL policy network
• Value network

Supervised Learning (SL) Policy Network
• Trained from 30 million positions from the KGS Go Server
• Maximize likelihood of the human move
• 57% move prediction when using all simple input features
• 55.7% when using only raw board input
• "Small improvements in accuracy led to large improvements in playing strength"
• MCTS with this network is already much better than all previous Go programs

RL Policy Network
• Learns from self-play
• Learning: adjust the weights of the net to learn from the winner of each game
• Exactly the same network architecture as the SL net
• Weights initialized from the SL net
• Then plays many millions of games
• Keeps a pool of previous networks as opponents to avoid overfitting

Rewards and RL Training Procedure
• Goal: learn improved weights ρ for the current network
• Play a full game
• Reward z only at the end
  • +1 for winning, -1 for losing
• Update weights ρ by stochastic gradient ascent
  • Ascent: go in the direction that maximizes the expected outcome of the game
• Formula from the REINFORCE learning method (Williams 1992)

REINFORCE Update Formula

$$\Delta\rho = \alpha \, \frac{\partial \log p_\rho(a \mid s)}{\partial \rho} \, z$$

• ∆ρ .. change in the policy weights ρ
• α .. step size for gradient ascent
• p_ρ(a | s) .. probability of playing the move a which was played in the game from state s
• z .. game result (+1 or -1)
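To make the update concrete, here is a sketch of REINFORCE for a simple linear softmax policy; AlphaGo applies the same rule to the DCNN weights via backpropagation. The feature map `phi` and the `legal_moves` helper are hypothetical stand-ins.

```python
import numpy as np

def reinforce_update(rho, alpha, episode, phi, legal_moves):
    """One REINFORCE gradient-ascent pass over a finished self-play game.

    rho:     1-D weight vector of the (linear softmax) policy
    alpha:   step size for gradient ascent
    episode: list of (s, a, z) triples - state, move played, and game
             outcome z (+1/-1) from the view of the player who moved
    phi:     hypothetical feature map, phi(s, move) -> 1-D feature vector
    legal_moves: hypothetical function returning the legal moves in s
    """
    for s, a, z in episode:
        moves = legal_moves(s)
        feats = np.stack([phi(s, b) for b in moves])   # (n_moves, n_features)
        scores = feats @ rho
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                           # softmax p_rho(.|s)
        # For a linear softmax, the gradient of log p_rho(a|s) w.r.t. rho is
        # phi(s, a) minus the expected feature vector under p_rho:
        grad_log_p = phi(s, a) - probs @ feats
        rho = rho + alpha * grad_log_p * z             # Delta-rho = alpha * grad * z
    return rho
```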
RL Policy Network vs Other Go Programs
• Tested the RL policy network as a standalone player
  • No search, only one call to the network
  • Plays a move by sampling from p_ρ
• Won 80% against the SL policy net
• Won 85% vs the MCTS Go program Pachi, which used 100,000 simulations/move (!)

Training the Value Net
• Value net computes a function v_θ(s)
• Function arguments:
  • input state s
  • value net weights θ
• Output: v_θ(s) is a single real number
• Training: minimize the squared error between
  • value net prediction v_θ(s_i)
  • true game outcome z_i

$$\text{Squared error} = \sum_i \big( v_\theta(s_i) - z_i \big)^2$$

• Measure the mean squared error (MSE):
  • squared error / number of training items

Training Data for the Value Net
• Played 30 million self-play games using the RL policy player
• Randomly sample one single state s from each game
• Why only one state? Because of overfitting problems
• Label s with the outcome z of the game
• Result: 30 million training points (s_i, z_i)
• Learn the value net by trying to predict z_i from s_i
• This was called "reinforcement learning of value networks" in the paper
• It is really supervised learning from game data
  • The game data was produced by the RL policy

MCTS in AlphaGo Fan Player
• The player uses MCTS - search and simulations
• Neural nets used as (very strong) in-tree knowledge
  • Policy net guides tree search
  • Value net helps evaluate leaf nodes
• 3 values stored on each edge (s, a) of the game tree:
  • winrate Q, visit count N, prior probability P from the policy net

In-tree Move Selection
• Choose the move which maximizes Q + u
• Winrate Q: exploitation term
• u: exploration term, with multiplicative knowledge
• Decays with the visit count: $u = C \, \frac{P}{1 + N}$
• Exploration constant C

Computing V(s_L)
• Value estimate V(s_L) of leaf node s_L
• V(s_L) is a weighted average of the value net evaluation and the rollout result: $V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L$
• The AlphaGo paper reports the mixing constant λ = 0.5 worked best
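A minimal sketch of in-tree selection and leaf evaluation under these definitions. The `Edge` record and the value of the exploration constant are illustrative, not AlphaGo's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    """Statistics stored on one edge (s, a) of the search tree."""
    Q: float = 0.0  # mean winrate of simulations through this edge
    N: int = 0      # visit count
    P: float = 0.0  # prior probability from the policy network

def select_move(edges, C=5.0):
    """Pick the child move maximizing Q + u, with u = C * P / (1 + N).

    edges: dict mapping each legal move to its Edge statistics.
    The prior P initially steers search toward moves the policy net
    likes; dividing by (1 + N) decays the bonus as a move is explored.
    """
    return max(edges.items(),
               key=lambda kv: kv[1].Q + C * kv[1].P / (1 + kv[1].N))[0]

def leaf_value(v_theta, z_rollout, lam=0.5):
    """Leaf evaluation: mix the value-net output with the rollout result."""
    return (1 - lam) * v_theta + lam * z_rollout
```

Note that the exploration bonus here follows the simplified form on the slide; it is multiplicative in the prior P, unlike plain UCT's knowledge-free exploration term.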