The Game Is Not Over Yet—Go in the Post-Alphago Era
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Artificial Intelligence in Health Care: the Hope, the Hype, the Promise, the Peril
Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril Michael Matheny, Sonoo Thadaney Israni, Mahnoor Ahmed, and Danielle Whicher, Editors WASHINGTON, DC NAM.EDU PREPUBLICATION COPY - Uncorrected Proofs NATIONAL ACADEMY OF MEDICINE • 500 Fifth Street, NW • WASHINGTON, DC 20001 NOTICE: This publication has undergone peer review according to procedures established by the National Academy of Medicine (NAM). Publication by the NAM worthy of public attention, but does not constitute endorsement of conclusions and recommendationssignifies that it is the by productthe NAM. of The a carefully views presented considered in processthis publication and is a contributionare those of individual contributors and do not represent formal consensus positions of the authors’ organizations; the NAM; or the National Academies of Sciences, Engineering, and Medicine. Library of Congress Cataloging-in-Publication Data to Come Copyright 2019 by the National Academy of Sciences. All rights reserved. Printed in the United States of America. Suggested citation: Matheny, M., S. Thadaney Israni, M. Ahmed, and D. Whicher, Editors. 2019. Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. NAM Special Publication. Washington, DC: National Academy of Medicine. PREPUBLICATION COPY - Uncorrected Proofs “Knowing is not enough; we must apply. Willing is not enough; we must do.” --GOETHE PREPUBLICATION COPY - Uncorrected Proofs ABOUT THE NATIONAL ACADEMY OF MEDICINE The National Academy of Medicine is one of three Academies constituting the Nation- al Academies of Sciences, Engineering, and Medicine (the National Academies). The Na- tional Academies provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. -
Rules of Go It Would Be Suicidal (And Hence Illegal) for White to Play at E, the Board Is Initially Because the Block of Four White Empty
Scoring Example another liberty at D. Rules of Go It would be suicidal (and hence illegal) for white to play at E, The board is initially because the block of four white empty. stones would have no liberties. Could black play at E? It looks like a suicidal move, because Black plays first. the black stone would have no liberties, but it would occupy On a turn, either place a the white block’s last liberty at stone on a vacant inter- the same time; the move is legal and the white stones are section or pass (giving a captured. stone to the opponent to The black block F can never be keep as a prisoner). In the diagram above, white captured. It has two internal has 8 points in the center and liberties (eyes) at the points Stones may be captured 7 points at the upper right. marked G. To capture the by tightly surrounding Two white stones (shown be- block, white would have to oc- low the board) are prisoners. cupy both of them, but either them. Captured stones White’s score is 8 + 7 − 2 = 13. move would be suicidal and are taken off the board Black has 3 points at the up- therefore illegal. and kept as prisoners. per left and 9 at the lower Suppose white plays at H, cap- right. One black stone is a turing the black stone at I. prisoner. Black’s score is (Notice that H is not an eye.) It is illegal to make a − 3 + 9 1 = 11. White wins. -
AI Computer Wraps up 4-1 Victory Against Human Champion Nature Reports from Alphago's Victory in Seoul
The Go Files: AI computer wraps up 4-1 victory against human champion Nature reports from AlphaGo's victory in Seoul. Tanguy Chouard 15 March 2016 SEOUL, SOUTH KOREA Google DeepMind Lee Sedol, who has lost 4-1 to AlphaGo. Tanguy Chouard, an editor with Nature, saw Google-DeepMind’s AI system AlphaGo defeat a human professional for the first time last year at the ancient board game Go. This week, he is watching top professional Lee Sedol take on AlphaGo, in Seoul, for a $1 million prize. It’s all over at the Four Seasons Hotel in Seoul, where this morning AlphaGo wrapped up a 4-1 victory over Lee Sedol — incidentally, earning itself and its creators an honorary '9-dan professional' degree from the Korean Baduk Association. After winning the first three games, Google-DeepMind's computer looked impregnable. But the last two games may have revealed some weaknesses in its makeup. Game four totally changed the Go world’s view on AlphaGo’s dominance because it made it clear that the computer can 'bug' — or at least play very poor moves when on the losing side. It was obvious that Lee felt under much less pressure than in game three. And he adopted a different style, one based on taking large amounts of territory early on rather than immediately going for ‘street fighting’ such as making threats to capture stones. This style – called ‘amashi’ – seems to have paid off, because on move 78, Lee produced a play that somehow slipped under AlphaGo’s radar. David Silver, a scientist at DeepMind who's been leading the development of AlphaGo, said the program estimated its probability as 1 in 10,000. -
W2go4e-Book.Pdf
American Go Association The AGA is dedicated to promotion of the game of go in America. It works to encourage people to learn more about and enjoy this remarkable game and to strengthen the U.S. go playing community. The AGA: • Publishes the American Go e-Journal, free to everyone with Legal Note: The Way To Go is a copyrighted work. special weekly editions for members Permission is granted to make complete copies for • Publishes the American Go Journal Yearbook – free to members personal use. Copies may be distributed freely to • Sanctions and promotes AGA-rated tournaments others either in print or electronic form, provided • Maintains a nationwide rating system no fee is charged for distribution and all copies contain • Organizes the annual U.S. Go Congress and Championship this copyright notice. • Organizes the summer U.S. Go Camp for children • Organizes the annual U.S. Youth Go Championship • Manages U.S. participation in international go events Information on these services and much more is available at the AGA’s website at www.usgo.org. E R I C M A N A American Go Association G Box 397 Old Chelsea Station F O O N U I O New York, NY 10113 N D A T http://www.usgo.org American Go Foundation The American Go Foundation is a 501(c)(3) charitable organiza- tion devoted to the promotion of go in the United States. With our help thousands of youth have learned go from hundreds of teachers. Cover print: Two Immortals and the Woodcutter Our outreach includes go related educational and cultural activities A watercolor by Seikan. -
How to Play Go Lesson 1: Introduction to Go
How To Play Go Lesson 1: Introduction To Go 1.1 About The Game Of Go Go is an ancient game originated from China, with a definite history of over 3000 years, although there are historians who say that the game was invented more than 4000 years ago. The Chinese call the game Weiqi. Other names for Go include Baduk (Korean), Igo (Japanese) and Goe (Taiwanese). This game is getting increasingly popular around the world, especially in Asian, European and American countries, with many worldwide competitions being held. Diagram 1-1 The game of Go is played on a board as shown in Diagram 1-1. The Go set comprises of the board, together with 180 black and white stones each. Diagram 1-1 shows the standard 19x19 board (i.e. the board has 19 lines by 19 lines), but there are 13x13 and 9x9 boards in play. However, the 9x9 and 13x13 boards are usually for beginners; more advanced players would prefer the traditional 19x19 board. Compared to International Chess and Chinese Chess, Go has far fewer rules. Yet this allowed for all sorts of moves to be played, so Go can be a more intellectually challenging game than the other two types of Chess. Nonetheless, Go is not a difficult game to learn, so have a fun time playing the game with your friends. Several rule sets exist and are commonly used throughout the world. Two of the most common ones are Chinese rules and Japanese rules. Another is Ing's rules. All the rules are basically the same, the only significant difference is in the way counting of territories is done when the game ends. -
Reinforcement Learning (1) Key Concepts & Algorithms
Winter 2020 CSC 594 Topics in AI: Advanced Deep Learning 5. Deep Reinforcement Learning (1) Key Concepts & Algorithms (Most content adapted from OpenAI ‘Spinning Up’) 1 Noriko Tomuro Reinforcement Learning (RL) • Reinforcement Learning (RL) is a type of Machine Learning where an agent learns to achieve a goal by interacting with the environment -- trial and error. • RL is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. • The purpose of RL is to learn an optimal policy that maximizes the return for the sequences of agent’s actions (i.e., optimal policy). https://en.wikipedia.org/wiki/Reinforcement_learning 2 • RL have recently enjoyed a wide variety of successes, e.g. – Robot controlling in simulation as well as in the real world Strategy games such as • AlphaGO (by Google DeepMind) • Atari games https://en.wikipedia.org/wiki/Reinforcement_learning 3 Deep Reinforcement Learning (DRL) • A policy is essentially a function, which maps the agent’s each action to the expected return or reward. Deep Reinforcement Learning (DRL) uses deep neural networks for the function (and other components). https://spinningup.openai.com/en/latest/spinningup/rl_intro.html 4 5 Some Key Concepts and Terminology 1. States and Observations – A state s is a complete description of the state of the world. For now, we can think of states belonging in the environment. – An observation o is a partial description of a state, which may omit information. – A state could be fully or partially observable to the agent. If partial, the agent forms an internal state (or state estimation). https://spinningup.openai.com/en/latest/spinningup/rl_intro.html 6 • States and Observations (cont.) – In deep RL, we almost always represent states and observations by a real-valued vector, matrix, or higher-order tensor. -
Heft 1/2018 93. Jahrgang
Heft 1/2018 93. Jahrgang 1 DGoZ 1/2018 Inhalt Vorwort Seit langer Zeit hat die DGoZ mal wieder 64+4 Niederländisches Go-Werbeplakat aus den Seiten, also ihre praktisch maximale Dicke. Das 1980er Jahren ......................................Titel liegt einerseits daran, dass die Spielabendliste erst Vorwort, Inhalt ...............................................2 in dieser Ausgabe abgedruckt ist und allein schon Nachrichten ........................................... 2–6 7,5 Seiten lang ist, andererseits an der Fülle von Ausschreibung: DJGM ............................... 7 eingesendeten Berichten, die nur schwer hätten auf Turnierberichte .......................................... 8 spätere Ausgaben verschoben werden können – oder Lee Hajin: Zulassungsgespräch ................... 9 dies schon wurden. Viele lesenswerte Artikel! EGC-2018-Helfersuche ......................10–11 Mit dem Jubiläums-Kidocup wirft ein großes Ein Go-Kongress im Januar? ............... 12–13 Go-Ereignis seinen Schatten voraus, was auch an 28. Internationales Paar -Go-Turnier ........ 14 der DGoZ nicht spurlos vorbei geht: Neben einer Hwang Inseongs Winter-Go-Camp .... 15–16 Ankündigung im Nachrichtenteil hat Yoon Young Interview mit Hwang Inseong ............ 16–17 Das Seidenstraßenturnier .................... 18–19 Sun zwei Partien unserer Kidocup-Gäste Lee Chang- Die EGF Academy .............................. 20–21 ho 9p und Choi Jeong 9p kommentiert. Spannend! Anfängerprobleme ..............................22–23 Besonders hinweisen möchte ich noch auf den Hil- -
OLIVAW: Mastering Othello with Neither Humans Nor a Penny Antonio Norelli, Alessandro Panconesi Dept
1 OLIVAW: Mastering Othello with neither Humans nor a Penny Antonio Norelli, Alessandro Panconesi Dept. of Computer Science, Universita` La Sapienza, Rome, Italy Abstract—We introduce OLIVAW, an AI Othello player adopt- of companies. Another aspect of the same problem is the ing the design principles of the famous AlphaGo series. The main amount of training needed. AlphaGo Zero required 4.9 million motivation behind OLIVAW was to attain exceptional competence games played during self-play. While to attain the level of in a non-trivial board game at a tiny fraction of the cost of its illustrious predecessors. In this paper, we show how the grandmaster for games like Starcraft II and Dota 2 the training AlphaGo Zero’s paradigm can be successfully applied to the required 200 years and more than 10,000 years of gameplay, popular game of Othello using only commodity hardware and respectively [7], [8]. free cloud services. While being simpler than Chess or Go, Thus one of the major problems to emerge in the wake Othello maintains a considerable search space and difficulty of these breakthroughs is whether comparable results can be in evaluating board positions. To achieve this result, OLIVAW implements some improvements inspired by recent works to attained at a much lower cost– computational and financial– accelerate the standard AlphaGo Zero learning process. The and with just commodity hardware. In this paper we take main modification implies doubling the positions collected per a small step in this direction, by showing that AlphaGo game during the training phase, by including also positions not Zero’s successful paradigm can be replicated for the game played but largely explored by the agent. -
Gradient Play in N-Cluster Games with Zero-Order Information
Gradient Play in n-Cluster Games with Zero-Order Information Tatiana Tatarenko, Jan Zimmermann, Jurgen¨ Adamy Abstract— We study a distributed approach for seeking a models, each agent intends to find a strategy to achieve a Nash equilibrium in n-cluster games with strictly monotone Nash equilibrium in the resulting n-cluster game, which is mappings. Each player within each cluster has access to the a stable state that minimizes the cluster’s cost functions in current value of her own smooth local cost function estimated by a zero-order oracle at some query point. We assume the response to the actions of the agents from other clusters. agents to be able to communicate with their neighbors in Continuous time algorithms for the distributed Nash equi- the same cluster over some undirected graph. The goal of libria seeking problem in multi-cluster games were proposed the agents in the cluster is to minimize their collective cost. in [17]–[19]. The paper [17] solves an unconstrained multi- This cost depends, however, on actions of agents from other cluster game by using gradient-based algorithms, whereas the clusters. Thus, a game between the clusters is to be solved. We present a distributed gradient play algorithm for determining works [18] and [19] propose a gradient-free algorithm, based a Nash equilibrium in this game. The algorithm takes into on zero-order information, for seeking Nash and generalized account the communication settings and zero-order information Nash equilibria respectively. In discrete time domain, the under consideration. We prove almost sure convergence of this work [5] presents a leader-follower based algorithm, which algorithm to a Nash equilibrium given appropriate estimations can solve unconstrained multi-cluster games in linear time. -
In-Datacenter Performance Analysis of a Tensor Processing Unit
In-Datacenter Performance Analysis of a Tensor Processing Unit Presented by Josh Fried Background: Machine Learning Neural Networks: ● Multi Layer Perceptrons ● Recurrent Neural Networks (mostly LSTMs) ● Convolutional Neural Networks Synapse - each edge, has a weight Neuron - each node, sums weights and uses non-linear activation function over sum Propagating inputs through a layer of the NN is a matrix multiplication followed by an activation Background: Machine Learning Two phases: ● Training (offline) ○ relaxed deadlines ○ large batches to amortize costs of loading weights from DRAM ○ well suited to GPUs ○ Usually uses floating points ● Inference (online) ○ strict deadlines: 7-10ms at Google for some workloads ■ limited possibility for batching because of deadlines ○ Facebook uses CPUs for inference (last class) ○ Can use lower precision integers (faster/smaller/more efficient) ML Workloads @ Google 90% of ML workload time at Google spent on MLPs and LSTMs, despite broader focus on CNNs RankBrain (search) Inception (image classification), Google Translate AlphaGo (and others) Background: Hardware Trends End of Moore’s Law & Dennard Scaling ● Moore - transistor density is doubling every two years ● Dennard - power stays proportional to chip area as transistors shrink Machine Learning causing a huge growth in demand for compute ● 2006: Excess CPU capacity in datacenters is enough ● 2013: Projected 3 minutes per-day per-user of speech recognition ○ will require doubling datacenter compute capacity! Google’s Answer: Custom ASIC Goal: Build a chip that improves cost-performance for NN inference What are the main costs? Capital Costs Operational Costs (power bill!) TPU (V1) Design Goals Short design-deployment cycle: ~15 months! Plugs in to PCIe slot on existing servers Accelerates matrix multiplication operations Uses 8-bit integer operations instead of floating point How does the TPU work? CISC instructions, issued by host. -
CSC321 Lecture 23: Go
CSC321 Lecture 23: Go Roger Grosse Roger Grosse CSC321 Lecture 23: Go 1 / 22 Final Exam Monday, April 24, 7-10pm A-O: NR 25 P-Z: ZZ VLAD Covers all lectures, tutorials, homeworks, and programming assignments 1/3 from the first half, 2/3 from the second half If there's a question on this lecture, it will be easy Emphasis on concepts covered in multiple of the above Similar in format and difficulty to the midterm, but about 3x longer Practice exams will be posted Roger Grosse CSC321 Lecture 23: Go 2 / 22 Overview Most of the problem domains we've discussed so far were natural application areas for deep learning (e.g. vision, language) We know they can be done on a neural architecture (i.e. the human brain) The predictions are inherently ambiguous, so we need to find statistical structure Board games are a classic AI domain which relied heavily on sophisticated search techniques with a little bit of machine learning Full observations, deterministic environment | why would we need uncertainty? This lecture is about AlphaGo, DeepMind's Go playing system which took the world by storm in 2016 by defeating the human Go champion Lee Sedol Roger Grosse CSC321 Lecture 23: Go 3 / 22 Overview Some milestones in computer game playing: 1949 | Claude Shannon proposes the idea of game tree search, explaining how games could be solved algorithmically in principle 1951 | Alan Turing writes a chess program that he executes by hand 1956 | Arthur Samuel writes a program that plays checkers better than he does 1968 | An algorithm defeats human novices at Go 1992 -
Residual Networks for Computer Go Tristan Cazenave
Residual Networks for Computer Go Tristan Cazenave To cite this version: Tristan Cazenave. Residual Networks for Computer Go. IEEE Transactions on Games, Institute of Electrical and Electronics Engineers, 2018, 10 (1), 10.1109/TCIAIG.2017.2681042. hal-02098330 HAL Id: hal-02098330 https://hal.archives-ouvertes.fr/hal-02098330 Submitted on 12 Apr 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. IEEE TCIAIG 1 Residual Networks for Computer Go Tristan Cazenave Universite´ Paris-Dauphine, PSL Research University, CNRS, LAMSADE, 75016 PARIS, FRANCE Deep Learning for the game of Go recently had a tremendous success with the victory of AlphaGo against Lee Sedol in March 2016. We propose to use residual networks so as to improve the training of a policy network for computer Go. Training is faster than with usual convolutional networks and residual networks achieve high accuracy on our test set and a 4 dan level. Index Terms—Deep Learning, Computer Go, Residual Networks. I. INTRODUCTION Input EEP Learning for the game of Go with convolutional D neural networks has been addressed by Clark and Storkey [1]. It has been further improved by using larger networks [2].