How the Computer Beat the Go Player

CONSCIOUSNESS REDUX: Exploring the riddle of our existence
MACHINE LEARNING
Scientific American Mind, July/August 2016

As a leading go player falls to a machine, artificial intelligence takes a decisive step on the road to overtaking the natural variety

By Christof Koch

Christof Koch is president and chief scientific officer of the Allen Institute for Brain Science in Seattle. He serves on Scientific American Mind’s board of advisers.

God moves the player, he in turn, the piece. But what god beyond God begins the round of dust and time and sleep and agonies?
(Jorge Luis Borges, from “Chess,” 1960)

The victory in March of the computer program AlphaGo over one of the world’s top handful of go players marks the highest accomplishment to date for the burgeoning field of machine learning and intelligence. The computer beat Lee Sedol at go, a very old and traditional board game, at a highly publicized tournament in Seoul in a 4–1 rout. With this defeat, computers have bettered people in the last of the classical board games, this one known for its depth and simplicity. An era is over, and a new one has begun. The methods underlying AlphaGo, and its recent victory, have startling implications for the future of machine intelligence.

Coming Out of Nowhere

The ascent of AlphaGo to the top of the go world has been stunning and quite distinct from the trajectory of machines playing chess. Over a period of more than a decade a dedicated team of hardware and software engineers hired by IBM built and programmed a special-purpose supercomputer named Deep Blue that did one thing and one thing only: play chess by evaluating 200 million board positions per second. In a widely expected development, the IBM team challenged then reigning world chess champion Garry Kasparov. In a six-game match played in 1996, Kasparov prevailed against Deep Blue by three wins, two draws and one loss but lost a year later in a historic rematch 3.5 to 2.5. (Scoring rules permit half points in the case of a draw.)

Chess is a classic game of strategy, similar to tic-tac-toe (noughts and crosses), checkers (draughts), Reversi (Othello), backgammon and go, in which players take turns placing or moving pieces. Unlike games where players see only their own cards and all discarded cards, players have full access to relevant information, with chance playing no role.

The rules of go are considerably simpler than those of chess. Black and White sides each have access to a bowl of black and white stones, and each places one in turn on a 19-by-19 grid. Once placed, stones do not move. The intent of the game, originating in China more than 2,500 years ago, is to completely surround opposite stones. Such encircled stones are considered captured and are removed from the board. Out of this sheer simplicity, great beauty arises: complex battles between Black and White armies that span from the corners to the center of the board.

Strictly logical games, such as chess and go, can be characterized by how many possible positions can arise, a measure that defines their complexity. Depending on the phase of the game, players must pick one out of a small number of possible moves. A typical chess game may have 10^120 possible moves, a huge number, considering there are only about 10^80 atoms in the entire observable universe of galaxies, stars, planets, dogs, trees, people. But go’s complexity is much bigger, at 10^360 possible moves. This is a number beyond imagination and renders any thought of exhaustively evaluating all possible moves utterly unrealistic.

Given this virtually illimitable complexity, go is, much more than chess, about recognizing patterns that arise when clutches of stones surround empty spaces. Players perceive, consciously or not, relationships among groups of stones and talk about such seemingly fuzzy concepts as “light” and “heavy” shapes of stones and aji, meaning latent possibilities. Such concepts, however, are much harder to capture algorithmically than the formal rules of the game. Accordingly, computer go programs struggled compared with their chess counterparts, and none had ever beat a professional human under regular tournament conditions. Such an event was prognosticated to be at least a decade away.

And then AlphaGo burst into public consciousness via an article in one of the world’s most respected science magazines, Nature, on January 28 of this year. Its software was developed by a 20-person team under erstwhile chess child prodigy and neuroscientist turned AI pioneer Demis Hassabis. (His London-based DeepMind Technologies was acquired in 2014 by Google.) Most intriguingly, the Nature article revealed that AlphaGo had played against the winner of the European go championship, Fan Hui, in October 2015 and won 5 to 0 without handicapping the human player, an unheard-of event. What is noteworthy is that AlphaGo’s algorithms do not contain any genuinely novel insights or breakthroughs. The software combines good old-fashioned neural network algorithms and machine-learning techniques with superb software engineering running on powerful but fairly standard hardware: 48 central processing units (CPUs) augmented by eight graphics processing units (GPUs) developed to render 3-D graphics for the gaming communities and exquisitely powered for running certain mathematical operations.

At the heart of the computations are neural networks, distant descendants of neuronal circuits operating in biological brains. Multiple layers of artificial neurons process the input (the positions of stones on the 19-by-19 go board) and derive increasingly more abstract representations of various aspects of the game using something called convolutional networks. This same technology has made possible recent breakout performances in automatic image recognition, labeling, for example, all images posted to Facebook.

For any particular board position, two neural networks operate in tandem to optimize performance. A “policy network” reduces the breadth of the game by limiting the number of moves for a particular board position. It does so by learning to choose a small range of good moves for that position. A “value network” then estimates how likely a given board position will lead to a win without chasing down every node of the search tree. The policy network generates possible moves that the value network then judges on their likelihood to vanquish the opponent. These are processed using a technique called a Monte Carlo tree search, which can lead to optimal behavior even if only a tiny fraction of the complete game tree is explored.

A Monte Carlo tree search by itself was not good enough for these programs to compete at the world-class level. That required giving AlphaGo the ability to learn, initially by exposing it to previously played games of professional go players and subsequently by enabling the program to play millions of games against itself, continuously improving its performance in the process.

In the first stage, a 13-layer policy neural network started as a blank slate, with no prior exposure to go. It was then trained on 30 million board positions from 160,000 real-life games taken from a go database. That number represents far more games than any professional player would encounter in a lifetime. Each board position was paired with the actual move chosen by the player (which is why this technique is called supervised learning), and the connections among the simulated neurons in the network were adjusted using so-called deep-machine-learning techniques to make the network more likely to pick the better move the next time. The network was then tested by giving it a board position from a game it had previously never seen. It accurately, though far from perfectly, predicted the move that the professional player had picked.

In a second stage, the policy network trained itself using reinforcement learning. This technique is a lasting legacy of behaviorism, a school of thought dominant in psychology and biology in the first half of the 20th century. It professes the idea that organisms (from worms, flies and sea slugs to rats and people) learn by relating a particular action to specific stimuli that preceded it. As they do this over and over again, the organisms build up an association between stimulus and [...]

[...] ball, Stargunner, Robot Tank, Road Runner, Pong, Space Invaders, Ms. Pac-Man, Alien and Montezuma’s Revenge. (It was a sign of things to come: atari is a Japanese go term, signifying the imminent capture of one or more stones.) Each time it played, the DeepMind network “saw” the same video-game screen, including the current score, that any human player would see. The network’s output was a command to the joystick to move the cursor on the screen. Following the diktat of the programmer to maximize the game score, the algorithm did so and figured out the rules of [...]

In a third and final stage of training, the value network that estimates how likely a given board position will lead to a win is trained using 30 million self-generated positions that the policy network chose. It is this feature of self-play, impossible for humans to replicate (because it would require the player’s mind to split itself into two independent “minds”) that enables the algorithm to relentlessly improve.

A peculiarity of AlphaGo is that it will pick a strategy that maximizes the probability of winning regardless of by how much.
