AI from a game player perspective
Eleonora Giunchiglia

Why?
David Churchill, professor at Memorial University of Newfoundland: "From a scientific point of view, the properties of StarCraft are very much like the properties of real life. [...] We're making a test bed for technologies we can use in the real world." This concept extends to every game and justifies research on AI in games.

Connections with MAS
A multiagent system is one composed of multiple interacting software components, known as agents, which are typically capable of cooperating to solve problems that are beyond the abilities of any individual member. This is a far more complex setting than traditional 1vs1 games → only in recent years have researchers been able to study games with multiple agents. In this project we trace the path that led from AI applied to 1vs1 games to many-vs-many games.

The very first attempts
The very first attempts were made even before the concept of Artificial Intelligence was born:
1912: Leonardo Torres y Quevedo developed an electro-mechanical device, El Ajedrecista, able to checkmate a human opponent's king using only its own king and rook.
1948: Alan Turing wrote the chess algorithm Turochamp. He never managed to run it on a real computer.

The very first attempts
1950: Claude Shannon proposed applying the Minimax algorithm to chess, with two different ways of deciding the next move:
1. doing brute-force tree search on the complete tree and taking the optimal move, or
2. looking at a small subset of next moves at each layer during tree search and taking the "likely optimal" move.

Why Chess?
Chess represented the ideal challenge because it has:
- a discrete structure
- a well-defined problem
- the structure of a zero-sum game with perfect information
- the Markov property

The Dartmouth Workshop
Organized by John McCarthy in 1956. Proposal: "We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves..."
Projects developed after the workshop:
- Arthur Samuel implemented an algorithm able to play Checkers. His algorithm was essentially based on Minimax, but with one key addition: learning.
- Nathaniel Rochester, together with the mathematician Alex Bernstein, fully implemented a chess-playing algorithm on the IBM 701. The program only achieved very basic chess play.

Deep Blue
- Developed 40 years after the Dartmouth workshop
- Based on the Minimax algorithm (sketched below)
- Two versions, both of which played against the chess world champion Garry Kasparov:
1. 1996 Deep Blue: it lost with 1 win, 2 draws and 3 losses.
2. 1997 Deep Blue: it won with 2 wins, 3 draws and 1 loss.
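Both Shannon's proposal and Deep Blue rest on Minimax search over the game tree. Below is a minimal sketch of a depth-limited Minimax in the spirit of Shannon's first, brute-force option. The Game interface (legal_moves, play, is_terminal) and the evaluate heuristic are hypothetical placeholders, not Deep Blue's actual code.

```python
# Minimal depth-limited Minimax sketch. The Game interface and evaluate()
# are hypothetical placeholders; evaluate() is assumed to score positions
# from the root player's point of view.
def minimax(game, state, depth, maximizing):
    # At the depth limit (or at a terminal position), fall back on a
    # hand-crafted evaluation function, as Deep Blue did.
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)

    values = (minimax(game, game.play(state, move), depth - 1, not maximizing)
              for move in game.legal_moves(state))
    return max(values) if maximizing else min(values)

def best_move(game, state, depth=4):
    # Pick the legal move whose subtree has the best Minimax value.
    return max(game.legal_moves(state),
               key=lambda m: minimax(game, game.play(state, m), depth - 1, False))
```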
Reasons behind the victory
- Massively parallel, purpose-designed hardware, able to examine 200 million chess positions per second
- 256 special-purpose chess chips orchestrated by a 32-processor supercomputer → each chip evaluated a position and found all legal moves from it in a single electronic flash
- Help from chess players in writing the library of opening moves
- A finely designed evaluation function

The evaluation function
Values taken into account:
- Material
- Position
- King safety
- Tempo
On the basis of these factors, the algorithm selected the move that returned the highest value among the set of legal moves.

The game Go
Game rules:
- The game is played by two opponents taking turns
- Played on a 19 x 19 board
- One player controls the black stones, the other the white stones
- In each turn a player can place one stone
- Stones may be "captured" → a captured stone must be removed from the board

The game Go
Game aim: each player's aim is to control as much of the board as possible.
Stop condition: the game proceeds until neither player wishes to make another move.
Game complexity: a lower bound on the number of legal positions in Go is 2 x 10^170.

First attempt to tackle Go
1962: H. Remus built a machine able to play a simplified version of Go → 11 x 11 board.
Idea: the initial moves were decided mainly with the random generator, but subsequently and increasingly with the heuristic computer, and eventually with the lexicon.

Later advancements
1968: Alfred Zobrist implemented an algorithm able to beat a wholly inexperienced amateur → the algorithm was based not on tree search but on visual patterns.
1993: Bernd Brugmann implemented a program based on tree search. Instead of using a complex evaluation function, it simulated many random games and picked the move that led to the best outcome on average.

Later advancements
2006: Rémi Coulom implemented CrazyStone, a program deploying Monte Carlo tree search (MCTS) → winner of the 2006 KGS computer-Go tournament for the 9 x 9 variant of Go.
MCTS is based on many playouts → in each playout, the game is played out to the very end by selecting moves at random.

MCTS: details
Each round of Monte Carlo tree search consists of 4 steps:
1) Selection
2) Expansion
3) Simulation/playout
4) Backpropagation

Later advancements
2014: two unrelated groups of scientists, one supervised by Maddison C. and the other by Clark C., trained a CNN to play Go and published their results only two weeks apart. Good results, but not enough to play against MCTS algorithms operating at full brute-force capacity.

AlphaGo
Represents the result obtained by Maddison's group → they followed the idea of using neural networks together with MCTS.
Basic intuition:
- Neural networks serve as intuition, recognizing possible good moves
- MCTS serves as reasoning, evaluating how good these moves actually are

AlphaGo: characteristics
- Policy network: used to predict the best move for a given position → trained on a dataset containing 30 million position-move pairs from games played by humans, plus reinforcement learning by playing against older versions of itself.
- Rollout network: a smaller network used in the rollout phase of MCTS → better than random play but faster than using the policy network.
- Value network: evaluates a position in terms of the probability of winning the game from that position.
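A minimal sketch of how such networks can plug into the four MCTS steps listed above (selection, expansion, simulation/playout, backpropagation) is given below. The Game interface and the policy_net, value_net and rollout_policy callables are hypothetical stand-ins, the constants are illustrative, and sign handling for alternating players is omitted for brevity; this illustrates the idea rather than AlphaGo's actual implementation.

```python
# Simplified sketch of one AlphaGo-style MCTS round. policy_net(state) is
# assumed to yield (move, prior) pairs, value_net(state) a scalar win
# estimate, rollout_policy(state) a move, and game a hypothetical interface.
import math

C_PUCT = 1.0   # exploration constant (illustrative value)
LAMBDA = 0.5   # mixing weight between value network and rollout outcome

class Node:
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior          # P(s, a) from the policy network
        self.children = {}          # move -> Node
        self.visits = 0             # N(s, a)
        self.value_sum = 0.0        # accumulated leaf evaluations

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    # Selection: pick the child maximizing Q(s, a) + u(s, a), where the
    # exploration bonus u(s, a) is proportional to prior / (1 + visit count).
    total = sum(child.visits for child in node.children.values())
    def score(child):
        u = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def mcts_round(root, game, policy_net, value_net, rollout_policy):
    # 1) Selection: walk down the tree until an unexpanded node is reached.
    node, path = root, [root]
    while node.children:
        move, node = select_child(node)
        path.append(node)

    # 2) Expansion: add children with priors from the policy network.
    for move, prior in policy_net(node.state):
        node.children[move] = Node(game.play(node.state, move), prior)

    # 3) Simulation/playout: fast rollout to the end of the game, then mix
    #    the outcome with the value network, V = (1 - LAMBDA) * v + LAMBDA * z.
    state = node.state
    while not game.is_terminal(state):
        state = game.play(state, rollout_policy(state))
    z = game.winner(state)                      # +1 / -1 outcome
    value = (1 - LAMBDA) * value_net(node.state) + LAMBDA * z

    # 4) Backpropagation: update statistics along the visited path.
    for n in path:
        n.visits += 1
        n.value_sum += value
```

After many such rounds, the move with the highest visit count at the root is the one actually played.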
AlphaGo: steps
Every time AlphaGo selects a move:
1. the policy network suggests promising moves to evaluate;
2. these moves are evaluated through a combination of MCTS rollouts and the value network prediction.

AlphaGo: implementation
AlphaGo could run on 48 CPUs deploying 40 search threads, aided by 8 GPUs for neural-network computations. AlphaGo was also implemented in a distributed version which could scale to more than a thousand CPUs and close to 200 GPUs.

AlphaGo: results
October 2015: AlphaGo won its first match against the reigning three-time European Champion, Mr Fan Hui → the first time a machine beat a professional Go player.
March 2016: AlphaGo won with a score of 4-1 against Mr Lee Sedol, winner of 18 world titles and widely considered the greatest player of the past ten years.

AI plays Atari games
2013: DeepMind published an article presenting a new way of training neural networks with reinforcement learning. As a demonstration of its effectiveness, they presented a neural network, called the Q-network, able to play 7 Atari 2600 games without any adjustment to the architecture or the learning algorithm.

Atari games: challenges
- High-dimensional input: the agent could only take the values of raw pixels as input (210 x 160 RGB video at 60 Hz).
- No Markov property: the algorithm could not take decisions based only on the current frame; it also had to take the previous ones into account.

Atari games: the Q-network
DeepMind trained the Q-network with a variation of Q-learning. The team made the Q-network play multiple consecutive games and minimize a loss function dependent on:
- a sequence of past actions performed by the neural network itself
- the currently performed action

Results in Atari Breakout
After 10 minutes of training, the algorithm tries to hit the ball, but it is still too clumsy to manage it.
After 120 minutes of training, the algorithm already plays at a good level.
After 240 minutes of training, the algorithm realizes that digging a tunnel through the wall is the most effective technique to beat the game.
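A minimal sketch of the quantity such a Q-network is trained on is given below: the squared one-step temporal-difference error against the target r + γ · max_a' Q(s', a'). The q_network callable and the GAMMA value are illustrative assumptions; the actual DeepMind training loop (experience replay, frame stacking, epsilon-greedy exploration) is omitted.

```python
# Sketch of the one-step Q-learning target and squared TD error minimized
# by a DQN-style Q-network. q_network(state) is a hypothetical callable
# returning one estimated value per action.
import numpy as np

GAMMA = 0.99  # discount factor (typical value, assumed for illustration)

def td_loss(q_network, state, action, reward, next_state, done):
    # Target: the observed reward plus the discounted best value of the
    # successor state (zero future value if the episode ended).
    target = reward if done else reward + GAMMA * np.max(q_network(next_state))
    prediction = q_network(state)[action]
    return (target - prediction) ** 2
```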
Open challenges: StarCraft
The game: in the full 1v1 game of StarCraft II, two opposing races of aliens are generated on a map which contains resources and other elements. The most common games are 1v1 matches played between human players, but team games (2v2, 3v3 or 4v4) are possible as well, as are more difficult games with unbalanced teams or more than two teams.

StarCraft: rules
To win a game, a player must:
1. accumulate resources,
2. build structures for production,
3. accumulate soldiers in order to build an army,
4. destroy all of the enemy's buildings.

StarCraft: challenges
Why is StarCraft a difficult game to tackle?
- Multi-agent problem with multiple players interacting
- Imperfect-information game
- Large action space involving the selection and control of hundreds of units
- Large state space observable only from raw input
- Need for long-term plans

StarCraft: agent inputs
All the agents must take as input:
- minimap: a coarse representation of the state of the entire world,
- screen: a detailed view of a subsection of the world corresponding to the player's on-screen view,
- non-spatial features: information regarding the status of the game available to each player.

StarCraft: first agents
The first proposed agents are:
- Atari-net Agent: developed by adapting the architecture that had been used in the Atari paper.
- FullyConv Agent: an agent that "predicts spatial actions directly through a sequence of resolution-preserving convolutional layers" (see the sketch below).
- FullyConv LSTM Agent: keeps the advantages of the FullyConv agent while introducing an element of memory thanks to its recurrent structure.
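As a rough illustration of the FullyConv idea, the sketch below processes screen and minimap observations with resolution-preserving convolutions, broadcasts the non-spatial features over the map, and emits one spatial-action logit per map cell. All channel counts, kernel sizes and the shared 64 x 64 resolution are illustrative assumptions, not the published SC2LE hyperparameters.

```python
# Rough PyTorch sketch of a FullyConv-style head: padded convolutions keep
# the spatial resolution, non-spatial features are tiled over the map, and
# a 1x1 convolution yields one spatial-action logit per cell. Screen and
# minimap are assumed to share the same resolution here.
import torch
import torch.nn as nn

class FullyConvSketch(nn.Module):
    def __init__(self, screen_ch=17, minimap_ch=7, nonspatial_dim=11, res=64):
        super().__init__()
        self.res = res
        self.screen = nn.Sequential(
            nn.Conv2d(screen_ch, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        self.minimap = nn.Sequential(
            nn.Conv2d(minimap_ch, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        fused_ch = 32 + 32 + nonspatial_dim
        # One logit per map cell for the spatial action.
        self.spatial_policy = nn.Conv2d(fused_ch, 1, kernel_size=1)
        # Non-spatial branch estimating the state value.
        self.value = nn.Sequential(
            nn.Linear(fused_ch * res * res, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, screen, minimap, nonspatial):
        b = screen.size(0)
        # Broadcast the non-spatial feature vector over the whole map.
        ns = nonspatial.view(b, -1, 1, 1).expand(-1, -1, self.res, self.res)
        fused = torch.cat([self.screen(screen), self.minimap(minimap), ns], dim=1)
        spatial_logits = self.spatial_policy(fused).view(b, -1)
        value = self.value(fused.flatten(1))
        return spatial_logits, value
```

A spatial action (e.g. where to click on the screen) can then be sampled from a softmax over the spatial logits.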