On to Games: Game Playing (Ch. 5.1-5.3, 5.4.1, 5.5)
• Tail end of Constraint Satisfaction
• Game playing
  • Game trees
  • Minimax
  • Alpha-beta pruning
  • Adding randomness

Questions
• Framework from reading?
• We've seen search problems where other agents' moves need to be taken into account – but what if they are actively moving against us?

Cynthia Matuszek – CMSC 671. Based on slides by Marie desJardins, Francisco Iacobelli.


Why Games?
• Clear criteria for success
• Offer an opportunity to study problems involving {hostile / adversarial / competing} agents
• Interesting, hard problems which require minimal setup
• Often define very large search spaces
  • Chess: ~35^100 nodes in the search tree, ~10^40 legal states
• Many problems can be formalized as games

State-of-the-art
"A computer can't be intelligent; one could never beat a human at ____"
• Chess:
  • Deep Blue beat Garry Kasparov in 1997
  • Garry Kasparov vs. Deep Junior (Feb 2003): tie!
  • Kasparov vs. X3D Fritz (November 2003): tie!
  • Deep Fritz beat world champion Vladimir Kramnik (2006)
  • Now computers play computers
• Checkers: "Chinook" (sigh), an AI program with a very large endgame database, is world champion and can provably never be beaten. Retired 1995.



State-of-the-art
• AlphaGo Master won 60 straight online games at the end of 2016 and beginning of 2017, and defeated the #1-ranked human player, Ke Jie, three games to zero.

• Bridge: "Expert-level" AI, but no world champions
  • "Computer bridge world champion Jack played seven top Dutch pairs … and two reigning European champions. A total of 196 boards were played. Jack defeated three out of the seven pairs (including the Europeans). Overall, the program lost by a small margin (359 versus 385)." (2006)
  • Bridge involves chance (the deal) and hidden hands: the computer has imperfect information.
• Go: "A computer can't be intelligent; one could never beat a human at ____"?

Sources: Wikipedia: Computer_bridge; www.wired.com/2017/05/googles-alphago-levels-board-games-power-grids


State-of-the-art: Go
• Computers finally got there: AlphaGo!
  • Made by Google DeepMind in London
• 2015: Beat a professional Go player without handicaps
• 2016: Beat a 9-dan professional without handicaps
• 2017: Beat Ke Jie, #1 human player
• 2017: DeepMind published AlphaGo Zero
  • No human games data
  • Learns from playing itself
  • Better than AlphaGo after 3 days of playing

Typical Games
• 2-person game
• Players alternate moves
• Easiest games are:
  • Zero-sum: one player's loss is the other's gain
  • Fully observable: both players have access to complete information about the state of the game
  • Deterministic: no chance (e.g., dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
• Not: Bridge, Solitaire, Backgammon, ...


How to Play (How to Search)
• Obvious approach (sketched in code below), from the current game state:
  1. Consider all the legal moves you can make
  2. Compute the new position resulting from each move
  3. Evaluate each resulting position
  4. Decide which is best
  5. Make that move
  6. Wait for your opponent to move
  7. Repeat
• Key problems:
  • Representing the "board" (game state)
    • We've seen that there are different ways to make these choices
  • Generating all legal next boards
    • That can get ugly
  • Evaluating a position
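A minimal Python sketch of the "obvious approach" loop; legal_moves, result, and evaluate are illustrative placeholder names for game-supplied helpers, not from the reading:

    # Sketch of the "obvious approach". The three helpers are
    # placeholders assumed to be provided by the game implementation.
    def choose_move(state, legal_moves, result, evaluate):
        """Pick the move whose resulting position evaluates best."""
        best_move, best_value = None, float("-inf")
        for move in legal_moves(state):       # 1. all legal moves
            new_state = result(state, move)   # 2. compute new position
            value = evaluate(new_state)       # 3. evaluate it
            if value > best_value:            # 4. decide which is best
                best_move, best_value = move, value
        return best_move                      # 5. make that move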



Evaluation Function
• An evaluation function or static evaluator is used to evaluate the "goodness" of a game position (state)
• The zero-sum assumption allows one evaluation function to describe the goodness of a board for both players
  • One player's gain of n means the other loses n
• How?

Evaluation Function: The Idea
• I am always trying to reach the highest value
• You are always trying to reach the lowest value
• Captures everyone's goal in a single function:
  • f(n) >> 0: position n good for me and bad for you
  • f(n) << 0: position n bad for me and good for you
  • f(n) = 0 ± ε: position n is a neutral position
  • f(n) = +∞: win for me
  • f(n) = -∞: win for you



Evaluation Function Examples
• An evaluation function for Tic-Tac-Toe:
  • f(n) = [# 3-lengths open for X] − [# 3-lengths open for O]
  • A 3-length is a complete row, column, or diagonal
• Alan Turing's function for chess:
  • f(n) = w(n) / b(n)
  • w(n) = sum of the point value of white's pieces
  • b(n) = sum of the point value of black's pieces
• Most evaluation functions are specified as a weighted sum of position features:
  • f(n) = w1·feat1(n) + w2·feat2(n) + ... + wk·featk(n)
• Example features for chess: piece count, piece placement, squares controlled, …
• Deep Blue had over 8000 features in its nonlinear evaluation function: square control, rook-in-file, x-rays, king safety, pawn structure, passed pawns, ray control, outposts, pawn majority, rook on the 7th, blockade, restraint, trapped pieces, color complex, ...
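A small Python sketch of the Tic-Tac-Toe evaluator above; the board encoding (a 9-element list holding 'X', 'O', or None) is an assumption for illustration:

    # Sketch of f(n) = [# 3-lengths open for X] - [# 3-lengths open for O].
    # Board encoding is an illustrative assumption, not from the slides.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def open_lines(board, player):
        """Count 3-lengths not blocked by the opponent."""
        other = 'O' if player == 'X' else 'X'
        return sum(1 for line in LINES
                   if all(board[i] != other for i in line))

    def evaluate(board):
        return open_lines(board, 'X') - open_lines(board, 'O')

For the empty board this gives 8 − 8 = 0, a neutral position, as the previous slide's f(n) = 0 ± ε case suggests.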


Evaluation Function: The Idea (recap)
• I am maximizing f(n) on my turn; my opponent is minimizing f(n) on their turn

Game Trees
• Problem spaces for typical games are represented as trees
• The player must decide the best single move to make next
• Root node = current board configuration
• Arcs = possible legal moves for a player
• (One possible node representation is sketched below)
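A sketch of one way to represent a game-tree node in Python; the field names are illustrative, not from the reading:

    # One possible game-tree node representation (illustrative sketch).
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        state: object           # board configuration at this node
        to_move: str            # 'MAX' or 'MIN' -- whose turn it is
        children: list = field(default_factory=list)  # one per legal move
        value: Optional[float] = None   # filled in by minimax backup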


Game Trees (continued)
• Static evaluator function
  • Rates a board position
  • f(board) → R (a real number), with f > 0 good for me, f < 0 good for you
• If it is my turn to move, the root is labeled a "MAX" node; otherwise it is a "MIN" node
• Each level's nodes are all MAX or all MIN
  • Nodes at level i are the opposite of those at level i + 1

Minimax Procedure
• Create start node: a MAX node with the current board state
• Expand nodes down to a depth of lookahead
• Apply the evaluation function at each leaf node
• "Back up" values for each non-leaf node until a value is computed for the root node
  • MIN: backed-up value is the lowest of the children's values (opponent's turn)
  • MAX: backed-up value is the highest of the children's values
• Pick the operator associated with the child node whose backed-up value set the value at the root (see the code sketch below)
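A minimal Python sketch of this procedure, assuming the root is a MAX node; the game helpers (legal_moves, result, evaluate, is_terminal) are illustrative placeholders, not a fixed API:

    # Depth-limited minimax following the procedure above (sketch).
    def minimax(state, depth, maximizing, game):
        """Back up values from a depth-limited lookahead."""
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)       # static evaluator at leaves
        values = [minimax(game.result(state, m), depth - 1,
                          not maximizing, game)
                  for m in game.legal_moves(state)]
        return max(values) if maximizing else min(values)

    def best_move(state, depth, game):
        """Pick the move whose backed-up value set the root's value."""
        return max(game.legal_moves(state),
                   key=lambda m: minimax(game.result(state, m),
                                         depth - 1, False, game))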


Minimax Algorithm
[Figure: a minimax tree with lookahead = 3 and leaf values 2, 7, 1, 8. Static evaluator values are backed up through the MIN level (2 and 1) to the MAX root, which takes the value 2. MAX can only choose the "best" move up to the lookahead depth.]
Video walkthrough: https://www.youtube.com/watch?v=6ELUvkSkCts


Example: Nim
• In Nim, there are a certain number of objects (coins, sticks, etc.) on the table – we'll play 7-coin Nim
• Each player in turn has to pick up either one or two objects
• Whoever picks up the last object loses

Partial Game Tree for Tic-Tac-Toe
• f(n) = +1 if position n is a win for X
• f(n) = -1 if position n is a win for O
• f(n) = 0 if position n is a draw


Minimax Tree
[Figure legend: MAX node; MIN node; f value (static evaluator); value computed by minimax]

Nim Game Tree
• In-class exercise: draw the minimax search tree for 4-coin Nim (a checking sketch follows this list)
• Things to consider:
  • What's your start state?
  • What's the maximum depth of the tree? Minimum?
  • Pick up either one or two objects
  • Whoever picks up the last object loses
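For checking your drawn tree afterwards, a minimal Python sketch that backs up minimax values for n-coin Nim; the state is just the number of coins left, and +1 / -1 follow the slides' convention (Player 1 wins: +1, Player 2 wins: -1):

    # Minimax values for misere Nim: whoever takes the last coin loses.
    def nim_value(coins, player1_to_move):
        if coins == 0:
            # The previous player took the last coin and therefore lost,
            # so the player now "to move" is the winner.
            return 1 if player1_to_move else -1
        values = [nim_value(coins - take, not player1_to_move)
                  for take in (1, 2) if take <= coins]
        return max(values) if player1_to_move else min(values)

    print(nim_value(4, True))   # -1: 4-coin Nim is a loss for Player 1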


Nim Game Tree
[Figure: the game tree for 4-coin Nim; node labels show coins remaining at each ply (4; then 3, 2; then 2, 1, 1, 0; then 1, 0, 0, 0; then 0). Player 1 wins: +1; Player 2 wins: -1.]

Games 2
• Expectiminimax
• Alpha-beta pruning



Nim Game Tree (minimax values)
[Figure: the 4-coin Nim tree with minimax values backed up to every node (Player 1 wins: +1, Player 2 wins: -1). The root (4 coins) backs up to -1: 4-coin Nim is a loss for the player who moves first.]



Improving Minimax
• Basic problem: must examine a number of states that is exponential in d!
• Solution: judicious pruning of the search tree
  • "Cut off" whole sections that can't be part of the best solution
  • Or, sometimes, probably won't be
  • Can be a completeness vs. efficiency tradeoff, especially in stochastic problem spaces

Alpha-Beta Pruning
• We can improve on the performance of the minimax algorithm through alpha-beta pruning
• Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." – Pat Winston
[Figure: a MAX root over two MIN nodes; the left MIN node's children are 2 and 7, so it equals 2. The right MIN node has seen a child of 1, so its value is ≤ 1; its remaining child is marked "?".]
• We don't need to compute the value at the "?" node
  • No matter what it is, it can't affect the value of the root node
  • Because the MAX player will choose the left branch's value of 2 either way


Alpha-Beta Pruning
• Traverse the search tree in depth-first order
• At each MAX node n, α(n) = maximum value found so far
• At each MIN node n, β(n) = minimum value found so far
  • α starts at -∞ and increases; β starts at +∞ and decreases
• β-cutoff: given a MAX node n, cut off search below n (i.e., don't look at any more of n's children) if α(n) ≥ β(i) for some MIN node ancestor i of n
• α-cutoff: stop searching below MIN node n if β(n) ≤ α(i) for some MAX node ancestor i of n
• (A code sketch follows below)

Alpha-Beta Example (b = 3)
[Figure: a MAX root over three MIN nodes, with leaves 3, 12, 8 | 2, 14 | 1. The first MIN node evaluates to 3; the second is pruned after its first leaf 2 (≤ 3); the third sees 14, then 1, and is pruned. Root value: 3.]
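A minimal Python sketch of alpha-beta; as before, the game helpers are assumed placeholders. Here α tracks the best value found so far for MAX along the path and β the best for MIN, so search below a node stops as soon as α ≥ β:

    # Depth-limited alpha-beta pruning (sketch).
    def alphabeta(state, depth, alpha, beta, maximizing, game):
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)
        if maximizing:
            value = float("-inf")
            for m in game.legal_moves(state):
                value = max(value, alphabeta(game.result(state, m),
                                             depth - 1, alpha, beta,
                                             False, game))
                alpha = max(alpha, value)
                if alpha >= beta:     # beta-cutoff: a MIN ancestor
                    break             # already has something better
            return value
        else:
            value = float("inf")
            for m in game.legal_moves(state):
                value = min(value, alphabeta(game.result(state, m),
                                             depth - 1, alpha, beta,
                                             True, game))
                beta = min(beta, value)
                if beta <= alpha:     # alpha-cutoff
                    break
            return value

Call it at the root with alpha = -∞ and beta = +∞; it returns the same root value as minimax.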


Alpha-Beta Pruning: Exercise
[Figure: an exercise tree with alternating MAX, MIN, and MAX levels; apply alpha-beta pruning by hand.]


Effectiveness of Alpha-Beta
• Alpha-beta is guaranteed to compute the same value for the root node as minimax, with ≤ the computation
• Worst case: nothing is pruned; examine b^d leaf nodes
  • Where each node has b children and a d-ply search is performed
• Best case: examine only ~2b^(d/2) leaf nodes
  • So you can search twice as deep as minimax!
  • Happens when each player's best move is the first alternative generated
• In Deep Blue, empirically, alpha-beta pruning took the average branching factor from ~35 to ~6! (see the arithmetic sketch below)

Games of Chance
• Backgammon: 2-player with uncertainty
• Players roll dice to determine what moves to make
• Example: White has just rolled 5 and 6 and has four legal moves:
  • 5-10, 5-11
  • 5-11, 19-24
  • 5-10, 10-16
  • 5-11, 11-16
• Good for decision making in adversarial problems with skill and luck
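A quick back-of-the-envelope check of the leaf counts above; b and d are chosen for illustration only:

    # Worst-case vs. best-case leaf counts for alpha-beta (illustrative).
    b, d = 35, 8
    worst = b ** d               # minimax / worst-case alpha-beta leaves
    best = 2 * b ** (d / 2)      # best-case alpha-beta leaves (approx.)
    print(f"{worst:.2e} vs {best:.2e}")   # ~2.25e+12 vs ~3.00e+06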


Game Trees with Chance
• Chance nodes (circles) represent random events (e.g., dice rolls)
• For a random event with N outcomes:
  • The chance node has N distinct children
  • Each has a probability
• Example: rolling 2 dice gives 21 distinct outcomes
  • Not all equally likely!
• Use minimax to compute values for MAX and MIN nodes
• Use expected values for chance nodes
  • For a chance node C over a MAX level: expectimax(C) = Σi P(di) × maxvalue(i)
  • Over a MIN level: expectimin(C) = Σi P(di) × minvalue(i)
• (A code sketch follows below)
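A minimal Python sketch combining the two formulas with minimax; the helpers node_type and outcomes (yielding probability/state pairs at a chance node) are illustrative assumptions, as are the other game helpers:

    # Expectiminimax over MAX, MIN, and chance nodes (sketch).
    def expectiminimax(state, depth, game):
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)
        kind = game.node_type(state)          # 'MAX', 'MIN', or 'CHANCE'
        if kind == 'CHANCE':
            # Expected value over the N outcomes of the random event
            return sum(p * expectiminimax(s, depth - 1, game)
                       for p, s in game.outcomes(state))
        values = [expectiminimax(game.result(state, m), depth - 1, game)
                  for m in game.legal_moves(state)]
        return max(values) if kind == 'MAX' else min(values)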



Game Trees with Chance (example)
[Figure: moves A1 and A2 each lead to a chance node with 2 outcomes, P = {.9, .1}; with one set of leaf values A1 is the best move, but after transforming the leaf values A2 becomes best.]

Meaning of the Evaluation Function
• Dealing with probabilities and expected values means being careful about the "meaning" of the values returned by the static evaluator
• A "relative-order preserving" change (as here) won't change the minimax decision, but could change the decision with chance nodes (see the example below)
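An illustrative-only numeric demo (the leaf values are invented; the two-outcome chance node with P = {.9, .1} follows the figure). Squaring is order-preserving on nonnegative values, so it cannot change a minimax choice, but it flips the expected-value choice here:

    # Order-preserving transforms vs. chance nodes (invented numbers).
    P = (0.9, 0.1)               # two-outcome chance node from the figure
    A1, A2 = (3, 3), (1, 10)     # leaf values under each move

    def ev(leaves, f=lambda x: x):
        """Expected value of the (transformed) leaves."""
        return sum(p * f(v) for p, v in zip(P, leaves))

    print(ev(A1), ev(A2))        # ~3.0 vs ~1.9  -> prefer A1
    print(ev(A1, lambda x: x**2),
          ev(A2, lambda x: x**2))  # ~9.0 vs ~10.9 -> prefer A2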


Exercise: Oopsy-Nim
• Starts out like Nim:
  • Each player in turn has to pick up either one or two objects
  • Whoever picks up the last object loses
• But sometimes (probability = 0.25), when you try to pick up two objects, you drop them both
  • Picking up a single object always works
• Question: why can't we draw the entire game tree?
• Exercise: draw the 4-ply game tree (2 moves per player)
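A sketch of one move's chance outcomes, assuming dropped objects go back on the table (an assumption; the slide doesn't say) and that the turn passes either way:

    # Chance outcomes for an attempted Oopsy-Nim move (sketch).
    # State is just the number of coins left.
    def outcomes(coins, take):
        """Yield (probability, coins_after) pairs for an attempted move."""
        if take == 1:
            return [(1.0, coins - 1)]               # always works
        return [(0.75, coins - 2), (0.25, coins)]   # drop both: no change

Note what the 0.25 outcome does to the state as you draw the 4-ply tree.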

