On to Games: Game Playing (Ch. 5.1-5.3, 5.4.1, 5.5)
• Tail end of Constraint Satisfaction
• Game playing
  • Game trees
  • Minimax
  • Alpha-beta pruning
  • Adding randomness

Questions
• Framework from reading?
• We've seen search problems where other agents' moves need to be taken into account – but what if they are actively moving against us?

Cynthia Matuszek – CMSC 671. Based on slides by Marie desJardins, Francisco Iacobelli.


Why Games?
• Clear criteria for success
• Offer an opportunity to study problems involving {hostile / adversarial / competing} agents
• Interesting, hard problems which require minimal setup
• Often define very large search spaces
  • Chess: ~35^100 nodes in the search tree, ~10^40 legal states
• Many problems can be formalized as games

State-of-the-art
"A computer can't be intelligent; one could never beat a human at ____"
• Chess:
  • Deep Blue beat Garry Kasparov in 1997
  • Garry Kasparov vs. Deep Junior (Feb 2003): tie!
  • Kasparov vs. X3D Fritz (November 2003): tie!
  • Deep Fritz beat world champion Vladimir Kramnik (2006)
  • Now computers play computers
• Checkers: "Chinook" (sigh), an AI program with a very large endgame database, is world champion and can provably never be beaten. Retired 1995.



State-of-the-art
• AlphaGo Master won 60 straight online games at the end of 2016 and beginning of 2017, and defeated the #1-ranked human player, Ke Jie, three games to zero.

• Bridge: "Expert-level" AI, but no world champions
  • "Computer bridge world champion Jack played seven top Dutch pairs … and two reigning European champions. A total of 196 boards were played. Jack defeated three out of the seven pairs (including the Europeans). Overall, the program lost by a small margin (359 versus 385)." (2006)
  • Bridge involves chance (the deal) and hidden hands: the computer has imperfect information.
• Go: "A computer can't be intelligent; one could never beat a human at ____"?

Sources: Wikipedia: Computer_bridge; www.wired.com/2017/05/googles-alphago-levels-board-games-power-grids


State-of-the-art: Go
• Computers finally got there: AlphaGo!
  • Made by Google DeepMind in London
• 2015: Beat a professional Go player without handicaps
• 2016: Beat a 9-dan professional without handicaps
• 2017: Beat Ke Jie, #1 human player
• 2017: DeepMind published AlphaGo Zero
  • No human games data
  • Learns from playing itself
  • Better than AlphaGo after 3 days of playing

Typical Games
• 2-person game
• Players alternate moves
• Easiest games are:
  • Zero-sum: one player's loss is the other's gain
  • Fully observable: both players have access to complete information about the state of the game
  • Deterministic: no chance (e.g., dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
• Not: Bridge, Solitaire, Backgammon, ...


How to Play (How to Search)
• Obvious approach (sketched in code below), from the current game state:
  1. Consider all the legal moves you can make
  2. Compute the new position resulting from each move
  3. Evaluate each resulting position
  4. Decide which is best
  5. Make that move
  6. Wait for your opponent to move
  7. Repeat
• Key problems:
  • Representing the "board" (game state)
    • We've seen that there are different ways to make these choices
  • Generating all legal next boards
    • That can get ugly
  • Evaluating a position
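A minimal Python sketch of the "obvious approach" loop; legal_moves, result, and evaluate are illustrative placeholder names for game-supplied helpers, not from the reading:

    # Sketch of the "obvious approach". The three helpers are
    # placeholders assumed to be provided by the game implementation.
    def choose_move(state, legal_moves, result, evaluate):
        """Pick the move whose resulting position evaluates best."""
        best_move, best_value = None, float("-inf")
        for move in legal_moves(state):       # 1. all legal moves
            new_state = result(state, move)   # 2. compute new position
            value = evaluate(new_state)       # 3. evaluate it
            if value > best_value:            # 4. decide which is best
                best_move, best_value = move, value
        return best_move                      # 5. make that move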



Evaluation Function
• An evaluation function or static evaluator is used to evaluate the "goodness" of a game position (state)
• The zero-sum assumption allows one evaluation function to describe the goodness of a board for both players
  • One player's gain of n means the other loses n
• How?

Evaluation Function: The Idea
• I am always trying to reach the highest value
• You are always trying to reach the lowest value
• Captures everyone's goal in a single function:
  • f(n) >> 0: position n good for me and bad for you
  • f(n) << 0: position n bad for me and good for you
  • f(n) = 0 ± ε: position n is a neutral position
  • f(n) = +∞: win for me
  • f(n) = -∞: win for you



Evaluation Function Examples
• An evaluation function for Tic-Tac-Toe:
  • f(n) = [# 3-lengths open for X] − [# 3-lengths open for O]
  • A 3-length is a complete row, column, or diagonal
• Alan Turing's function for chess:
  • f(n) = w(n) / b(n)
  • w(n) = sum of the point value of white's pieces
  • b(n) = sum of the point value of black's pieces
• Most evaluation functions are specified as a weighted sum of position features:
  • f(n) = w1·feat1(n) + w2·feat2(n) + ... + wk·featk(n)
• Example features for chess: piece count, piece placement, squares controlled, …
• Deep Blue had over 8000 features in its nonlinear evaluation function: square control, rook-in-file, x-rays, king safety, pawn structure, passed pawns, ray control, outposts, pawn majority, rook on the 7th, blockade, restraint, trapped pieces, color complex, ...
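A small Python sketch of the Tic-Tac-Toe evaluator above; the board encoding (a 9-element list holding 'X', 'O', or None) is an assumption for illustration:

    # Sketch of f(n) = [# 3-lengths open for X] - [# 3-lengths open for O].
    # Board encoding is an illustrative assumption, not from the slides.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def open_lines(board, player):
        """Count 3-lengths not blocked by the opponent."""
        other = 'O' if player == 'X' else 'X'
        return sum(1 for line in LINES
                   if all(board[i] != other for i in line))

    def evaluate(board):
        return open_lines(board, 'X') - open_lines(board, 'O')

For the empty board this gives 8 − 8 = 0, a neutral position, as the previous slide's f(n) = 0 ± ε case suggests.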


Evaluation Function: The Idea (recap)
• I am maximizing f(n) on my turn; my opponent is minimizing f(n) on their turn

Game Trees
• Problem spaces for typical games are represented as trees
• The player must decide the best single move to make next
• Root node = current board configuration
• Arcs = possible legal moves for a player
• (One possible node representation is sketched below)
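A sketch of one way to represent a game-tree node in Python; the field names are illustrative, not from the reading:

    # One possible game-tree node representation (illustrative sketch).
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        state: object           # board configuration at this node
        to_move: str            # 'MAX' or 'MIN' -- whose turn it is
        children: list = field(default_factory=list)  # one per legal move
        value: Optional[float] = None   # filled in by minimax backup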


Game Trees (continued)
• Static evaluator function
  • Rates a board position
  • f(board) → R (a real number), with f > 0 good for me, f < 0 good for you
• If it is my turn to move, the root is labeled a "MAX" node; otherwise it is a "MIN" node
• Each level's nodes are all MAX or all MIN
  • Nodes at level i are the opposite of those at level i + 1

Minimax Procedure
• Create start node: a MAX node with the current board state
• Expand nodes down to a depth of lookahead
• Apply the evaluation function at each leaf node
• "Back up" values for each non-leaf node until a value is computed for the root node
  • MIN: backed-up value is the lowest of the children's values (opponent's turn)
  • MAX: backed-up value is the highest of the children's values
• Pick the operator associated with the child node whose backed-up value set the value at the root (see the code sketch below)
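A minimal Python sketch of this procedure, assuming the root is a MAX node; the game helpers (legal_moves, result, evaluate, is_terminal) are illustrative placeholders, not a fixed API:

    # Depth-limited minimax following the procedure above (sketch).
    def minimax(state, depth, maximizing, game):
        """Back up values from a depth-limited lookahead."""
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)       # static evaluator at leaves
        values = [minimax(game.result(state, m), depth - 1,
                          not maximizing, game)
                  for m in game.legal_moves(state)]
        return max(values) if maximizing else min(values)

    def best_move(state, depth, game):
        """Pick the move whose backed-up value set the root's value."""
        return max(game.legal_moves(state),
                   key=lambda m: minimax(game.result(state, m),
                                         depth - 1, False, game))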


Minimax Algorithm
[Figure: a minimax tree with lookahead = 3 and leaf values 2, 7, 1, 8. Static evaluator values are backed up through the MIN level (2 and 1) to the MAX root, which takes the value 2. MAX can only choose the "best" move up to the lookahead depth.]
Video walkthrough: https://www.youtube.com/watch?v=6ELUvkSkCts


Example: Nim
• In Nim, there are a certain number of objects (coins, sticks, etc.) on the table – we'll play 7-coin Nim
• Each player in turn has to pick up either one or two objects
• Whoever picks up the last object loses

Partial Game Tree for Tic-Tac-Toe
• f(n) = +1 if position n is a win for X
• f(n) = -1 if position n is a win for O
• f(n) = 0 if position n is a draw


Minimax Tree
[Figure legend: MAX node; MIN node; f value (static evaluator); value computed by minimax]

Nim Game Tree
• In-class exercise: draw the minimax search tree for 4-coin Nim (a checking sketch follows this list)
• Things to consider:
  • What's your start state?
  • What's the maximum depth of the tree? Minimum?
  • Pick up either one or two objects
  • Whoever picks up the last object loses
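For checking your drawn tree afterwards, a minimal Python sketch that backs up minimax values for n-coin Nim; the state is just the number of coins left, and +1 / -1 follow the slides' convention (Player 1 wins: +1, Player 2 wins: -1):

    # Minimax values for misere Nim: whoever takes the last coin loses.
    def nim_value(coins, player1_to_move):
        if coins == 0:
            # The previous player took the last coin and therefore lost,
            # so the player now "to move" is the winner.
            return 1 if player1_to_move else -1
        values = [nim_value(coins - take, not player1_to_move)
                  for take in (1, 2) if take <= coins]
        return max(values) if player1_to_move else min(values)

    print(nim_value(4, True))   # -1: 4-coin Nim is a loss for Player 1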


Nim Game Tree
[Figure: the game tree for 4-coin Nim; node labels show coins remaining at each ply (4; then 3, 2; then 2, 1, 1, 0; then 1, 0, 0, 0; then 0). Player 1 wins: +1; Player 2 wins: -1.]

Games 2
• Expectiminimax
• Alpha-beta pruning



Nim Game Tree (minimax values)
[Figure: the 4-coin Nim tree with minimax values backed up to every node (Player 1 wins: +1, Player 2 wins: -1). The root (4 coins) backs up to -1: 4-coin Nim is a loss for the player who moves first.]



Improving Minimax
• Basic problem: must examine a number of states that is exponential in d!
• Solution: judicious pruning of the search tree
  • "Cut off" whole sections that can't be part of the best solution
  • Or, sometimes, probably won't be
  • Can be a completeness vs. efficiency tradeoff, especially in stochastic problem spaces

Alpha-Beta Pruning
• We can improve on the performance of the minimax algorithm through alpha-beta pruning
• Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." – Pat Winston
[Figure: a MAX root over two MIN nodes; the left MIN node's children are 2 and 7, so it equals 2. The right MIN node has seen a child of 1, so its value is ≤ 1; its remaining child is marked "?".]
• We don't need to compute the value at the "?" node
  • No matter what it is, it can't affect the value of the root node
  • Because the MAX player will choose the left branch's value of 2 either way


Alpha-Beta Pruning
• Traverse the search tree in depth-first order
• At each MAX node n, α(n) = maximum value found so far
• At each MIN node n, β(n) = minimum value found so far
  • α starts at -∞ and increases; β starts at +∞ and decreases
• β-cutoff: given a MAX node n, cut off search below n (i.e., don't look at any more of n's children) if α(n) ≥ β(i) for some MIN node ancestor i of n
• α-cutoff: stop searching below MIN node n if β(n) ≤ α(i) for some MAX node ancestor i of n
• (A code sketch follows below)

Alpha-Beta Example (b = 3)
[Figure: a MAX root over three MIN nodes, with leaves 3, 12, 8 | 2, 14 | 1. The first MIN node evaluates to 3; the second is pruned after its first leaf 2 (≤ 3); the third sees 14, then 1, and is pruned. Root value: 3.]
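A minimal Python sketch of alpha-beta; as before, the game helpers are assumed placeholders. Here α tracks the best value found so far for MAX along the path and β the best for MIN, so search below a node stops as soon as α ≥ β:

    # Depth-limited alpha-beta pruning (sketch).
    def alphabeta(state, depth, alpha, beta, maximizing, game):
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)
        if maximizing:
            value = float("-inf")
            for m in game.legal_moves(state):
                value = max(value, alphabeta(game.result(state, m),
                                             depth - 1, alpha, beta,
                                             False, game))
                alpha = max(alpha, value)
                if alpha >= beta:     # beta-cutoff: a MIN ancestor
                    break             # already has something better
            return value
        else:
            value = float("inf")
            for m in game.legal_moves(state):
                value = min(value, alphabeta(game.result(state, m),
                                             depth - 1, alpha, beta,
                                             True, game))
                beta = min(beta, value)
                if beta <= alpha:     # alpha-cutoff
                    break
            return value

Call it at the root with alpha = -∞ and beta = +∞; it returns the same root value as minimax.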


Alpha-Beta Pruning: Exercise
[Figure: an exercise tree with alternating MAX, MIN, and MAX levels; apply alpha-beta pruning by hand.]


Effectiveness of Alpha-Beta
• Alpha-beta is guaranteed to compute the same value for the root node as minimax, with ≤ the computation
• Worst case: nothing is pruned; examine b^d leaf nodes
  • Where each node has b children and a d-ply search is performed
• Best case: examine only ~2b^(d/2) leaf nodes
  • So you can search twice as deep as minimax!
  • Happens when each player's best move is the first alternative generated
• In Deep Blue, empirically, alpha-beta pruning took the average branching factor from ~35 to ~6! (see the arithmetic sketch below)

Games of Chance
• Backgammon: 2-player with uncertainty
• Players roll dice to determine what moves to make
• Example: White has just rolled 5 and 6 and has four legal moves:
  • 5-10, 5-11
  • 5-11, 19-24
  • 5-10, 10-16
  • 5-11, 11-16
• Good for decision making in adversarial problems with skill and luck
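A quick back-of-the-envelope check of the leaf counts above; b and d are chosen for illustration only:

    # Worst-case vs. best-case leaf counts for alpha-beta (illustrative).
    b, d = 35, 8
    worst = b ** d               # minimax / worst-case alpha-beta leaves
    best = 2 * b ** (d / 2)      # best-case alpha-beta leaves (approx.)
    print(f"{worst:.2e} vs {best:.2e}")   # ~2.25e+12 vs ~3.00e+06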


Game Trees with Chance
• Chance nodes (circles) represent random events (e.g., dice rolls)
• For a random event with N outcomes:
  • The chance node has N distinct children
  • Each has a probability
• Example: rolling 2 dice gives 21 distinct outcomes
  • Not all equally likely!
• Use minimax to compute values for MAX and MIN nodes
• Use expected values for chance nodes
  • For a chance node C over a MAX level: expectimax(C) = Σi P(di) × maxvalue(i)
  • Over a MIN level: expectimin(C) = Σi P(di) × minvalue(i)
• (A code sketch follows below)
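A minimal Python sketch combining the two formulas with minimax; the helpers node_type and outcomes (yielding probability/state pairs at a chance node) are illustrative assumptions, as are the other game helpers:

    # Expectiminimax over MAX, MIN, and chance nodes (sketch).
    def expectiminimax(state, depth, game):
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)
        kind = game.node_type(state)          # 'MAX', 'MIN', or 'CHANCE'
        if kind == 'CHANCE':
            # Expected value over the N outcomes of the random event
            return sum(p * expectiminimax(s, depth - 1, game)
                       for p, s in game.outcomes(state))
        values = [expectiminimax(game.result(state, m), depth - 1, game)
                  for m in game.legal_moves(state)]
        return max(values) if kind == 'MAX' else min(values)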



Game Trees with Chance (example)
[Figure: moves A1 and A2 each lead to a chance node with 2 outcomes, P = {.9, .1}; with one set of leaf values A1 is the best move, but after transforming the leaf values A2 becomes best.]

Meaning of the Evaluation Function
• Dealing with probabilities and expected values means being careful about the "meaning" of the values returned by the static evaluator
• A "relative-order preserving" change (as here) won't change the minimax decision, but could change the decision with chance nodes (see the example below)
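An illustrative-only numeric demo (the leaf values are invented; the two-outcome chance node with P = {.9, .1} follows the figure). Squaring is order-preserving on nonnegative values, so it cannot change a minimax choice, but it flips the expected-value choice here:

    # Order-preserving transforms vs. chance nodes (invented numbers).
    P = (0.9, 0.1)               # two-outcome chance node from the figure
    A1, A2 = (3, 3), (1, 10)     # leaf values under each move

    def ev(leaves, f=lambda x: x):
        """Expected value of the (transformed) leaves."""
        return sum(p * f(v) for p, v in zip(P, leaves))

    print(ev(A1), ev(A2))        # ~3.0 vs ~1.9  -> prefer A1
    print(ev(A1, lambda x: x**2),
          ev(A2, lambda x: x**2))  # ~9.0 vs ~10.9 -> prefer A2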


Exercise: Oopsy-Nim
• Starts out like Nim:
  • Each player in turn has to pick up either one or two objects
  • Whoever picks up the last object loses
• But sometimes (probability = 0.25), when you try to pick up two objects, you drop them both
  • Picking up a single object always works
• Question: why can't we draw the entire game tree?
• Exercise: draw the 4-ply game tree (2 moves per player)
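A sketch of one move's chance outcomes, assuming dropped objects go back on the table (an assumption; the slide doesn't say) and that the turn passes either way:

    # Chance outcomes for an attempted Oopsy-Nim move (sketch).
    # State is just the number of coins left.
    def outcomes(coins, take):
        """Yield (probability, coins_after) pairs for an attempted move."""
        if take == 1:
            return [(1.0, coins - 1)]               # always works
        return [(0.75, coins - 2), (0.25, coins)]   # drop both: no change

Note what the 0.25 outcome does to the state as you draw the 4-ply tree.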

