CS 7180: Behavioral Modeling and Decision-making in AI

Game Tree Search

Prof. Amy Sliva
September 28, 2012

Outline

• Introduction to games • State-space representation of game trees • Adversarial search • Minimax search • Alpha-beta pruning

Decisions in multiagent environments

• Automated planning from last time assumes a single agent • Decisions depend only on the state-space, possible actions, and the desired goal

• Multiagent domains—strategic decisions are necessary • Consider actions of other agents when making a decision

• Competitive multiagent environments often called games • Goals of agents are in conflict • Often some cooperation involved, but we will get to this later…

• Formulate decision making as a state-space search problem • Adversarial search

What do we mean by “games”?

                        Deterministic                   Chance
Perfect Information     Chess, Checkers, Go, Othello    Backgammon, Monopoly
Imperfect Information   Battleship, Kriegspiel          Bridge, Poker, Scrabble, Nuclear war

• Not typically considered—physical games like tennis, croquet, ice hockey, etc. • Well, sometimes…

Robot Soccer!

• RoboCup http://www.robocup.org/

• The goal:

“By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rules of FIFA, against the winner of the most recent World Cup.”

• We’ll stick with games like chess and checkers today…

Common subset of games for AI

                        Deterministic                   Chance
Perfect Information     Chess, Checkers, Go, Othello    Backgammon, Monopoly
Imperfect Information   Battleship, Kriegspiel          Bridge, Poker, Scrabble, Nuclear war

• Deterministic, turn-taking, two-player, zero-sum, perfect-information

Deterministic, turn-taking, zero-sum games

• Deterministic • No chance (e.g., using dice) involved

• Turn taking • Two agents whose actions alternate

• Zero-sum • Utility values for each agent are equal and opposite of the other (i.e., if one agent wins, the other necessarily loses) • Creates the adversarial situation

• Perfect information • Fully observable environments • No information is hidden from either player

• Generalizes to stochastic games, multiple players, non-zero-sum games, etc.

Why study these games in AI?

• Fun!
• Clear criteria for success
• Interesting hard problems
• Very large search spaces (e.g., chess has on average 10^134 nodes!)
• Problems involving hostile, adversarial, or competing agents
• Insight into other real-world applications with strategic decisions
• Like the real world, games require that a decision be made, even if calculating the optimal decision is impossible
• Different from games studied in game theory

How to play a game

1. Consider all legal moves—each move leads to a new state

2. Evaluate the possible next states and determine the best
3. Make your move!
4. Wait for opponent to move and do it all again

• Represent the game as a tree, and this becomes a search problem

Game trees

• Root node is the current state • Looking for the best single move to make next • Evaluation function f(s) rates each possible state • Edges are the legal moves for a player • Terminal nodes represent end-game states (i.e., win, lose, or draw)

How do we search this tree to find the optimal move?

Planning vs. adversarial search

• Planning—no adversary • Solution is a (heuristic) method for finding the goal (i.e., a plan) • Heuristic techniques can find an optimal solution • Evaluation function—estimate of cost from start to goal through given node

• Games—multiagent and adversarial • Solution is a strategy • Strategy specifies our move for every possible opponent reply • Time limits force an approximate solution • Evaluation function—evaluate “goodness” or utility of game position

Formulating games as search

• Two players: MAX and MIN—MAX moves first • Take turns until game is over • Winner gets reward, loser gets penalty • Zero sum—sum of the reward and the penalty is a constant

• Formal definition • Initial state: Set-up specified by the rules, e.g., initial board configuration of chess • Player(s): Defines which player has the move in a state • Actions(s): Returns the set of legal moves in a state • Result(s,a): Transition model defines the result of a move • Terminal-Test(s): Is the game finished? True if finished, false otherwise • Utility(s,p): Gives numerical value of terminal state s for player p • E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe. • E.g., win (+1), lose (0), and draw (1/2) in chess.
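As a concrete illustration, here is a minimal sketch of this formal definition for tic-tac-toe; the tuple-of-9-cells state encoding and the snake_case function names are assumptions made for this example:

```python
# Formal game definition for tic-tac-toe: states are 9-cell tuples
# (row-major), 'X' plays the role of MAX and 'O' of MIN.

def initial_state():
    return ('.',) * 9                 # empty 3x3 board

def player(s):
    # X (MAX) moves first; players alternate, so count empty cells
    return 'X' if s.count('.') % 2 == 1 else 'O'

def actions(s):
    return [i for i, c in enumerate(s) if c == '.']

def result(s, a):
    board = list(s)
    board[a] = player(s)
    return tuple(board)

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(s):
    for i, j, k in LINES:
        if s[i] != '.' and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal_test(s):
    return winner(s) is not None or '.' not in s

def utility(s, p):
    w = winner(s)
    if w is None:
        return 0                      # draw
    return +1 if w == p else -1
```

For example, after the moves X→0, O→3, X→1, O→4, X→2, the top row is all X, `terminal_test` is true, and `utility` is +1 for X and -1 for O.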

• MAX uses the search tree to determine its next move, assuming MIN is playing optimally

Game trees for MAX and MIN

• Current player searches from perspective of MAX • If it is our turn to move, then the root is a MAX node • Each level of the tree has nodes that are all MAX or all MIN • Alternate MAX and MIN levels

• Complete game tree • All conigurations generated by legal moves from root to the end

• Incomplete game tree • All conigurations generated by legal moves from root to a given depth (look ahead some number of steps) Evaluang a game state

• Evaluation function f(n) evaluates the "goodness" of a game position (i.e., the utility to each player) • Contrast with heuristic search—evaluation function estimates cost/distance to the goal

• Zero-sum assumption allows us to use one evaluation function for both players • f(n) > 0: position n good for MAX and bad for MIN • f(n) < 0: position n bad for MAX and good for MIN • f(n) near 0: position n is a neutral position • f(n) >> 0: win for MAX! • f(n) << 0: win for MIN!

• Goal of game tree search—determine one move for MAX that maximizes the payoff for a given game tree according to f(n) • Regardless of the moves MIN will take

Evaluation function for Tic-Tac-Toe

f(n) = +1 if the position is a win for X.
f(n) = -1 if the position is a win for O.
f(n) = 0 if the position is a draw.

Minimax evaluation rule

• Minimax value of a node Minimax(n) = utility for MAX of being in the state at n, assuming optimal play from MIN • MAX wants a state with the maximum minimax value, and MIN a state with minimum • Assumes worst case scenario that MIN is also trying to get his highest utility/payoff (i.e., is trying to minimize payoff to MAX)

Minimax(s) =
  Utility(s)                                   if Terminal-Test(s)
  max_{a ∈ Actions(s)} Minimax(Result(s,a))    if Player(s) = MAX
  min_{a ∈ Actions(s)} Minimax(Result(s,a))    if Player(s) = MIN

Minimax(s) at a terminal state is its utility according to the rules of the game; at a MAX node it is the maximum of the child values, and at a MIN node the minimum of the child values.

Minimax game tree

MAX: A = 3
  moves a1, a2, a3 lead to
MIN: B = 3, C = 2, D = 2
  B's moves b1, b2, b3 lead to leaves 3, 12, 8
  C's moves c1, c2, c3 lead to leaves 2, 4, 6
  D's moves d1, d2, d3 lead to leaves 14, 5, 2

• Terminal nodes show the utility values for MAX • Other nodes are labeled with their minimax values

• MAX’s best move is a1—leads to the state with the highest minimax value

• MIN’s best reply is b1—leads to the state with the lowest minimax value

Minimax algorithm for optimal play

Find the optimal strategy for MAX:

1. Generate the whole game tree, down to the leaves
2. Apply the utility (payoff) function to each leaf
3. Recursively back up values from the leaves:
   a) a MAX node computes the max of its child values
   b) a MIN node computes the min of its child values
4. At the root, choose the move leading to the child with the highest minimax value

Minimax algorithm in action
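The backup procedure above can be sketched directly in Python; the nested-list encoding of the example game tree (MIN nodes B, C, D under the MAX root A) is an assumption made for illustration:

```python
# Minimax backup on the example tree from the slides: a leaf is a number
# (utility for MAX), an internal node is a list of children.

def minimax(node, is_max=True):
    """Return the minimax value of `node`, alternating MAX/MIN levels."""
    if isinstance(node, (int, float)):          # terminal state
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]      # B, C, D under MAX root A
print(minimax(tree))                            # → 3, so MAX's best move is a1
```

The MIN nodes back up 3, 2, and 2, and the root backs up max(3, 2, 2) = 3, matching the tree above.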

[Animation: the values 3, 2, 2 are backed up from the leaves to MIN nodes B, C, D, then 3 to the MAX root A; the best move is a1.]

Properties of the Minimax Algorithm

• Complete if tree is finite • Produces optimal strategy (against an optimal opponent) • Cannot be beaten by an opponent playing suboptimally • Depth-first search through entire game tree • Maximum depth of tree is m and there are b moves at each point: • Running time O(b^m) to search entire tree • Impractical for real games—remember, search only gives one move! • Basis for more practical game tree search algorithms

• Optimizations • Heuristics—limited-depth or iterative deepening search • Pruning to eliminate unnecessary nodes from consideration

More about the size of game trees

• Tic-Tac-Toe • b ≈ 5 legal actions per state on average, total of 9 plies in game (ply = one action by one player, move = two plies) • 5^9 = 1,953,125 • 9! = 362,880 searches if computer goes first • 8! = 40,320 if computer goes second → exact solution quite reasonable

• Chess • b ≈ 35 (approximate average branching factor) • m ≈ 100 (depth of game tree for “typical” game) • b^m ≈ 35^100 ≈ 10^154 nodes!! → exact solution completely infeasible
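These counts are easy to verify; a quick sketch (the figures are the slides' rough estimates):

```python
# Verifying the search-space arithmetic for tic-tac-toe and chess.
print(5 ** 9)                 # tic-tac-toe bound: 1953125
print(9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1)   # 9! = 362880 orderings going first
print(len(str(35 ** 100)))    # 35^100 has 155 digits, i.e. about 10^154
```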

• It is usually impossible to develop the whole search tree.

Heuristic evaluation functions

• Minimax has to expand the entire game tree at each move • Heuristic to evaluate the utility of a state without going all the way to a leaf • Evaluate how good it is for MAX, how good it is for MIN, then subtract MIN's score from MAX's • Examples: • Othello: f(n) = Number of white pieces - Number of black pieces • Chess: f(n) = Value of all white pieces - Value of all black pieces

• If the evaluation is X for one player, it’s -X for the opponent • Zero-sum game

Evaluation functions for chess

[Two example board diagrams: one with Black to move where White is slightly better, one with White to move where Black is winning.]

• For chess f(n) is typically a linear weighted sum of features

f(n) = w1f1(n) + w2f2(n) + … + wnfn(n)
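A linear weighted evaluation of this form can be sketched as follows; the single material-count feature and the standard piece values used here are illustrative assumptions, not a tuned chess evaluator:

```python
# A toy weighted-sum evaluation f(n): one feature per piece type, with the
# conventional material weights as the w_i. Positive values favor White (MAX).
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def evaluate(white_pieces, black_pieces):
    """Material difference from White's perspective.
    Each argument is a list of piece letters, e.g. ['Q', 'R', 'P']."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black      # zero-sum: Black's evaluation is the negation

print(evaluate(['Q', 'R', 'P'], ['R', 'R']))   # (9+5+1) - (5+5) = 5
```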

e.g., w1 = 9 with f1(n) = (number of white queens) - (number of black queens)

Cutting off search

• Almost identical to Minimax, but with a few changes: • Terminal-Test(n) replaced by Cutoff(n)—true if n is at the cutoff depth • Replace Utility(n) with evaluation function f(n)
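These two changes can be sketched on the nested-list tree used earlier; the averaging `eval_state` heuristic is a stand-in assumption, chosen only so the example runs:

```python
# Minimax with cutoff: Terminal-Test(n) becomes a depth limit Cutoff(n),
# and Utility(n) becomes a heuristic evaluation f(n).

def eval_state(node):
    """Stand-in heuristic f(n): average of the child values (toy example)."""
    if isinstance(node, (int, float)):
        return node
    vals = [eval_state(child) for child in node]
    return sum(vals) / len(vals)

def h_minimax(node, depth, cutoff, is_max=True):
    if isinstance(node, (int, float)):     # true terminal state
        return node
    if depth >= cutoff:                    # Cutoff(n): stop and evaluate
        return eval_state(node)
    vals = [h_minimax(c, depth + 1, cutoff, not is_max) for c in node]
    return max(vals) if is_max else min(vals)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, cutoff=1))   # B, C, D evaluated heuristically
print(h_minimax(tree, 0, cutoff=2))   # deep enough to reach the leaves: 3
```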

Cutting off search:
1. Expand the tree to the cutoff depth
2. For each “leaf” node n use f(n) to compute the value
3. Recursively back up values from the leaves:
   a) a MAX node computes the max of its child values
   b) a MIN node computes the min of its child values
4. At the root choose the move leading to the highest value

Does cutting off search work in practice?

• Remember the typical size of a chess game tree: • b = 35, m = 100, b^m ≈ 35^100 ≈ 10^154

• Let’s aim for a more reasonable running time: b^m ≈ 10^6, b = 35, m = 4 → search down 4 levels of the tree

4-ply (2-move) look-ahead makes a pretty hopeless chess player!

4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ Deep Blue, Kasparov

Exploiting the fact of an adversary for pruning

“If you have an idea that is surely bad, don’t take the time to find out how truly awful it is.” — Pat Winston

• Even cut off search still looks at unnecessary nodes

• If a position is provably bad • No use expending search time to find out exactly HOW bad

• If an adversary can force a bad position (i.e., make a good move for them) • No use expending search time to find out the good positions that the adversary won’t let you achieve anyway

• Bad = not better than we already know we can achieve elsewhere

Not all nodes need to be explored

[Small tree: the MAX root has value ≥ 2. Its left MIN child evaluates to exactly 2 (leaves 2 and 7); its right MIN child sees the leaf 1 first, so its value is ≤ 1. The remaining leaf “?” is never examined.]

We don’t need to compute the value at the “?” node: no matter what it is, it can’t affect the value of the root node.

General rules for pruning

• Consider a node n in the tree

• If the player has a better choice at • the parent node of n… • …or any choice point further up

• Then n will never be reached in play

• When that much is known about n, it can be pruned

Alpha-beta pruning algorithm

• Traverse the tree in depth-first order

• At each MAX node n: α(n) = maximum value found so far (initially, α = −∞) • Increases if a child of n returns a value greater than the current α • Serves as a (tentative) lower bound on the final payoff

• At each MIN node n: β(n) = minimum value found so far (initially, β = +∞) • Decreases if a child of n returns a value less than the current β • Serves as a (tentative) upper bound on the final payoff

• Pass current values of α and β to child nodes during search

• Prune remaining branches at a node when α ≥ β

When to prune using alpha-beta

• Two types of pruning, depending on whether it occurs at a MAX or MIN node

• Alpha cutoff • If at a MAX node n, cut off the search below n if α ≥ β (α increases and passes β from below)

• Beta cutoff • If at a MIN node n, cut off the search below n if β ≤ α (β decreases and passes α from above)
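Both cutoffs can be sketched in one routine on the same nested-list tree used earlier; the `visited` list, added to show which leaves are actually examined, is an assumption for illustration:

```python
# Alpha-beta search: leaves are numbers, internal nodes are lists of children.

def alphabeta(node, alpha=float('-inf'), beta=float('inf'),
              is_max=True, visited=None):
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)       # record which leaves we actually look at
        return node
    if is_max:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:          # cutoff: MIN above won't allow this line
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, visited))
            beta = min(beta, value)
            if alpha >= beta:          # cutoff: MAX above won't allow this line
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen))   # → 3, same answer as plain minimax
print(seen)                            # leaves 4 and 6 are never examined
```

On this tree the second MIN node is cut off after its first leaf (2), so the leaves 4 and 6 are pruned, yet the root value is still 3.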

Example of alpha-beta pruning

Do depth-first search until the first leaf, passing α, β down from the root (initial values α = −∞, β = +∞):

1. First leaf is 3: MIN updates β = 3 based on the child.
2. Leaf 12: no change.
3. Leaf 8: no change; 3 is returned as the MIN node value, and MAX updates α = 3.
4. α = 3, β = +∞ are passed down to the second MIN node.
5. Leaf 2: MIN updates β = 2.
6. Beta cutoff—β decreases and passes α from above: α ≥ β, so prune! The remaining leaves are never examined.
7. MAX updates α based on the child: no change, since the returned value ≤ 2 is below α = 3.
8. α = 3, β = +∞ are passed down to the third MIN node.
9. Leaf 14: MIN updates β = 14. Leaf 5: β = 5. Leaf 2: β = 2; 2 is returned as the MIN node value.
10. MAX calculates the same node value (3) and makes the same move (a1) as plain Minimax!

Effectiveness of alpha-beta pruning

• Worst-Case • Branches ordered s.t. no pruning takes place—no improvement

• Best-Case • Each player’s best move is the left-most child (i.e., evaluated first) • In practice, performance is closer to the best case • E.g., sort moves by the remembered move values found last time • E.g., use iterative deepening search and sort by the values from the last iteration

• In practice often get O(b^(m/2)) rather than O(b^m) • This is the same as having a branching factor of √b • (√b)^m = b^(m/2) • E.g., in chess go from b ≈ 35 to b ≈ 6 • Permits much deeper search in the same amount of time

Final thoughts on alpha-beta pruning
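A quick sketch of what the b^(m/2) saving buys, using depth 8 instead of 100 so the numbers stay printable:

```python
# Node counts with and without perfect move ordering, for b = 35, m = 8.
b, m = 35, 8
plain = b ** m                 # O(b^m) nodes for plain minimax
best = b ** (m // 2)           # O(b^(m/2)) in the alpha-beta best case
print(plain)                   # 2251875390625
print(best)                    # 1500625
print(round(b ** 0.5, 1))      # effective branching factor ≈ 5.9
```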

• Pruning does not affect final results—same as Minimax

• Entire subtrees can be pruned • Prune a node n and all children

• Good move ordering improves effectiveness of pruning

• Repeated states are again possible • Store them in memory in a transposition table—reuse values already computed

State of play

• Checkers • Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994 • “Solved” in 2007—complete game tree from start to end

• Chess • Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997

• Othello • Human champions refuse to compete against computers—they are too good! • Go • Human champions refuse to compete against computers—they are too bad • b > 300 (!)