
Minimax strategies, alpha-beta pruning

Lirong Xia

“Sum to 2”
Ø Player 1 moves, then player 2, finally player 1 again
Ø Move = 0 or 1
Ø Player 1 wins if and only if all moves together sum to 2

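Game tree (figure): Player 1 chooses 0 or 1, then Player 2 chooses 0 or 1, then Player 1 chooses 0 or 1. Player 1’s utility at the eight leaves, left to right (move sequences 000, 001, 010, 011, 100, 101, 110, 111): -1, -1, -1, 1, -1, 1, 1, -1
Player 1’s utility is in the leaves; player 2’s utility is the negative of this

Today’s schedule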

Ø Adversarial games
Ø Minimax search
Ø Alpha-beta pruning

Adversarial Games
Ø Deterministic, zero-sum games:
  § Tic-tac-toe, chess, checkers
  § The MAX player maximizes the result
  § The MIN player minimizes the result

Ø Minimax search:
  § A state-space search tree
  § Players alternate turns
  § Each node has a minimax value: the best achievable utility against a rational adversary

Computing Minimax Values

Ø This is a depth-first search (DFS)

Ø Two recursive functions:
  § max-value takes the maximum of the values of the successors
  § min-value takes the minimum of the values of the successors
Ø Def value(state):
    If the state is a terminal state: return the state’s utility
    If the agent at the state is MAX: return max-value(state)
    If the agent at the state is MIN: return min-value(state)
Ø Def max-value(state):
    Initialize max = -∞
    For each successor of state:
        Compute value(successor)
        Update max = the larger of max and value(successor)
    return max
Ø Def min-value(state): same as max-value, but initialize min = +∞ and keep the smaller value

Minimax Example

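(figure: a two-level game tree; the MAX root has minimax value 3, and its three MIN children have values 3, 2, 2)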
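The value / max-value / min-value pseudocode can be sketched in Python as below. The tree encoding is an assumption for illustration: a leaf is a number (MAX’s utility), an internal node is a list of its successors, and the players alternate by level. The sketch is applied to the “Sum to 2” tree, whose leaf values come from the opening slide.

```python
import math

def value(node, is_max):
    """Minimax value of `node` (leaves are numbers, internal nodes are lists)."""
    if not isinstance(node, list):       # terminal state: return its utility
        return node
    return max_value(node) if is_max else min_value(node)

def max_value(node):
    best = -math.inf
    for child in node:                   # MAX's successors are MIN nodes
        best = max(best, value(child, is_max=False))
    return best

def min_value(node):
    best = math.inf
    for child in node:                   # MIN's successors are MAX nodes
        best = min(best, value(child, is_max=True))
    return best

# "Sum to 2": Player 1 (MAX), Player 2 (MIN), Player 1 (MAX); leaves from the slide
SUM_TO_2 = [[[-1, -1], [-1, 1]],         # Player 1 opens with 0
            [[-1, 1], [1, -1]]]          # Player 1 opens with 1

print(value(SUM_TO_2, is_max=True))      # 1: Player 1 can force a win
print([min_value(c) for c in SUM_TO_2])  # [-1, 1]: values of the two openings
```

The two MIN values show why Player 1 should open with 1: after an opening 0, Player 2 can answer 0, and no final move then reaches a sum of 2.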

Tic-tac-toe

Renju
• 15 × 15 board
• 5 in a row horizontally, vertically, or diagonally wins
• No double-3 or double-4 moves for black
• Without these restrictions on black, a win for black (the first player) was computed (L. Victor Allis, 1994 PhD thesis)

Minimax Properties
Ø Time complexity?
  § $O(b^m)$, where $b$ is the branching factor and $m$ is the maximum depth of the tree
Ø Space complexity?
  § $O(bm)$
Ø For chess, $b \approx 35$, $m \approx 100$
  § An exact solution is completely infeasible
  § But do we need to explore the whole tree?
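To make the chess estimate concrete, the short calculation below evaluates $b^m$ and $bm$ for $b = 35$, $m = 100$. It only illustrates the orders of magnitude in the complexity bounds; it is not a count of actual chess positions.

```python
import math

b, m = 35, 100                    # chess estimates: branching factor, depth

nodes = b ** m                    # minimax time ~ O(b^m); Python ints are exact
digits = len(str(nodes))          # number of decimal digits in b^m
exponent = m * math.log10(b)      # b^m = 10^exponent

print(f"b^m = 35^100 has {digits} digits (about 10^{exponent:.1f})")
print(f"b*m = {b * m}")           # space ~ O(bm): the DFS path plus siblings
```

This prints a 155-digit node count (about $10^{154.4}$) next to a memory requirement on the order of only 3500 nodes, which is why time, not space, is the binding limit.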

Resource Limits
Ø Cannot search to the leaves
Ø Depth-limited search
  § Instead, search only to a limited depth of the tree
  § Replace terminal utilities with an evaluation function for non-terminal positions
Ø The guarantee of optimal play is gone

Evaluation Functions
Ø Functions that score non-terminal states

Ø Ideal function: returns the minimax utility of the position
Ø In practice: typically a weighted linear sum of features:
  $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$
Ø e.g. $f_1(s)$ = (# white queens - # black queens), etc.

Minimax with limited depth

Ø Suppose you are the MAX player
Ø Given a depth d and the current state
Ø Compute value(state, d), which searches down to depth d
  § At depth d, if the state is non-terminal, use an evaluation function to estimate its value
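A minimal sketch of value(state, d) in Python. The game interface (is_terminal, utility, successors, evaluate) and the class name SumTo2 are hypothetical, chosen for illustration; the evaluation function is a placeholder that returns 0 for every non-terminal state, standing in for a real weighted feature sum.

```python
class SumTo2:
    """The "Sum to 2" game; a state is the tuple of moves made so far."""

    def is_terminal(self, s):
        return len(s) == 3                      # all three moves have been made

    def utility(self, s):
        return 1 if sum(s) == 2 else -1         # utility for Player 1 (MAX)

    def successors(self, s):
        return [s + (0,), s + (1,)]             # each move is 0 or 1

    def evaluate(self, s):
        return 0                                # hypothetical placeholder estimate


def value(state, d, is_max, game):
    """Minimax value of `state`, searching at most d more plies."""
    if game.is_terminal(state):
        return game.utility(state)
    if d == 0:                                  # depth limit at a non-terminal state
        return game.evaluate(state)
    vals = [value(s, d - 1, not is_max, game) for s in game.successors(state)]
    return max(vals) if is_max else min(vals)


game = SumTo2()
print(value((), 3, True, game))   # reaches the leaves: exact value 1
print(value((), 1, True, game))   # stops after one ply: evaluation estimate 0
```

With d = 3 the search reaches the leaves and returns the true value 1; with d = 1 it stops at Player 2’s nodes and can only report the evaluation function’s estimate, so the quality of play now depends on that function.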

Improving minimax: pruning

Pruning in Minimax Search

Ø An ancestor is a MAX node
  § It already has an option better than my current solution
  § My future solution (at this MIN node) can only be smaller, so the MAX ancestor will never choose it

Alpha-beta pruning
Ø Pruning = cutting off parts of the search tree (because you realize you don’t need to look at them)
  § When we considered A* we also pruned large parts of the search tree
Ø Maintain
  § α = value of the best option for the MAX player along the path so far
  § β = value of the best option for the MIN player along the path so far
  § Initialized to α = -∞ and β = +∞
Ø Maintain and update α and β for each node
  § α is updated at MAX player’s nodes
  § β is updated at MIN player’s nodes

Alpha-Beta Pruning

Ø General configuration
  § We’re computing the MIN-VALUE at n
  § We’re looping over n’s children
  § n’s value estimate is dropping
  § α is the best value that MAX can get at any choice point along the current path
  § If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children
  § Define β similarly for MIN
  § While the search at a node continues, α is smaller than β
    • Once α >= β, return to the upper layer

Alpha-Beta Pruning Example

α is MAX’s best alternative here or above
β is MIN’s best alternative here or above

(figure: step-by-step trace of α and β on a game tree, with stages labeled starting α/β (α = -∞, β = +∞), raising α, lowering β, raising α)

Alpha-Beta Pseudocode
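A minimal Python sketch of alpha-beta search, written from the description above rather than copied from any particular pseudocode. It assumes the nested-list tree encoding (a leaf is MAX’s utility, an internal node is a list of successors) and prunes when v >= β at a MAX node or v <= α at a MIN node, matching “once α >= β, return to the upper layer”. The second test tree is hypothetical.

```python
import math

def alpha_beta(node, is_max, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping subtrees that cannot affect it.

    Leaves are numbers; internal nodes are lists of successors.
    alpha: best value MAX can already guarantee on the path to this node.
    beta:  best value MIN can already guarantee on the path to this node.
    """
    if not isinstance(node, list):              # terminal state
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alpha_beta(child, False, alpha, beta))
            if v >= beta:                       # MIN above will avoid this node
                return v
            alpha = max(alpha, v)               # alpha is updated at MAX nodes
        return v
    v = math.inf
    for child in node:
        v = min(v, alpha_beta(child, True, alpha, beta))
        if v <= alpha:                          # MAX above will avoid this node
            return v
        beta = min(beta, v)                     # beta is updated at MIN nodes
    return v

SUM_TO_2 = [[[-1, -1], [-1, 1]], [[-1, 1], [1, -1]]]
print(alpha_beta(SUM_TO_2, True))                              # 1, as with minimax
print(alpha_beta([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True))   # 3 (hypothetical tree)
```

The MIN branch of the loop is the “general configuration” above. In the hypothetical tree, once the first MIN child returns 3, α = 3 at the root, so the second MIN child stops after its first leaf 2 (2 <= α) and never examines the leaves 4 and 6.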

Alpha-Beta Pruning Properties
Ø This pruning has no effect on the final result at the root
Ø Values of intermediate nodes might be wrong!
  § Important: children of the root may have the wrong value
Ø Good child ordering improves the effectiveness of pruning
Ø With “perfect ordering”:
  § Time complexity drops to $O(b^{m/2})$
  § Doubles the solvable depth!
  § Your action looks smarter: more forward-looking with a good evaluation function
  § A full search of, e.g., chess is still hopeless…
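The effect of child ordering can be checked by counting how many leaves alpha-beta evaluates. The sketch below runs the same hypothetical tree with two orderings of its children: best moves first (highest-valued child first at MAX, lowest leaf first at MIN) and best moves last. The tree values are made up for illustration.

```python
import math

def alpha_beta(node, is_max, alpha, beta, leaves):
    """Alpha-beta search that appends every leaf it evaluates to `leaves`."""
    if not isinstance(node, list):              # terminal state
        leaves.append(node)
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        c = alpha_beta(child, not is_max, alpha, beta, leaves)
        if is_max:
            v = max(v, c)
            if v >= beta:                       # prune remaining children
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, c)
            if v <= alpha:                      # prune remaining children
                return v
            beta = min(beta, v)
    return v

def search(tree):
    """Return (root value, number of leaves evaluated); the root is MAX."""
    leaves = []
    v = alpha_beta(tree, True, -math.inf, math.inf, leaves)
    return v, len(leaves)

# Hypothetical tree, written with two different child orderings.
good = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]     # best moves tried first
bad = [[14, 5, 2], [6, 4, 2], [8, 12, 3]]      # best moves tried last

print(search(good))   # (3, 5)
print(search(bad))    # (3, 9)
```

Both orderings give the same root value 3, as pruning does not change the result at the root, but the well-ordered tree evaluates 5 of the 9 leaves while the badly ordered one evaluates all 9. Since $b^{m/2} = (\sqrt{b})^m$, perfect ordering behaves like a branching factor of about $\sqrt{b}$, roughly 5.9 for $b \approx 35$.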