Minimax Strategies, Alpha Beta Pruning

Lirong Xia

"Sum to 2" game
- Player 1 moves, then player 2, finally player 1 again
- Move = 0 or 1
- Player 1 wins if and only if all moves together sum to 2

Game tree, one row per leaf (moves along the path from the root):

  Player 1   Player 2   Player 1   Player 1's utility
     0          0          0             -1
     0          0          1             -1
     0          1          0             -1
     0          1          1              1
     1          0          0             -1
     1          0          1              1
     1          1          0              1
     1          1          1             -1

Player 1's utility is in the leaves; player 2's utility is the negative of this.

Today's schedule
- Adversarial games
- Minimax search
- Alpha-beta pruning algorithm

Adversarial Games
- Deterministic, zero-sum games:
  - Tic-tac-toe, chess, checkers
  - The MAX player maximizes the result
  - The MIN player minimizes the result
- Minimax search:
  - A search tree
  - Players alternate turns
  - Each node has a minimax value: the best achievable utility against a rational adversary

Computing Minimax Values
- This is DFS
- Two recursive functions:
  - max-value maxes the values of successors
  - min-value mins the values of successors
- Def value(state):
    If the state is a terminal state: return the state's utility
    If the agent at the state is MAX: return max-value(state)
    If the agent at the state is MIN: return min-value(state)
- Def max-value(state):
    Initialize max = -∞
    For each successor of state:
      Compute value(successor)
      Update max accordingly
    return max
- Def min-value(state): similar to max-value, starting from +∞ and taking the minimum

Minimax Example
[Figure: example game tree; the node values shown are 3, 3, 2, 2.]

Tic-tac-toe Game Tree
[Figure: tic-tac-toe game tree.]

Renju
- 15 × 15 board
- Five in a row (horizontal, vertical, or diagonal) wins
- No double-3 or double-4 moves for black
- Otherwise (without these restrictions) black's winning strategy was computed: L. Victor Allis, 1994 (PhD thesis)

Minimax Properties
- Time complexity?
  - $O(b^m)$, where b is the branching factor and m is the maximum depth of the tree
- Space complexity?
  - $O(bm)$
- For chess, $b \approx 35$, $m \approx 100$
  - Exact solution is completely infeasible
  - But do we need to explore the whole tree?
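The value / max-value / min-value recursion can be sketched in Python on the "Sum to 2" game from the start of the lecture. This is a minimal illustration, not the lecture's own code: representing a state as a tuple of the moves made so far, and the helper names `utility`, `value`, `max_value`, and `min_value`, are assumptions made for this sketch.

```python
import math

MOVES = (0, 1)      # each move is 0 or 1
GAME_LENGTH = 3     # player 1, then player 2, then player 1 again


def utility(moves):
    """Player 1's utility at a leaf: +1 if the moves sum to 2, else -1."""
    return 1 if sum(moves) == 2 else -1


def value(moves=()):
    """Minimax value of the state reached after `moves` (depth-first search)."""
    if len(moves) == GAME_LENGTH:      # terminal state
        return utility(moves)
    if len(moves) % 2 == 0:            # player 1 (MAX) moves at depths 0 and 2
        return max_value(moves)
    return min_value(moves)            # player 2 (MIN) moves at depth 1


def max_value(moves):
    best = -math.inf
    for move in MOVES:
        best = max(best, value(moves + (move,)))
    return best


def min_value(moves):
    best = math.inf
    for move in MOVES:
        best = min(best, value(moves + (move,)))
    return best


# Leaves in left-to-right order, matching the game tree of the "Sum to 2" game.
leaves = [utility((a, b, c)) for a in MOVES for b in MOVES for c in MOVES]
print(leaves)                      # [-1, -1, -1, 1, -1, 1, 1, -1]
print(value((0,)), value((1,)))    # -1 1
print(value())                     # 1
```

Under these assumptions the root value is 1: player 1 can force a win by opening with 1, whereas opening with 0 lets player 2 answer 0 and make a sum of 2 unreachable.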
Resource Limits
- Cannot search to the leaves
- Depth-limited search
  - Instead, search to a limited depth of the tree
  - Replace terminal utilities with an evaluation function for non-terminal positions
- Guarantee of optimal play is gone

Evaluation Functions
- Functions that score non-terminal positions
- Ideal function: returns the minimax utility of the position
- In practice: typically a weighted linear sum of features:
  $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$
- e.g., $f_1(s)$ = (# white queens - # black queens), etc.

Minimax with limited depth
- Suppose you are the MAX player
- Given a depth d and the current state
- Compute value(state, d), which searches down to depth d
  - At depth d, use an evaluation function to estimate the value if the state is non-terminal

Improving minimax: pruning

Pruning in Minimax Search
- An ancestor is a MAX node
  - It already has an option better than my current solution
  - My future solution can only be smaller

Alpha-beta pruning
- Pruning = cutting off parts of the search tree (because you realize you don't need to look at them)
  - When we considered A* we also pruned large parts of the search tree
- Maintain
  - α = value of the best option for the MAX player along the path so far
  - β = value of the best option for the MIN player along the path so far
  - Initialized to α = -∞ and β = +∞
- Maintain and update α and β for each node
  - α is updated at the MAX player's nodes
  - β is updated at the MIN player's nodes

Alpha-Beta Pruning
- General configuration
  - We're computing the MIN-VALUE at n
  - We're looping over n's children
  - n's value estimate is dropping
  - α is the best value that MAX can get at any choice point along the current path
  - If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
  - Define β similarly for MIN
  - α is usually smaller than β
    - Once α >= β, return to the upper layer

Alpha-Beta Pruning Example
- α is MAX's best alternative here or above
- β is MIN's best alternative here or above
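The α/β bookkeeping described above can be sketched as follows. This is a hedged illustration rather than the lecture's pseudocode: the nested-list tree representation (a number is a leaf utility for MAX, a list is an internal node), the function name `alphabeta`, the `visited` parameter, and the example tree itself are all assumptions chosen for this sketch, not the tree drawn on the slides.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Return the minimax value of `node`, skipping children that cannot matter.

    alpha: best value MAX can already guarantee along the current path.
    beta:  best value MIN can already guarantee along the current path.
    visited: optional list that collects the leaves actually evaluated.
    """
    if not isinstance(node, list):          # terminal state: return its utility
        if visited is not None:
            visited.append(node)
        return node

    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)       # alpha is updated at MAX nodes
            if alpha >= beta:               # MIN above will avoid this node
                break
        return value

    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)             # beta is updated at MIN nodes
        if alpha >= beta:                   # MAX above will avoid this node
            break
    return value


# Hypothetical example: a MAX root with three MIN children, three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
root_value = alphabeta(tree, visited=seen)
print(root_value)   # 3
print(seen)         # [3, 12, 8, 2, 14, 5, 2]
```

In this assumed tree, the second MIN node is abandoned after its first leaf (2), because MAX already has 3 available from the first subtree; the leaves 4 and 6 are never evaluated, yet the root value is the same as plain minimax would return.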
Example (α/β trace)
[Figure: alpha-beta trace on an example tree. The search starts with α = -∞ and β = +∞; the annotated steps are "starting α/β", "raising α", and "lowering β". α is MAX's best alternative here or above; β is MIN's best alternative here or above.]

Alpha-Beta Pseudocode
[Figure: pseudocode, not preserved.]

Alpha-Beta Pruning Properties
- This pruning has no effect on the final result at the root
- Values of intermediate nodes might be wrong!
  - Important: children of the root may have the wrong value
- Good ordering of children improves the effectiveness of pruning
- With "perfect ordering":
  - Time complexity drops to $O(b^{m/2})$
  - Doubles the solvable depth!
  - Your action looks smarter: more forward-looking with a good evaluation function
  - Full search of, e.g., chess is still hopeless…
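To make the "doubles solvable depth" and "still hopeless" claims concrete, here is a short worked calculation using the chess figures quoted earlier ($b \approx 35$, $m \approx 100$). The node budget $N$ and the depths $d$, $d'$ are symbols introduced only for this illustration, and the numbers are order-of-magnitude estimates.

```latex
% Perfect ordering behaves like a search with branching factor sqrt(b):
b^{m/2} = \left(\sqrt{b}\right)^{m}, \qquad \sqrt{35} \approx 5.9

% Same node budget N, with and without pruning:
b^{d} \approx N \quad\text{and}\quad b^{d'/2} \approx N
\;\Longrightarrow\; d' \approx 2d

% Full chess tree, using \log_{10} 35 \approx 1.544:
b^{m} = 35^{100} \approx 10^{154}, \qquad
b^{m/2} = 35^{50} \approx 10^{77}
```

Even in the best case, roughly $10^{77}$ nodes is far beyond any exhaustive search, which is why depth limits and evaluation functions are still needed alongside pruning.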
