Adversarial Search
CE417: Introduction to Artificial Intelligence
Sharif University of Technology, Spring 2019, Soleymani
"Artificial Intelligence: A Modern Approach", 3rd Edition, Chapter 5
Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.

Outline
- Game as a search problem
- Minimax algorithm
- α-β pruning: ignoring a portion of the search tree
- Time limit problem
- Cut off & evaluation function

Games as search problems
- Games
  - Adversarial search problems (goals are in conflict)
  - Competitive multi-agent environments
- Games in AI are a specialized kind of games (in the game-theory sense)

Adversarial Games

Types of Games
- Many different kinds of games!
- Axes:
  - Deterministic or stochastic?
  - One, two, or more players?
  - Zero sum?
  - Perfect information (can you see the state)?
- Want algorithms for calculating a strategy (policy) which recommends a move from each state

Zero-Sum Games
- Zero-Sum Games
  - Agents have opposite utilities (values on outcomes)
  - Lets us think of a single value that one maximizes and the other minimizes
  - Adversarial, pure competition
- General Games
  - Agents have independent utilities (values on outcomes)
  - Cooperation, indifference, competition, and more are all possible
  - More later on non-zero-sum games

Primary assumptions
- We start with these games:
  - Two-player
  - Turn taking: agents act alternately
  - Zero-sum: agents' goals are in conflict; the sum of utility values at the end of the game is zero or constant
  - Perfect information: fully observable
- Examples: tic-tac-toe, chess, checkers

Deterministic Games
- Many possible formalizations, one is:
  - States: S (start at s0)
  - Players: P = {1...N} (usually take turns)
  - Actions: A (may depend on player / state)
  - Transition function: S × A → S
  - Terminal test: S → {t, f}
  - Terminal utilities: S × P → R
- Solution for a player is a policy: S → A

Single-Agent Trees
[Figure: single-agent search tree with terminal values 8, 2, 0, ..., 2, 6, ..., 4, 6]

Value of a State
- Value of a state: the best achievable outcome (utility) from that state
[Figure: non-terminal states take the best value among their children; terminal states carry fixed utilities 8, 2, 0, ..., 2, 6, ..., 4, 6]

Adversarial Search

Adversarial Game Trees
[Figure: adversarial game tree with terminal values -20, -8, ..., -18, -5, ..., -10, +4, -20, +8]

Minimax Values
[Figure: states under the agent's control take the max of their children; states under the opponent's control take the min; terminal states -8, -5, -10, +8]

Tic-Tac-Toe Game Tree

Game tree (tic-tac-toe)
- Two players: P1 and P2 (P1 is now searching to find a good move)
- Zero-sum games: P1 gets U(t), P2 gets -U(t) for terminal node t
- Plies alternate between P1 and P2; 1-ply = half move
- Utilities are shown from the point of view of P1

Optimal play
- Opponent is assumed optimal
- The minimax function is used to find the utility of each state
- MAX/MIN wants to maximize/minimize the terminal payoff
  - MAX gets U(t) for terminal node t

Adversarial Search (Minimax)
- Minimax search:
  - A state-space search tree
  - Players alternate turns
  - Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: minimax tree with root value 5 (max), child values 2 and 5 (min), terminal values 8, 2, 5, 6]

Minimax
Minimax(s) =
  Utility(s, MAX)                                  if Terminal-Test(s)
  max_{a ∈ Actions(s)} Minimax(Result(s, a))       if Player(s) = MAX
  min_{a ∈ Actions(s)} Minimax(Result(s, a))       if Player(s) = MIN
- Minimax(s) shows the best achievable outcome of being in state s (assumption: optimal opponent)
[Figure: example tree with minimax values 3, 3, 2, 2]

Minimax (Cont.)
- Optimal strategy: move to the state with the highest minimax value
  - Best achievable payoff against best play
  - Maximizes the worst-case outcome for MAX
- It works for zero-sum games

Minimax Properties
[Figure: max node over two min nodes with terminal values 10, 10, 9, 100]
- Optimal against a perfect player. Otherwise?
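The minimax recurrence above can be sketched in a few lines of Python. The nested-list tree representation below is an assumption made for illustration (leaves are terminal utilities from MAX's point of view; inner lists are the successors of a state), not the slides' own formalization.

```python
# Minimax value of a state, assuming the opponent plays optimally.
# A state is either a number (terminal utility, MAX's point of view)
# or a list of successor states; `is_max` says whose turn it is.

def minimax(state, is_max):
    if isinstance(state, (int, float)):   # terminal test
        return state                      # utility
    values = [minimax(s, not is_max) for s in state]
    return max(values) if is_max else min(values)

# The small tree from the slides: MAX root over two MIN nodes.
print(minimax([[8, 2], [5, 6]], True))  # min(8,2)=2, min(5,6)=5 → MAX picks 5
```

On the "Minimax Properties" tree, `minimax([[10, 10], [9, 100]], True)` returns 10: against a perfect opponent, the 100 in the right subtree is unreachable.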
Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

Minimax algorithm
- Depth-first search

function MINIMAX-DECISION(state) returns an action
    return arg max_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a)))
    return v

Properties of minimax
- Complete? Yes (when the tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity: O(b^m)
- Space complexity: O(bm) (depth-first exploration)
- For chess, b ≈ 35, m > 50 for reasonable games
  - Finding the exact solution is completely infeasible

Game Tree Pruning

Pruning
- Correct minimax decision without looking at every node in the game tree
- α-β pruning
  - Branch & bound algorithm
  - Prunes away branches that cannot influence the final decision

α-β pruning example
[Figure: step-by-step α-β pruning on a small game tree, developed over five slides]

α-β pruning
- Assuming depth-first generation of the tree
- We prune node n when the player has a better choice m at the parent of n or at any ancestor of n
- Two types of pruning (cuts):
  - pruning of max nodes (α-cuts)
  - pruning of min nodes (β-cuts)

Alpha-Beta Pruning
- General configuration (MIN version)
  - We're computing the MIN-VALUE at some node n
  - We're looping over n's children
  - Who cares about n's value? MAX
  - Let a be the best value that MAX can get at any choice point along the current path from the root
  - If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
- MAX version is symmetric

α-β pruning (another example)
[Figure: tree with root minimax value 3; left subtree 3, middle subtree ≤ 2 (pruned), right subtree ≥ 5 with leaves 1, 5]

Why is it called α-β?
- α: value of the best (highest) choice found so far at any choice point along the path for MAX
- β: value of the best (lowest) choice found so far at any choice point along the path for MIN
- α and β are updated during the search process:
  - For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.
  - For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.
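The two pruning rules above can be sketched in Python on the same nested-list trees used earlier. The representation (leaves are numbers, inner lists are successors) is an assumption for illustration; the cut conditions are exactly the v ≥ β and v ≤ α tests from the slide.

```python
import math

# Alpha-beta pruning over a nested-list game tree.
# alpha: best value MAX can already guarantee on the path to the root.
# beta:  best value MIN can already guarantee on the path to the root.

def alphabeta(state, is_max, alpha=-math.inf, beta=math.inf):
    if isinstance(state, (int, float)):       # terminal test
        return state
    if is_max:
        v = -math.inf
        for s in state:
            v = max(v, alphabeta(s, False, alpha, beta))
            if v >= beta:                     # MIN ancestor will never allow this
                return v                      # prune remaining children
            alpha = max(alpha, v)
    else:
        v = math.inf
        for s in state:
            v = min(v, alphabeta(s, True, alpha, beta))
            if v <= alpha:                    # MAX ancestor will never allow this
                return v                      # prune remaining children
            beta = min(beta, v)
    return v

# Classic three-subtree example: the middle MIN node is cut off after its
# first leaf (2 ≤ α = 3), yet the root value matches plain minimax.
print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True))  # → 3
```

Pruning never changes the answer at the root; it only skips branches whose value provably cannot propagate past an ancestor's current α or β.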
Alpha-Beta Implementation
- α: MAX's best option on the path to the root
- β: MIN's best option on the path to the root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

function ALPHA-BETA-SEARCH(state) returns an action
    v ← MAX-VALUE(state, -∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v

α-β progress

Order of moves
- Good move ordering improves the effectiveness of pruning
- Best order: time complexity is O(b^(m/2))

Computational time limit (example)
- 100 secs is allowed for each move (game rule)
- 10^4 nodes/sec (processor speed)
- We can explore just 10^6 nodes for each move
- b^m = 10^6, b = 35 ⟹ m ≈ 4
  - 4-ply look-ahead is a hopeless chess player!
- α-β reaches about depth 8: a decent chess program
- It is still hopeless

Resource Limits
- Problem: in realistic games, we cannot search to the leaves!
- Solution: depth-limited search
  - Search only to a limited depth in the tree
  - Replace terminal utilities with an evaluation function for non-terminal positions
[Figure: depth-limited tree with root value 4 (max), min-node values -2 and 4, estimated leaf values -1, -2, 4, 9, and unexpanded subtrees marked "?"]
- Guarantee of optimal play is gone
- More plies makes a BIG difference
- Use iterative deepening for an anytime algorithm

Computational time limit: Solution
- We must make a decision even when finding the optimal move is infeasible.
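The depth budget in the example above follows from b^m = N, i.e. m = log N / log b. A quick check of the arithmetic:

```python
import math

# Node budget per move: 100 seconds × 10^4 nodes/second.
budget = 100 * 10**4          # = 10^6 nodes
b = 35                        # rough branching factor of chess

# Solve b^m = budget for the reachable depth m.
m = math.log(budget) / math.log(b)
print(round(m, 2))            # ≈ 3.89, so about 4 plies of look-ahead
```

With perfect move ordering, α-β's effective complexity is O(b^(m/2)), which roughly doubles the reachable depth for the same budget, consistent with the slide's "α-β reaches about depth 8".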
- Cut off the search and apply a heuristic evaluation function
  - cutoff test: turns non-terminal nodes into terminal leaves
    - Cutoff test instead of terminal test (e.g., a depth limit)
  - evaluation function: estimated desirability of a state
    - Heuristic evaluation instead of the utility function
- This approach does not guarantee optimality.

Depth Matters
- Evaluation functions are always imperfect
- The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
- An important example of the tradeoff between complexity of features and complexity of computation

Heuristic minimax
H-Minimax(s, d) =
  Eval(s, MAX)                                           if Cutoff-Test(s, d)
  max_{a ∈ Actions(s)} H-Minimax(Result(s, a), d + 1)    if Player(s) = MAX
  min_{a ∈ Actions(s)} H-Minimax(Result(s, a), d + 1)    if Player(s) = MIN

Evaluation Functions

Evaluation functions
- For terminal states, it should order them in the same way as the true utility
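The H-Minimax recurrence is ordinary minimax with the terminal test replaced by a cutoff test and the utility replaced by an evaluation. A minimal sketch, in which the `children` and `evaluate` callables and the toy number-doubling game are hypothetical stand-ins for a real game's successor and evaluation functions:

```python
# Depth-limited minimax: when the cutoff fires (terminal state or depth
# limit reached), estimate the node with an evaluation function instead
# of searching all the way to terminal states.

def h_minimax(state, depth, is_max, children, evaluate, depth_limit):
    kids = children(state)
    if not kids or depth >= depth_limit:   # cutoff test
        return evaluate(state)             # heuristic, not true utility
    values = (h_minimax(s, depth + 1, not is_max, children, evaluate, depth_limit)
              for s in kids)
    return max(values) if is_max else min(values)

# Toy game: states are integers, state s branches to 2s and 2s+1 until
# s >= 8, and the "evaluation" is simply the state's value.
children = lambda s: [] if s >= 8 else [2 * s, 2 * s + 1]
evaluate = lambda s: s
print(h_minimax(1, 0, True, children, evaluate, depth_limit=2))  # → 6
```

Note that changing `depth_limit` changes the answer (here the limit-1 value is 3, the limit-2 value is 6): once the cutoff relies on a heuristic, the guarantee of optimal play is gone, exactly as the slides warn.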