Adversarial Search (Game)
Total Page:16
File Type:pdf, Size:1020Kb
Adversarial Search CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2019 Soleymani “Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5 Most slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline } Game as a search problem } Minimax algorithm } �-� Pruning: ignoring a portion of the search tree } Time limit problem } Cut off & Evaluation function 2 Games as search problems } Games } Adversarial search problems (goals are in conflict) } Competitive multi-agent environments } Games in AI are a specialized kind of games (in the game theory) 3 Adversarial Games 4 Types of Games } Many different kinds of games! } Axes: } Deterministic or stochastic? } One, two, or more players? } Zero sum? } Perfect information (can you see the state)? } Want algorithms for calculating a strategy (policy) which recommends a move from each state 5 Zero-Sum Games } Zero-Sum Games } General Games } Agents have opposite utilities } Agents have independent utilities (values on outcomes) (values on outcomes) } Lets us think of a single value that one } Cooperation, indifference, competition, maximizes and the other minimizes and more are all possible } Adversarial, pure competition } More later on non-zero-sum games 6 Primary assumptions } We start with these games: } Two -player } Tu r n taking } agents act alternately } Zero-sum } agents’ goals are in conflict: sum of utility values at the end of the game is zero or constant } Perfect information } fully observable Examples: Tic-tac-toe, chess, checkers 7 Deterministic Games } Many possible formalizations, one is: } States: S (start at s0) } Players: P={1...N} (usually take turns) } Actions:A (may depend on player / state) } Transition Function: SxA ® S } Terminal Te s t : S ® {t,f} } Terminal Utilities: SxP ® R } Solution for a player is a policy: S ® A 8 Single-Agent Trees 8 2 0 … 2 6 … 4 6 9 Value of a State Value of a state: The Non-Terminal best achievable outcome States: (utility) from that state 8 2 0 … 2 6 … 4 6 Terminal States: 10 Adversarial Search 11 Adversarial Game Trees -20 -8 … -18 -5 … -10 +4 -20 +8 12 Minimax Values States Under AGent’s Control: States Under Opponent’s Control: -8 -5 -10 +8 Terminal States: 13 Tic-Tac-Toe Game Tree 14 Game tree (tic-tac-toe) } Two players: �$ and �% (�$ is now searching to find a good move) } Zero-sum games: �$ gets �(�), �% gets � − �(�) for terminal node � �$: � �$ �%: � 1-ply = half move �% �$ �% Utilities from the point of view of �$ 15 Optimal play } Opponent is assumed optimal } Minimax function is used to find the utility of each state. } MAX/MIN wants to maximize/minimize the terminal payoff MAX gets �(�) for terminal node � 16 Adversarial Search (Minimax) } Minimax search: } A state-space search tree } Players alternate turns 5 max } Compute each node’s minimax value: the best achievable utility against a 2 5 min rational (optimal) adversary 8 2 5 6 Terminal values: part of the game 17 Minimax �������(�, ���) �� ��������_����(�) ������� � = 5���D∈FGHIJKL M �������(������(�, �)) ������ � = ��� ���D∈FGHIJKL M �������(������ �, � ) ������ � = ��� Utility of being in state s } �������(�) shows the best achievable outcome of being in state � (assumption: optimal opponent) 3 3 2 2 18 Minimax (Cont.) } Optimal strategy: move to the state with highest minimax value } Best achievable payoff against best play } Maximizes the worst-case outcome for MAX } It works for zero-sum games 19 Minimax Properties max min 10 10 9 100 Optimal against a perfect player. Otherwise? 20 Minimax Implementation def max-value(state): def min-value(state): initialize v = -∞ initialize v = +∞ for each successor of state: for each successor of state: v = max(v, min-value(successor)) v = min(v, max-value(successor)) return v return v 21 Minimax Implementation (Dispatch) def value(state): if the state is a terminal state: return the state’s utility if the next agent is MAX: return max-value(state) if the next agent is MIN: return min-value(state) def max-value(state): def min-value(state): initialize v = -∞ initialize v = +∞ for each successor of state: for each successor of state: v = max(v, value(successor)) v = min(v, value(successor)) return v return v 22 Minimax algorithm Depth first search function �������_��������(�����) returns �� ������ return max ���_�����(������(�����, �)) D∈FGHIJKL(MVDVW) function ���_�����(�����) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← −∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �))) return � function ���_�����(�����) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← ∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �))) return � 23 Properties of minimax } Complete?Yes (when tree is finite) } Optimal?Yes (against an optimal opponent) } Time complexity: �(��) } Space complexity: �(��) (depth-first exploration) } For chess, � ≈ 35, � > 50 for reasonable games } Finding exact solution is completely infeasible 24 Game Tree Pruning 25 Pruning } Correct minimax decision without looking at every node in the game tree } α-β pruning } Branch & bound algorithm } Prunes away branches that cannot influence the final decision 26 α-β pruning example 27 α-β pruning example 28 α-β pruning example 29 α-β pruning example 30 α-β pruning example 31 α-β pruning } Assuming depth-first generation of tree } We prune node � when player has a better choice � at (parent or) any ancestor of � } Tw o types of pruning (cuts): } pruning of max nodes (α-cuts) } pruning of min nodes (β-cuts) 32 Alpha-Beta Pruning } General configuration (MIN version) } We’re computing the MIN-VALUE at some node n MAX } We’re looping over n’s children } Who cares about n’s value? MAX MIN a } Let a be the best value that MAX can get at any choice point along the current path from the root } If n becomes worse than a, MAX will avoid it, so we MAX can stop considering n’s other children (it’s already bad enough that it won’t be played) MIN n } MAX version is symmetric 33 α-β pruning (an other example) 3 3 ≤2 2 ≥5 1 5 34 Why is it called α-β? } α: Value of the best (highest) choice found so far at any choice point along the path for MAX } �: Value of the best (lowest) choice found so far at any choice point along the path for MIN } Updating α and � during the search process } For a MAX node once the value of this node is known to be more than the current � (v ≥ �), its remaining branches are pruned. } For a MIN node once the value of this node is known to be less than the current � (v ≤ �), its remaining branches are pruned. 35 Alpha-Beta Implementation α: MAX’s best option on path to root β: MIN’s best option on path to root def max-value(state, α, β): def min-value(state , α, β): initialize v = -∞ initialize v = +∞ for each successor of state: for each successor of state: v = max(v, value(successor, α, β)) v = min(v, value(successor, α, β)) if v ≥ β return v if v ≤ α return v α = max(α, v) β = min(β, v) return v return v 36 function �����_����_������(�����) returns �� ������ � ← ���_�����(�����, −∞, +∞) return the ������ in �������(�����) with value � function ���_�����(�����, �, �) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← −∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �), �, �)) if � ≥ � then return � � ← ���(�, �) return � function ���_�����(�����, �, �) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← +∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �), �, �)) if � ≤ � then return � � ← ���(�, �) return � 37 α-β progress 38 Order of moves } Good move ordering improves effectiveness of pruning ? l } Best order: time complexity is �(� ⁄m) 39 Computational time limit (example) } 100 secs is allowed for each move (game rule) } 104 nodes/sec (processor speed) } We can explore just 106 nodes for each move } bm = 106, b=35 ⟹ m=4 } (4-ply look-ahead is a hopeless chess player!) } a-b reaches about depth 8 – decent chess program } It is still hopeless 40 Resource Limits 41 Resource Limits } Problem: In realistic games, cannot search to leaves! 4 max -2 4 min } Solution: Depth-limited search -1 -2 4 9 } Instead, search only to a limited depth in the tree } Replace terminal utilities with an evaluation function for non-terminal positions } Guarantee of optimal play is gone } More plies makes a BIG difference } Use iterative deepening for an anytime ? ? ? ? algorithm 42 Computational time limit: Solution } We must make a decision even when finding the optimal move is infeasible. } Cut off the search and apply a heuristic evaluation function } cutoff test: turns non-terminal nodes into terminal leaves } Cut off test instead of terminal test (e.g., depth limit) } evaluation function: estimated desirability of a state } Heuristic function evaluation instead of utility function } This approach does not guarantee optimality. 43 Depth Matters } Evaluation functions are always imperfect } The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters } An important example of the tradeoff between complexity of features and complexity of computation 44 Heuristic minimax �pIKIpFq M,r = ����(�, ���) �� ������_����(�, �) 5���D∈FGHIJKL M �_�������(������ �, � , � + 1) ������ � = MAx ���D∈FGHIJKL M �_�������(������ �, � , � + 1) ������ � = MIN 45 Evaluation Functions 46 Evaluation functions } For terminal states, it should order them in the same way as the true utility