Adversarial Search CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2019
Soleymani
“Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5 Most slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline
} Game as a search problem } Minimax algorithm } �-� Pruning: ignoring a portion of the search tree } Time limit problem } Cut off & Evaluation function
2 Games as search problems
} Games } Adversarial search problems (goals are in conflict) } Competitive multi-agent environments
} Games in AI are a specialized kind of games (in the game theory)
3 Adversarial Games
4 Types of Games
} Many different kinds of games!
} Axes: } Deterministic or stochastic? } One, two, or more players? } Zero sum? } Perfect information (can you see the state)?
} Want algorithms for calculating a strategy (policy) which recommends a move from each state
5 Zero-Sum Games
} Zero-Sum Games } General Games } Agents have opposite utilities } Agents have independent utilities (values on outcomes) (values on outcomes) } Lets us think of a single value that one } Cooperation, indifference, competition, maximizes and the other minimizes and more are all possible } Adversarial, pure competition } More later on non-zero-sum games
6 Primary assumptions
} We start with these games: } Two -player } Tu r n taking } agents act alternately } Zero-sum } agents’ goals are in conflict: sum of utility values at the end of the game is zero or constant } Perfect information } fully observable
Examples: Tic-tac-toe, chess, checkers
7 Deterministic Games
} Many possible formalizations, one is:
} States: S (start at s0) } Players: P={1...N} (usually take turns) } Actions:A (may depend on player / state) } Transition Function: SxA ® S } Terminal Te s t : S ® {t,f} } Terminal Utilities: SxP ® R
} Solution for a player is a policy: S ® A
8 Single-Agent Trees
8
2 0 … 2 6 … 4 6
9 Value of a State
Value of a state: The Non-Terminal best achievable outcome States: (utility) from that state
8
2 0 … 2 6 … 4 6 Terminal States:
10 Adversarial Search
11 Adversarial Game Trees
-20 -8 … -18 -5 … -10 +4 -20 +8
12 Minimax Values
States Under Agent’s Control: States Under Opponent’s Control:
-8 -5 -10 +8
Terminal States:
13 Tic-Tac-Toe Game Tree
14 Game tree (tic-tac-toe)
} Two players: � and � (� is now searching to find a good move)
} Zero-sum games: � gets �(�), � gets � − �(�) for terminal node �
� : � � � : � 1-ply = half move �
�
�
Utilities from the point of view of � 15 Optimal play
} Opponent is assumed optimal } Minimax function is used to find the utility of each state. } MAX/MIN wants to maximize/minimize the terminal payoff
MAX gets �(�) for terminal node � 16 Adversarial Search (Minimax)
} Minimax search: } A state-space search tree } Players alternate turns 5 max } Compute each node’s minimax value: the best achievable utility against a 2 5 min rational (optimal) adversary
8 2 5 6 Terminal values: part of the game
17 Minimax
�������(�, ���) �� ��������_����(�) ������� � = ��� ∈ �������(������(�, �)) ������ � = ��� ��� ∈ �������(������ �, � ) ������ � = ��� Utility of being in state s
} �������(�) shows the best achievable outcome of being in state � (assumption: optimal opponent)
3
3 2 2
18 Minimax (Cont.)
} Optimal strategy: move to the state with highest minimax value } Best achievable payoff against best play } Maximizes the worst-case outcome for MAX } It works for zero-sum games
19 Minimax Properties
max
min
10 10 9 100
Optimal against a perfect player. Otherwise?
20 Minimax Implementation
def max-value(state): def min-value(state): initialize v = -∞ initialize v = +∞ for each successor of state: for each successor of state: v = max(v, min-value(successor)) v = min(v, max-value(successor)) return v return v
21 Minimax Implementation (Dispatch)
def value(state): if the state is a terminal state: return the state’s utility if the next agent is MAX: return max-value(state) if the next agent is MIN: return min-value(state)
def max-value(state): def min-value(state): initialize v = -∞ initialize v = +∞ for each successor of state: for each successor of state: v = max(v, value(successor)) v = min(v, value(successor)) return v return v
22 Minimax algorithm Depth first search function �������_��������(�����) returns �� ������ return max ���_�����(������(�����, �)) ∈ ( )
function ���_�����(�����) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← −∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �))) return �
function ���_�����(�����) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← ∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �))) return �
23 Properties of minimax
} Complete?Yes (when tree is finite)
} Optimal?Yes (against an optimal opponent)
} Time complexity: �(��)
} Space complexity: �(��) (depth-first exploration)
} For chess, � ≈ 35, � > 50 for reasonable games } Finding exact solution is completely infeasible
24 Game Tree Pruning
25 Pruning
} Correct minimax decision without looking at every node in the game tree } α-β pruning } Branch & bound algorithm } Prunes away branches that cannot influence the final decision
26 α-β pruning example
27 α-β pruning example
28 α-β pruning example
29 α-β pruning example
30 α-β pruning example
31 α-β pruning
} Assuming depth-first generation of tree } We prune node � when player has a better choice � at (parent or) any ancestor of �
} Tw o types of pruning (cuts): } pruning of max nodes (α-cuts) } pruning of min nodes (β-cuts)
32 Alpha-Beta Pruning
} General configuration (MIN version) } We’re computing the MIN-VALUE at some node n MAX } We’re looping over n’s children } Who cares about n’s value? MAX MIN a } Let a be the best value that MAX can get at any choice point along the current path from the root } If n becomes worse than a, MAX will avoid it, so we MAX can stop considering n’s other children (it’s already bad enough that it won’t be played) MIN n
} MAX version is symmetric
33 α-β pruning (an other example)
3
3 ≤2 2
≥5
1 5
34 Why is it called α-β?
} α: Value of the best (highest) choice found so far at any choice point along the path for MAX } �: Value of the best (lowest) choice found so far at any choice point along the path for MIN
} Updating α and � during the search process
} For a MAX node once the value of this node is known to be more than the current � (v ≥ �), its remaining branches are pruned. } For a MIN node once the value of this node is known to be less than the current � (v ≤ �), its remaining branches are pruned.
35 Alpha-Beta Implementation
α: MAX’s best option on path to root β: MIN’s best option on path to root
def max-value(state, α, β): def min-value(state , α, β): initialize v = -∞ initialize v = +∞ for each successor of state: for each successor of state: v = max(v, value(successor, α, β)) v = min(v, value(successor, α, β)) if v ≥ β return v if v ≤ α return v α = max(α, v) β = min(β, v) return v return v
36 function �����_����_������(�����) returns �� ������ � ← ���_�����(�����, −∞, +∞) return the ������ in �������(�����) with value �
function ���_�����(�����, �, �) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← −∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �), �, �)) if � ≥ � then return � � ← ���(�, �) return �
function ���_�����(�����, �, �) returns � ������� ����� if ��������_����(�����) then return �������(�����) � ← +∞ for each � in �������(�����) do � ← ���(�, ���_�����(�������(�����, �), �, �)) if � ≤ � then return � � ← ���(�, �) return �
37 α-β progress
38 Order of moves } Good move ordering improves effectiveness of pruning
?