Adversarial Search CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2019

Soleymani

“Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5. Most slides have been adapted from Klein and Abbeel, CS188, UC Berkeley. Outline

} Game as a search problem } Minimax algorithm } α-β pruning: ignoring a portion of the search tree } Time limit problem } Cut-off & evaluation function

2 Games as search problems

} Games } Adversarial search problems (goals are in conflict) } Competitive multi-agent environments

} Games in AI are a specialized kind of games (in the game-theoretic sense)

3 Adversarial Games

4 Types of Games

} Many different kinds of games!

} Axes: } Deterministic or stochastic? } One, two, or more players? } Zero sum? } Perfect information (can you see the state)?

} Want algorithms for calculating a strategy (policy) which recommends a move from each state

5 Zero-Sum Games

} Zero-Sum Games } Agents have opposite utilities (values on outcomes) } Lets us think of a single value that one maximizes and the other minimizes } Adversarial, pure competition
} General Games } Agents have independent utilities (values on outcomes) } Cooperation, indifference, competition, and more are all possible } More later on non-zero-sum games

6 Primary assumptions

} We start with these games: } Two-player } Turn-taking } agents act alternately } Zero-sum } agents’ goals are in conflict: sum of utility values at the end of the game is zero or constant } Perfect information } fully observable

Examples: Tic-tac-toe, chess, checkers

7 Deterministic Games

} Many possible formalizations, one is:

} States: S (start at s0) } Players: P = {1...N} (usually take turns) } Actions: A (may depend on player / state) } Transition Function: S × A → S } Terminal Test: S → {t, f} } Terminal Utilities: S × P → R

} Solution for a player is a policy: S → A
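A minimal sketch of this formalization as a Python interface; the class and method names below (Game, actions, result, …) are illustrative assumptions, not part of the slides:

# Hypothetical interface mirroring the formalization above.
class Game:
    def initial_state(self):            # s0
        raise NotImplementedError
    def player(self, state):            # whose turn it is in this state (an element of P)
        raise NotImplementedError
    def actions(self, state):           # legal actions in this state (subset of A)
        raise NotImplementedError
    def result(self, state, action):    # transition function: S x A -> S
        raise NotImplementedError
    def is_terminal(self, state):       # terminal test: S -> {t, f}
        raise NotImplementedError
    def utility(self, state, player):   # terminal utilities: S x P -> R
        raise NotImplementedError

# A solution (policy) maps each state where it is our turn to an action: S -> A.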

8 Single-Agent Trees

[Figure: single-agent search tree — root value 8, terminal values 2, 0, …, 2, 6, …, 4, 6]

9 Value of a State

Value of a state: the best achievable outcome (utility) from that state

[Figure: same tree — non-terminal states (root value 8) and terminal states with values 2, 0, …, 2, 6, …, 4, 6]

10 Adversarial Search

11 Adversarial Game Trees

[Figure: adversarial game tree with terminal values -20, -8, …, -18, -5, …, -10, +4, -20, +8]

12 Minimax Values

[Figure: minimax values -8, -5, -10, +8 backed up through states under the agent’s control (MAX) and states under the opponent’s control (MIN); terminal states at the leaves]

13 Tic-Tac-Toe Game Tree

14 Game tree (tic-tac-toe)

} Two players: MAX and MIN (MAX is now searching to find a good move)

} Zero-sum games: MAX gets U(t), MIN gets C − U(t) for terminal node t

[Game tree figure — MAX plays ×, MIN plays ○; 1-ply = half move]

Utilities from the point of view of MAX 15 Optimal play

} Opponent is assumed optimal } Minimax function is used to find the utility of each state. } MAX/MIN wants to maximize/minimize the terminal payoff

MAX gets U(t) for terminal node t 16 Adversarial Search (Minimax)

} Minimax search: } A state-space search tree } Players alternate turns } Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Example tree: terminal values 8, 2, 5, 6 are part of the game definition; backed-up min values 2 and 5, max value 5]

17 Minimax

MINIMAX(s) =
    UTILITY(s, MAX)                                  if TERMINAL_TEST(s)
    max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))       if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))       if PLAYER(s) = MIN

(= utility of being in state s)

} MINIMAX(s) shows the best achievable outcome of being in state s (assumption: optimal opponent)

[Figure: minimax example tree — root value 3, min-node values 3, 2, 2]

18 Minimax (Cont.)

} Optimal strategy: move to the state with highest minimax value } Best achievable payoff against best play } Maximizes the worst-case outcome for MAX } It works for zero-sum games

19 Minimax Properties

[Figure: two-ply tree — max at the root, min below, terminal values 10, 10, 9, 100]

Optimal against a perfect player. Otherwise?

20 Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

21 Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

22 Minimax algorithm (depth-first search)

function MINIMAX_DECISION(state) returns an action
    return arg max_{a ∈ ACTIONS(state)} MIN_VALUE(RESULT(state, a))

function MAX_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a)))
    return v

function MIN_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a)))
    return v
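A minimal runnable Python sketch of the same algorithm, assuming the illustrative Game interface sketched earlier (player, actions, result, is_terminal, utility are assumed method names, not from the slides):

def minimax_decision(game, state):
    # Pick the action whose successor has the highest MIN-VALUE for the player to move.
    player = game.player(state)
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = float("-inf")
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a), player))
    return v

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = float("inf")
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a), player))
    return v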

23 Properties of minimax

} Complete? Yes (when tree is finite)

} Optimal? Yes (against an optimal opponent)

} Time complexity: O(b^m)

} Space complexity: O(bm) (depth-first exploration)

} For chess, b ≈ 35, m > 50 for reasonable games } Finding exact solution is completely infeasible

24 Game Tree Pruning

25 Pruning

} Correct minimax decision without looking at every node in the game tree } α-β pruning } Branch & bound algorithm } Prunes away branches that cannot influence the final decision

26 α-β pruning example

27 α-β pruning example

28 α-β pruning example

29 α-β pruning example

30 α-β pruning example

31 α-β pruning

} Assuming depth-first generation of tree } We prune node n when the player has a better choice m at (parent or) any ancestor of n

} Two types of pruning (cuts): } pruning of max nodes (α-cuts) } pruning of min nodes (β-cuts)

32 Alpha-Beta Pruning

} General configuration (MIN version) } We’re computing the MIN-VALUE at some node n } We’re looping over n’s children } Who cares about n’s value? MAX } Let a be the best value that MAX can get at any choice point along the current path from the root } If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)

} MAX version is symmetric

33 α-β pruning (an other example)

[Figure: α-β example tree — root value 3; min-node values 3 and ≤ 2; a subtree bounded ≥ 5; leaves include 1, 2, 5]

34 Why is it called α-β?

} α: Value of the best (highest) choice found so far at any choice point along the path for MAX } β: Value of the best (lowest) choice found so far at any choice point along the path for MIN

} Updating α and β during the search process

} For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned. } For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.

35 Alpha-Beta Implementation

α: MAX’s best option on path to root β: MIN’s best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

36 function ALPHA_BETA_SEARCH(state) returns an action
    v ← MAX_VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
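A runnable Python sketch of the same α-β search, again under the illustrative Game interface assumed earlier (not the slides’ own code):

def alpha_beta_search(game, state):
    # Root call: keep the running best value as alpha so later branches can be pruned.
    player = game.player(state)
    best_action, best_v = None, float("-inf")
    for a in game.actions(state):
        v = ab_min_value(game, game.result(state, a), best_v, float("inf"), player)
        if v > best_v:
            best_v, best_action = v, a
    return best_action

def ab_max_value(game, state, alpha, beta, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = float("-inf")
    for a in game.actions(state):
        v = max(v, ab_min_value(game, game.result(state, a), alpha, beta, player))
        if v >= beta:            # MIN above already has a better option: prune
            return v
        alpha = max(alpha, v)
    return v

def ab_min_value(game, state, alpha, beta, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = float("inf")
    for a in game.actions(state):
        v = min(v, ab_max_value(game, game.result(state, a), alpha, beta, player))
        if v <= alpha:           # MAX above already has a better option: prune
            return v
        beta = min(beta, v)
    return v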

37 α-β progress

38 Order of moves } Good move ordering improves effectiveness of pruning

} Best order: time complexity is O(b^(m/2))

39 Computational time limit (example)

} 100 secs is allowed for each move (game rule) } 10^4 nodes/sec (processor speed) } We can explore just 10^6 nodes for each move } b^m = 10^6, b = 35 ⟹ m = 4 } (4-ply look-ahead is a hopeless chess player!) } α-β reaches about depth 8 – decent chess program } It is still hopeless
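The depth bound follows from the arithmetic above (a rough back-of-the-envelope check):

100 s/move × 10^4 nodes/s = 10^6 nodes per move
35^m ≈ 10^6  ⟹  m ≈ log(10^6) / log(35) = 6 / 1.544 ≈ 3.9, i.e. about 4 plies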

40 Resource Limits

41 Resource Limits

} Problem: In realistic games, cannot search to leaves! } Solution: Depth-limited search } Instead, search only to a limited depth in the tree } Replace terminal utilities with an evaluation function for non-terminal positions [Figure: depth-limited tree with max value 4, min values -2 and 4, evaluated leaves -1, -2, 4, 9]

} Guarantee of optimal play is gone

} More plies makes a BIG difference

} Use iterative deepening for an anytime algorithm

42 Computational time limit: Solution

} We must make a decision even when finding the optimal move is infeasible.

} Cut off the search and apply a heuristic evaluation function } cutoff test: turns non-terminal nodes into terminal leaves } Cut off test instead of terminal test (e.g., depth limit) } evaluation function: estimated desirability of a state } Heuristic function evaluation instead of utility function

} This approach does not guarantee optimality.

43 Depth Matters

} Evaluation functions are always imperfect

} The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters

} An important example of the tradeoff between complexity of features and complexity of computation

44 Heuristic minimax

H_MINIMAX(s, d) =
    EVAL(s, MAX)                                           if CUTOFF_TEST(s, d)
    max_{a ∈ ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)    if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)    if PLAYER(s) = MIN
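A Python sketch of this cutoff version; the depth limit and evaluation function are caller-supplied assumptions (illustrative names, built on the Game interface assumed earlier):

def h_minimax(game, state, depth, limit, eval_fn, player):
    # Depth-limited minimax: CUTOFF_TEST is a simple depth limit here,
    # and EVAL replaces the true utility at cut-off nodes.
    if game.is_terminal(state):
        return game.utility(state, player)
    if depth >= limit:                      # cutoff test
        return eval_fn(state, player)       # heuristic estimate instead of utility
    values = [h_minimax(game, game.result(state, a), depth + 1, limit, eval_fn, player)
              for a in game.actions(state)]
    return max(values) if game.player(state) == player else min(values)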

45 Evaluation Functions

46 Evaluation functions

} For terminal states, it should order them in the same way as the true utility function. } For non-terminal states, it should be strongly correlated with the actual chances of winning. } It must be cheap to compute.

47 Evaluation functions based on features

} Example: features for evaluation of the chess states } Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc } King safety } Good pawn structure } …

48 Evaluation Functions

} Evaluation functions score non-terminals in depth-limited search

} Ideal function: returns the actual minimax value of the position } In practice: typically weighted linear sum of features: Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

} e.g. f1(s) = (num white queens – num black queens), etc.
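A small sketch of such a weighted linear evaluation; the features and weights below are made-up material values for illustration, not from the slides:

def linear_eval(feature_values, weights):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f for w, f in zip(weights, feature_values))

# Illustrative use with hypothetical chess material features
# (f1 = queen difference, f2 = rook difference, f3 = pawn difference):
score = linear_eval(feature_values=[1, 0, -2], weights=[9.0, 5.0, 1.0])  # 9.0 - 2.0 = 7.0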

49 Cutting off search: simple depth limit

} Problem 1: non-quiescent positions } A few more plies can make a big difference in the evaluation value

} Problem 2: horizon effect } Delaying tactics against an opponent’s move that causes serious, unavoidable damage (the damage is pushed beyond the horizon that the player can see, i.e. the cut-off threshold)

50 Speed up the search process

} Table lookup rather than search for some states } E.g., for the opening and ending of games (where there are few choices)

} Example: Chess } For each opening, the best advice of human experts (from books describing good plays) can be copied into tables. } For endgame, computer analysis is usually used (solving endgames by computer).

51 Uncertain Outcomes

54 Probabilities

55 Reminder: Probabilities

} A random variable represents an event whose outcome is unknown } A probability distribution is an assignment of weights to outcomes

} Example: Traffic on freeway } Random variable: T = whether there’s traffic } Outcomes: T in {none, light, heavy} } Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25

} Some laws of probability (more later): } Probabilities are always non-negative } Probabilities over all possible outcomes sum to one

} As we get more evidence, probabilities may change: } P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60 } We’ll talk about methods for reasoning and updating probabilities later

56 Reminder: Expectations

} The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes

} Example: How long to get to the airport?

Time: 20 min, 30 min, 60 min; Probability: 0.25, 0.50, 0.25 — Expected time = 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min

57 Mixed Layer Types

} E.g. Backgammon } Expectiminimax } Environment is an extra “random agent” player that moves after each min/max agent } Each node computes the appropriate combination of its children

58 Stochastic games: Backgammon

59 Stochastic games

} Expected utility: Chance nodes take average (expectation) over all possible outcomes. } It is consistent with the definition of rational agents trying to maximize expected utility.

Chance node value = average of the values weighted by their probabilities [Figure: chance nodes with expected values 2.1 and 1.3 over leaf values 2, 3, 1, 4]

60 Stochastic games

EXPECTI_MINIMAX(s) =
    UTILITY(s, MAX)                                         if TERMINAL_TEST(s)
    max_{a ∈ ACTIONS(s)} EXPECTI_MINIMAX(RESULT(s, a))      if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} EXPECTI_MINIMAX(RESULT(s, a))      if PLAYER(s) = MIN
    Σ_r P(r) · EXPECTI_MINIMAX(RESULT(s, r))                if PLAYER(s) = CHANCE
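A Python sketch of this recursion; it assumes the game additionally exposes chance nodes through an illustrative chance_outcomes method returning (outcome, probability) pairs and reports "CHANCE" as the player to move at such nodes (assumed names, not from the slides):

def expectiminimax(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    turn = game.player(state)
    if turn == "CHANCE":
        # Weighted average over chance outcomes r with probability P(r).
        return sum(p * expectiminimax(game, game.result(state, r), player)
                   for r, p in game.chance_outcomes(state))
    values = [expectiminimax(game, game.result(state, a), player)
              for a in game.actions(state)]
    return max(values) if turn == player else min(values)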

61 What Utilities to Use?

[Figure: terminal values 0, 40, 20, 30 versus their monotonic transformation x² = 0, 1600, 400, 900]

} For worst-case minimax reasoning, terminal function scale doesn’t matter } We just want better states to have higher evaluations (get the ordering right) } We call this insensitivity to monotonic transformations

} For average-case expectimax reasoning, we need magnitudes to be meaningful

62 Evaluation functions for stochastic games } An order preserving transformation on leaf values is not a sufficient condition.

} Evaluation function must be a positive linear transformation of the expected utility of a position.

63 Properties of search space for stochastic games

} O(b^m n^m) } Backgammon: b ≈ 20 (can be up to 4000 for double dice rolls), n = 21 (no. of different possible dice rolls) } 3-plies is manageable (≈ 10^8 nodes)

} Probability of reaching a given node decreases enormously by increasing the depth (multiplying probabilities) } Forming detailed plans of action may be pointless } Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other } But pruning is not straightforward.

64 Example: Backgammon

} Dice rolls increase b: 21 possible rolls with 2 dice } Backgammon ≈ 20 legal moves } Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9

} As depth increases, probability of reaching a given search node shrinks } So usefulness of search is diminished } So limiting depth is less damaging } But pruning is trickier…

} Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning

1st AI world champion in any game!

65 Other Game Types

66 Multi-Agent Utilities

} What if the game is not zero-sum, or has multiple players?

} Generalization of minimax: } Te r m i n a l s have utility tuples } Node values are also utility tuples } Each player maximizes its own component } Can give rise to cooperation and competition dynamically…

[Figure: three-player game tree with terminal utility tuples (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5)]
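A sketch of the generalized backup: node values are utility tuples and the player to move picks the child whose tuple is best in its own component (interface names are illustrative assumptions, with game.player returning the mover’s index and game.players listing all player indices):

def multiplayer_value(game, state):
    # Returns a tuple of utilities, one component per player.
    if game.is_terminal(state):
        return tuple(game.utility(state, p) for p in game.players())
    mover = game.player(state)                      # index of the player to move
    child_values = [multiplayer_value(game, game.result(state, a))
                    for a in game.actions(state)]
    return max(child_values, key=lambda v: v[mover])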

67 State-of-the-art game programs } Chess (b ≈ 35) } In 1997, Deep Blue defeated Kasparov. } Ran on a parallel computer doing alpha-beta search. } Reached depth 14 plies routinely, with techniques to extend the effective search depth. } Hydra: reaches depth 18 plies using more heuristics.

} Checkers (b < 10) } Chinook (ran on a regular PC and uses alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994. } Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.

68 State-of-the-art game programs (Cont.) } Othello (b is usually between 5 and 15) } Logistello defeated the human world champion by six games to none in 1997. } Human champions are no match for computers at Othello.

} Go (b > 300) } Human champions refuse to compete against computers (current programs are still at advanced amateur level). } MOGO avoided alpha-beta search and used Monte Carlo rollouts. } AlphaGo (2016) has beaten professionals without handicaps.

69 State-of-the-art game programs (Cont.) } Backgammon (stochastic) } TD-Gammon (1992) was competitive with top human players. } Depth 2 or 3 search along with a good evaluation function developed by learning methods

} Bridge (partially observable, multiplayer) } In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship. } In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.

} Scrabble (partially observable & stochastic) } In 2006, Quackle defeated the former world champion 3-2.

70