
INFOB2KI 2019-2020 Utrecht University The Netherlands

ARTIFICIAL INTELLIGENCE

Decision making: opponent based

Lecturer: Silja Renooij

These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

2 Outline

. Game theory: rules for defeating opponents
  • Deterministic turn‐taking games
    – Minimax algorithm, Alpha‐beta pruning
  • Imperfect information games
    – Simultaneous‐move games: mixed strategies
  • Incomplete information games
    – Prisoner’s dilemma

3 Game Theory

Developed to explain the optimal strategy in two‐person (nowadays n ≥ 2) interactions.

. Initially, von Neumann and Morgenstern – Zero‐sum games (your win == your opponent's loss)
. John Nash – Nonzero‐sum games
. Harsanyi, Selten – Incomplete information (Bayesian games)

4 Zero-sum games

Better term: constant‐sum game

Examples of zero‐sum games:
. 2‐player; payoffs: 1 for win, −1 for loss, 0 for draw → total payoff is 0, regardless of the outcome
. 2‐player; payoffs: 1 for win, 0 for loss, ½ for draw → total payoff is 1, regardless of the outcome
. 3‐player; payoffs: distribute 3 points over players, depending on performance

Example of non‐zero‐sum game: . 2‐player; payoffs: 3 for win, 0 for loss, 1 for draw

→ total payoff is either 3 or 2, depending on the outcome.

5 Game types

Complete information games:
. Perfect information games: upon making a move, players know the full history of the game, all moves by all players, all payoffs, etc.

. Imperfect information games: players know all outcomes/payoffs, types of other players & their strategies, but are unaware of (or unsure about) possible actions of other players
 ‐ simultaneous moves: what action will others choose?
 ‐ (temporarily) shielded attributes: who has which cards?

Complete information games can be deterministic or involve chance.

6 Game types

Incomplete information games:

Uncertainty about game being played: factors outside the rules of the game, not known to one or more players, may affect the outcome of the game

• E.g. players may not know other players' "type", their strategies, payoffs or preferences

Incomplete information games can be deterministic or involve chance.

7 Complete information games

                       Deterministic                   Chance
Perfect information    chess, checkers, go,            Backgammon, monopoly
                       othello, Tic‐tac‐toe
Imperfect information  Battleships, Minesweeper        Bridge, poker, scrabble

NB(!) the textbook says randomness is the difference between perfect and imperfect information. Other sources state imperfect == incomplete… be aware of this!

8 Deterministic two‐player, turn‐taking

Perfect information

9 Game trees

• Play alternates between two players.
• The tree represents all possibilities from the perspective of one player ('me') from the current root.

me:

you:

me:

you: Zero‐sum or Non‐zero sum?

(my rewards)

10 Minimax

. Compute value of perfect play for deterministic, perfect‐information games
 – Traverse game tree in DFS‐like manner
 – 'bubble up' values of evaluation function: maximise if my turn, minimise for opponent's turn

. Serves for selecting next move: choose move to position with highest minimax value = best achievable payoff against best play

. May also serve for finding optimal strategy = from start to finish my best move, for every move of opponent.

11 Minimax: example

NB the book uses circles and squares instead of triangles!

12 Minimax algorithm
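The slide's pseudocode is not reproduced in these notes; a minimal sketch in Python, using an assumed toy game interface (`TupleGame` and its method names are illustrative, not the book's code):

```python
def minimax(state, my_turn, game):
    """Minimax value of `state` by depth-first traversal of the game tree."""
    if game.is_terminal(state):
        return game.utility(state)          # my payoff at a leaf
    values = [minimax(game.result(state, m), not my_turn, game)
              for m in game.moves(state)]
    # 'bubble up': maximise on my turn, minimise on the opponent's turn
    return max(values) if my_turn else min(values)


class TupleGame:
    """Toy game: a state is a nested tuple; leaves (ints) are my payoffs."""
    def is_terminal(self, s): return isinstance(s, int)
    def utility(self, s): return s
    def moves(self, s): return range(len(s))
    def result(self, s, m): return s[m]
```

For example, `minimax(((3, 12, 8), (2, 4, 6), (14, 5, 2)), True, TupleGame())` returns 3: MAX picks the branch whose MIN value (3, 2, 2) is largest.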

13 Properties of minimax

. Complete? Yes (if tree is finite)

. Optimal? Yes (against an optimal opponent)

. Time complexity? O(b^m) for branching factor b and max depth m (in general, worst case, we cannot improve on this, so the O(bm) suggested in textbook AI4G must be a typo)

. Space complexity? O(bm) (if depth‐first exploration; can be reduced to O(m) with a backtracking variant of DFS which generates one successor at a time, rather than all)

For chess, with b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible

14 Solution: pruning

[figure: game tree with positions m and n]

. Idea: if m is better for me than n, we will never actually get to n in play (MAX will avoid it) → prune that leaf/subtree
. Let α be the best (= highest) value (to MAX) found so far on the current path
. Define β similarly for MIN: best (= lowest) value found so far

15 Alpha-Beta (α-β) pruning

Minimax, augmented with upper‐ and lower bounds
• Init: for all non‐leaf nodes set
  α = −∞ (lower bound on achievable score)
  β = ∞ (upper bound on achievable score)

• Upwards: update α in MAX move; update β in MIN move

16 α-β Pruning

function MIN-VALUE is similarly extended:
  if v ≤ α then return v
  β ← MIN(β, v)
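A Python sketch of the full α‐β search, folding the MAX‐VALUE/MIN‐VALUE pair into one function over a game tree given as nested tuples (an assumed toy encoding, not the book's pseudocode):

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of a tree of nested tuples (leaves are payoffs),
    pruning branches that cannot affect the final result."""
    if isinstance(node, (int, float)):       # leaf: return its utility
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:        # beta-cutoff: MIN above won't allow this value
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:           # alpha-cutoff: MAX above won't allow this value
            return v
        beta = min(beta, v)
    return v
```

On the classic example tree `((3, 12, 8), (2, 4, 6), (14, 5, 2))` it returns 3, the same value as plain minimax, while skipping leaves that cannot change the answer.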

17 α-β pruning example

[figure: example tree annotated with α/β values]

α = best score till now (lower bound) → updated in own (MAX) move
β = upper bound on achievable score → updated in opponent's (MIN) move

18 α-β pruning example


19 α-β pruning example


Prune or continue?

20 α-β pruning example


Prune or continue?

21 α-β pruning example


22 Properties of α-β

. Pruning does not affect the final result!
. Good move ordering improves effectiveness of pruning
. With "perfect ordering", time complexity = O(b^(m/2))
 → effective branching factor is √b
 → allows search depth to double for the same cost

. A simple example of the value of reasoning about which computations are relevant (a form of meta‐reasoning)

23 Practical feasibility: resource limits

Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes can be explored per move

What if we have too little time to reach terminal states (= utility function)? Standard approach combines:
. cutoff test: e.g., depth limit (perhaps add quiescence search: disregard positions that are unlikely to exhibit wild swings in value in the near future)
. evaluation function = estimated desirability of position

24 Evaluation functions

Evaluation function (cf heuristic with A*): . Returns estimate of expected utility of the game from a given position . Must agree with utility function on terminal nodes . Must not take too long

For chess, typically a linear weighted sum of features:

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

e.g., w1 = 9 with f1(s) = (# white queens) – (# black queens), etc.
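A sketch of such a weighted‐feature evaluation in Python; the feature set, weights, and position encoding are illustrative material counts, not a real chess evaluator:

```python
# Illustrative material weights (queen = 9, rook = 5, ...); assumed, not official.
WEIGHTS = {'queens': 9, 'rooks': 5, 'bishops': 3, 'knights': 3, 'pawns': 1}

def evaluate(position):
    """Eval(s) = sum_i w_i * f_i(s), where f_i(s) = (# white pieces) - (# black pieces).
    `position` is a hypothetical dict: piece name -> (white count, black count)."""
    return sum(w * (position[piece][0] - position[piece][1])
               for piece, w in WEIGHTS.items())
```

A balanced starting position evaluates to 0; being up a queen adds 9 for White, matching the w1 = 9 example above.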

25 Cutting off search

MinimaxCutoff is identical to MinimaxValue except:

1. Terminal? is replaced by Cutoff? 2. Utility is replaced by Eval

Does it work in practice? b^m = 10^6, b = 35 → m ≈ 4

4‐ply lookahead is a hopeless chess player!

– 4‐ply ≈ human novice – 8‐ply ≈ typical PC, human master – 12‐ply ≈ Deep Blue, Kasparov

26 Deterministic games in practice

. Checkers: Chinook ended the 40‐year reign of human world champion Marion Tinsley in 1994. Used a pre‐computed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.

. Chess: Deep Blue defeated human world champion Garry Kasparov in a six‐game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

. Othello: human champions refuse to compete against computers, who are too good.

. Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves. 2016/2017: AlphaGo beats the world's number 1 and 2 using deep learning.

27 Strategies and equilibria

28 Big Monkey, Little Monkey example

. Monkeys usually eat ground‐level fruit
. Occasionally they climb a tree to shake loose a coconut (1 per tree)
. A coconut yields 10 Calories
. Big Monkey expends 2 Calories climbing up the tree, shaking and climbing down
. Little Monkey expends 0 Calories on this exercise

29 BM and LM utilities

. If only BM climbs the tree: LM eats some before BM gets down → BM gets 6 C, LM gets 4 C
. If only LM climbs the tree: BM eats almost all before LM gets down → BM gets 9 C, LM gets 1 C
. If both climb the tree: BM is first to hog the coconut → BM gets 7 C, LM gets 3 C

How should the monkeys each act so as to maximize their own calorie gain?

30 BM and LM: strategies

Strategies are determined prior to ‘playing the game’

Assume BM will be allowed to move first.

BM has two (single action) strategies: – wait (w), or – climb (c)

LM has four strategies:
 – If BM waits, then wait; if BM climbs, then wait (xw)
 – If BM waits, then wait; if BM climbs, then climb (xx)
 – If BM waits, then climb; if BM climbs, then wait (x¬x)
 – If BM waits, then climb; if BM climbs, then climb (xc)

31 BM and LM: BM moves first

[game tree: Big Monkey moves first, then Little Monkey; payoffs (BM,LM): (w,w) = 0,0; (w,c) = 9,1; (c,w) = 6−2,4; (c,c) = 7−2,3]

What should Big Monkey do? If BM waits, will the outcome be at least that of climbing, regardless of what LM does? No: 0 vs 4, 9 vs 5… What if we believe LM will act rationally?

32 BM and LM: BM moves first

What should Big Monkey do?
• If BM waits, LM will climb → BM gets 9
• If BM climbs, LM will wait → BM gets 4
→ BM should wait (w)
What about Little Monkey? → Opposite of BM (x¬x)
(even though we'll never get to the right side of the game tree unless BM errs)

33 BM and LM: BM moves first

The game‐tree representation of a game is called extensive form¹, as opposed to normal form²:

            LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

What should BM do? → wait (w)
What about Little Monkey? → Opposite of BM (x¬x)

¹ game tree that explicitly shows the players' moves and resulting payoffs
² table showing payoffs of outcomes of simultaneous 'decisions' (strategies)

34 Dominant strategies

Consider a player’s strategies s1 and s2. If, regardless of the other players’ strategy:

• payoff for s1 ≥ payoff for s2, then s1 (weakly) dominates s2
• payoff for s1 > payoff for s2, then s1 strictly dominates s2

A player has a dominant strategy s if s dominates all the player's other strategies.

For Little Monkey, x¬x is a weakly dominant strategy; BM does not have a dominant strategy:

            LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

35 BM and LM: LM moves first

[game tree: Little Monkey moves first, then Big Monkey; payoffs (LM,BM): (w,w) = 0,0; (w,c) = 4,4; (c,w) = 1,9; (c,c) = 3,5]

What should Little Monkey do?
• If LM waits, BM will climb → LM gets 4
• If LM climbs, BM will wait → LM gets 1
→ LM should wait (w)
What about Big Monkey? → Opposite of LM (x¬x)

36 Responses and equilibria

. Strategies w and x¬x are called best responses:
 – given what the other player does, this is the best thing to do.
. A solution where everyone is playing a best response is called a Nash equilibrium.
 – No one can unilaterally change and improve things.
. Every finite¹ game has a Nash equilibrium – but not necessarily in terms of pure strategies!

¹ finite in #players and #pure strategies; pure = not mixed (see imperfect information games)

37 BM and LM: equilibria

For each strategy of one player there is a best response of the other → multiple Nash equilibria

            LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

BM moves first → the following Nash equilibria (BM, LM):
. (w, x¬x); (w, xc); (c, xw)

Why isn’t (c, x¬x) a Nash equilibrium?
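These three equilibria can also be recovered by brute force; a sketch over the normal‐form table, using 'x~x' as an ASCII stand‐in for x¬x:

```python
# Normal form with BM moving first; payoffs (BM, LM) from the table above.
payoff = {('c', 'xc'): (5, 3), ('c', 'xw'): (4, 4), ('c', 'xx'): (5, 3), ('c', 'x~x'): (4, 4),
          ('w', 'xc'): (9, 1), ('w', 'xw'): (0, 0), ('w', 'xx'): (0, 0), ('w', 'x~x'): (9, 1)}

def pure_nash():
    """All strategy pairs where neither player gains by deviating alone."""
    bm_strats = ['c', 'w']
    lm_strats = ['xc', 'xw', 'xx', 'x~x']
    equilibria = []
    for b in bm_strats:
        for l in lm_strats:
            # best response checks: no unilateral deviation improves the payoff
            best_bm = all(payoff[(b, l)][0] >= payoff[(b2, l)][0] for b2 in bm_strats)
            best_lm = all(payoff[(b, l)][1] >= payoff[(b, l2)][1] for l2 in lm_strats)
            if best_bm and best_lm:
                equilibria.append((b, l))
    return equilibria
```

`pure_nash()` yields exactly (c, xw), (w, xc) and (w, x¬x); (c, x¬x) is absent because BM would deviate from c (payoff 4) to w (payoff 9).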

What if the monkeys have to move simultaneously?

38 Imperfect information

♥♣♠♦

39 BM and LM move together

            LM
         c     w
BM  c   5,3   4,4
    w   9,1   0,0

LM/BM has to choose before he sees BM/LM move… two obvious Nash equilibria: (c,w), (w,c)
A third Nash equilibrium, if both use a mixed strategy: "choose between c & w with p = 0.5"
→ each outcome has p = 0.25
→ Expected payoff (BM, LM) = (4.5, 2)
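The expected payoff of the 50/50 mixed strategy can be checked directly; a small sketch using the simultaneous‐move payoff matrix above:

```python
# Payoffs (BM, LM) for simultaneous moves, from the slide's table.
payoff = {('c', 'c'): (5, 3), ('c', 'w'): (4, 4),
          ('w', 'c'): (9, 1), ('w', 'w'): (0, 0)}

def expected(p_bm_climb, p_lm_climb):
    """Expected (BM, LM) payoff when each monkey climbs with the given probability."""
    total_bm = total_lm = 0.0
    for (a, b), (u_bm, u_lm) in payoff.items():
        pa = p_bm_climb if a == 'c' else 1 - p_bm_climb
        pb = p_lm_climb if b == 'c' else 1 - p_lm_climb
        total_bm += pa * pb * u_bm
        total_lm += pa * pb * u_lm
    return total_bm, total_lm
```

`expected(0.5, 0.5)` gives (4.5, 2.0): each of the four outcomes occurs with probability 0.25, so BM expects (5+4+9+0)/4 and LM expects (3+4+1+0)/4.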

40 Choosing Strategies

. A strategy is optimal if no other strategy/outcome is preferred by all players
. In zero‐sum games a pure strategy can be optimal; in non‐zero‐sum games a mixed strategy is required
. In the simultaneous‐move game, it's harder to see what each monkey should do
 – Mixed strategy is optimal
. Often, other techniques can be used to prune the number of possible actions:
 – E.g. using dominance

41 Incomplete information

? ?

42 Prisoner’s Dilemma

Each player can cooperate or defect

                Carl
(Rob, Carl)   cooperate   defect
Rob cooperate   −1,−1     −10,0
    defect       0,−10    −8,−8

43 Prisoner's Dilemma

Defecting is a (strictly) dominant strategy for Rob

44 Prisoner's Dilemma

Defecting is also a dominant strategy for Carl → the result is not optimal!

45 Prisoner's Dilemma

. Even though both players would be better off cooperating, mutual defection is the dominant strategy…

. What drives this?
 – One‐shot game
 – Inability to trust your opponent (incomplete information: is your opponent selfish or nice?)
 – Perfect rationality
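The dominance argument can be verified mechanically; a sketch that checks strict dominance for each player in the dilemma's payoff table:

```python
# Payoffs (Rob, Carl); 'c' = cooperate, 'd' = defect.
payoff = {('c', 'c'): (-1, -1), ('c', 'd'): (-10, 0),
          ('d', 'c'): (0, -10), ('d', 'd'): (-8, -8)}

def strictly_dominates(player, s1, s2):
    """True if strategy s1 strictly beats s2 for `player` (0 = Rob, 1 = Carl)
    against every strategy of the opponent."""
    for other in ('c', 'd'):
        a1 = (s1, other) if player == 0 else (other, s1)
        a2 = (s2, other) if player == 0 else (other, s2)
        if payoff[a1][player] <= payoff[a2][player]:
            return False
    return True
```

`strictly_dominates(0, 'd', 'c')` and `strictly_dominates(1, 'd', 'c')` are both True, yet mutual defection yields (−8, −8), worse for both players than mutual cooperation's (−1, −1).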

46 Summary

. (“Board”) games are fun to work on and illustrate several important points about AI

. perfection is unattainable → must approximate

. good idea to think about what to think about
