INFOB2KI 2019-2020 Utrecht University The Netherlands
ARTIFICIAL INTELLIGENCE
Decision making: opponent based
Lecturer: Silja Renooij
These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

Game theory
2 Outline
. Game theory: rules for defeating opponents
  – Perfect information games
    • Deterministic turn-taking games
      – Minimax algorithm, Alpha-beta pruning
      – Best response, Nash equilibrium
  – Imperfect information games
    • Simultaneous-move games: Mixed strategy
  – Incomplete information games
    • Prisoner's dilemma
3 Game Theory Developed to explain the optimal strategy in two‐person (nowadays n ≥ 2) interactions.
. Initially, von Neumann and Morgenstern
  – Zero-sum games (your win == your opponent's loss)
. John Nash
  – Nonzero-sum games
. Harsanyi, Selten
  – Incomplete information (Bayesian games)
4 Zero-sum games
Better term: constant‐sum game
Examples of zero-sum games:
. 2-player; payoffs: 1 for win, -1 for loss, 0 for draw → total payoff is 0, regardless of the outcome
. 2-player; payoffs: 1 for win, 0 for loss, ½ for draw → total payoff is 1, regardless of the outcome
. 3-player; payoffs: distribute 3 points over the players, depending on performance
Example of a non-zero-sum game:
. 2-player; payoffs: 3 for win, 0 for loss, 1 for draw → total payoff is either 3 or 2, depending on the outcome

5 Game types

Complete information games:
. Perfect information games
  upon making a move, players know the full history of the game: all moves by all players, all payoffs, etc.
. Imperfect information games
  players know all outcomes/payoffs, types of other players & their strategies, but are unaware of (or unsure about) possible actions of other players
  – simultaneous moves: what action will others choose?
  – (temporarily) shielded attributes: who has which cards?
Complete information games can be deterministic or involve chance.

6 Game types
Incomplete information games:
Uncertainty about game being played: factors outside the rules of the game, not known to one or more players, may affect the outcome of the game
• E.g. players may not know other players' "types", their strategies, payoffs or preferences
Incomplete information games can be deterministic or involve chance.
7 Complete information games
Deterministic Chance
                Deterministic             Chance
Perfect         Chess, checkers, go,      Backgammon,
information     othello, tic-tac-toe      monopoly
Imperfect       Battleships,              Bridge, poker,
information     Minesweeper               scrabble
NB(!) the textbook says randomness is the difference between perfect and imperfect. Other sources state imperfect == incomplete …. be aware of this!

8 Deterministic two-player, turn-taking
Perfect information
9 Game tree
• alternates between two players
• represents all possibilities from the perspective of one player ('me') from the current root

[Figure: game tree alternating between 'me' and 'you' levels; the leaf values are my rewards.]
Zero-sum or non-zero-sum?

10 Minimax
. Compute value of perfect play for deterministic, perfect information games
  – Traverse game tree in DFS-like manner
  – 'bubble up' values of the evaluation function: maximise if my turn, minimise for opponent's turn
. Serves for selecting next move: choose move to position with highest minimax value = best achievable payoff against best play
. May also serve for finding optimal strategy = from start to finish my best move, for every move of opponent.
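The minimax procedure can be sketched in a few lines of Python (a minimal illustration, not the textbook's pseudocode; the nested-list tree representation and function name are assumptions of this sketch):

```python
# A minimal minimax sketch. Assumption: the game tree is given as nested
# lists, where a plain number is a leaf holding my payoff.

def minimax(node, maximizing=True):
    """Value of perfect play: I maximise, my opponent minimises."""
    if isinstance(node, (int, float)):   # leaf: utility / evaluation value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two-ply example: my move leads to one of three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3: best achievable payoff against best play
```

Choosing the next move then means picking the child whose minimax value equals the root value.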
11 Minimax: example
NB: the book uses circles and squares instead of triangles!

12 Minimax algorithm
[Figure: pseudocode of the minimax algorithm (not reproduced).]
13 Properties of minimax
. Complete? Yes (if tree is finite)
. Optimal? Yes (against an optimal opponent)
. Time complexity? O(b^m) for branching factor b and max depth m (in general, in the worst case, we cannot improve on this, so the O(bm) suggested in the textbook AI4G must be a typo)
. Space complexity? O(bm) (with depth-first exploration; can be reduced to O(m) with a backtracking variant of DFS that generates one successor at a time, rather than all)
For chess, with b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible
14 Solution: pruning
. Idea: if position m is better for me than position n, we will never actually get to n in play (MAX will avoid it) → prune that leaf/subtree
. Let α be the best (= highest) value (to MAX) found so far on the current path
. Define β similarly for MIN: the best (= lowest) value found so far
15 Alpha-Beta (α-β) pruning
Minimax, augmented with upper and lower bounds
• Init: for all non-leaf nodes set
  α = −∞ (lower bound on achievable score)
  β = ∞ (upper bound on achievable score)
• Upwards: update α in a MAX move; update β in a MIN move

[Figure: tree with all nodes initialised to α = −∞, β = ∞.]
16 α-β Pruning
[Figure: the MAX-VALUE function, extended with: if v ≥ β then return v; α ← MAX(α, v).]
function MIN-VALUE is similarly extended:
  if v ≤ α then return v
  β ← MIN(β, v)
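Putting both extensions together, a sketch in Python (the nested-list tree representation and names are assumptions; the cut-off tests mirror the MAX-VALUE/MIN-VALUE extensions above):

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning; leaves are numbers (my payoff)."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:          # MIN above will never allow this value
                return v           # beta cut-off: prune remaining children
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            if v <= alpha:         # MAX above already has something better
                return v           # alpha cut-off
            beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3: same value as plain minimax
```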
17 α-β pruning example
[Figure: root (MAX) α = 3, β = ∞; first MIN node α = −∞, β = 3.]
α = best score till now (lower bound), updated in own (MAX) move
β = upper bound on achievable score, updated in opponent's (MIN) move
18 α-β pruning example
[Figure: root α = 3, β = ∞; MIN nodes (α = −∞, β = 3) and (α = −∞, β = 2).]
19 α-β pruning example
[Figure: root α = 3, β = ∞; MIN nodes (α = −∞, β = 3), (α = −∞, β = 2), (α = −∞, β = 14).]
Prune or continue?
20 α-β pruning example
[Figure: root α = 3, β = ∞; MIN nodes (α = −∞, β = 3), (α = −∞, β = 2), (α = −∞, β = 5).]
Prune or continue?
21 α-β pruning example
[Figure: final values: root α = 3, β = ∞; MIN nodes (α = −∞, β = 3), (α = −∞, β = 2), (α = −∞, β = 2).]
22 Properties of α-β
. Pruning does not affect the final result!
. Good move ordering improves effectiveness of pruning
. With "perfect ordering", time complexity = O(b^(m/2))
  → effective branching factor is √b
  → allows search depth to double for the same cost
. A simple example of the value of reasoning about which computations are relevant (a form of meta‐reasoning)
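The saving is easy to see in numbers: the sketch below counts visited nodes with and without pruning on a small example tree (an illustration only; the nested-list tree encoding and function name are assumptions):

```python
import math

def search(node, alpha=-math.inf, beta=math.inf, maximizing=True, prune=True):
    """Return (minimax value, number of nodes visited); pruning optional."""
    if isinstance(node, (int, float)):
        return node, 1
    best = -math.inf if maximizing else math.inf
    visited = 1
    for child in node:
        v, n = search(child, alpha, beta, not maximizing, prune)
        visited += n
        best = max(best, v) if maximizing else min(best, v)
        if maximizing:
            alpha = max(alpha, best)
        else:
            beta = min(beta, best)
        if prune and alpha >= beta:
            break                  # remaining children are irrelevant
    return best, visited

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(search(tree, prune=False))  # (3, 13): full tree, 13 nodes
print(search(tree, prune=True))   # (3, 11): same value, 2 leaves pruned
```

Same result, fewer computations: deciding which computations are relevant is exactly the meta-reasoning point above.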
23 Practical feasibility: resource limits
Suppose we have 100 secs and explore 10^4 nodes/sec
→ 10^6 nodes can be explored per move
What if we have too little time to reach terminal states (= utility function)? The standard approach combines:
. cutoff test: e.g., a depth limit (perhaps with quiescence search: only cut off at positions that are unlikely to exhibit wild swings in value in the near future)
. evaluation function = estimated desirability of the position
24 Evaluation functions
Evaluation function (cf. the heuristic in A*):
. Returns an estimate of the expected utility of the game from a given position
. Must agree with the utility function on terminal nodes
. Must not take too long
For chess, typically linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) e.g., w1 = 9 with
f1(s) = (# white queens) – (# black queens), etc.
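A toy version of such a weighted feature sum (the feature set, weights, and position encoding are all illustrative assumptions, not an actual chess evaluator):

```python
# Hypothetical linear evaluation Eval(s) = w1*f1(s) + ... + wn*fn(s).
# Assumption: a position is a dict of piece counts (uppercase = white).

def material_features(position):
    """Feature vector: material differences, white minus black."""
    return [
        position.get('Q', 0) - position.get('q', 0),  # queen difference
        position.get('R', 0) - position.get('r', 0),  # rook difference
        position.get('P', 0) - position.get('p', 0),  # pawn difference
    ]

WEIGHTS = [9, 5, 1]  # classic material values: queen 9, rook 5, pawn 1

def evaluate(position):
    """Linear weighted sum of the features."""
    return sum(w * f for w, f in zip(WEIGHTS, material_features(position)))

# White has an extra rook but one pawn fewer than black:
print(evaluate({'Q': 1, 'q': 1, 'R': 2, 'r': 1, 'P': 7, 'p': 8}))  # 5 - 1 = 4
```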
25 Cutting off search
MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff? 2. Utility is replaced by Eval
Does it work in practice? b^m = 10^6, b = 35 → m ≈ 4
4‐ply lookahead is a hopeless chess player!
– 4‐ply ≈ human novice – 8‐ply ≈ typical PC, human master – 12‐ply ≈ Deep Blue, Kasparov
26 Deterministic games in practice
. Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used a pre-computed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
. Chess: Deep Blue defeated human world champion Garry Kasparov in a six‐game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
. Othello: human champions refuse to compete against computers, who are too good.
. Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves. 2016/2017: Alpha Go beats world’s number 1 and 2 using deep learning.
27 Strategies and equilibria
28 Big Monkey, Little Monkey example
. Monkeys usually eat ground-level fruit
. Occasionally they climb a tree to shake loose a coconut (1 per tree)
. A coconut yields 10 Calories
. Big Monkey expends 2 Calories climbing up the tree, shaking, and climbing down
. Little Monkey expends 0 Calories on this exercise
29 BM and LM utilities
. If only BM climbs the tree: LM eats some before BM gets down → BM gets 6 C, LM gets 4 C
. If only LM climbs the tree: BM eats almost all before LM gets down → BM gets 9 C, LM gets 1 C
. If both climb the tree: BM is first and hogs the coconut → BM gets 7 C, LM gets 3 C
How should the monkeys each act so as to maximize their own calorie gain?
30 BM and LM: strategies
Strategies are determined prior to ‘playing the game’
Assume BM will be allowed to move first.
BM has two (single-action) strategies:
– wait (w), or
– climb (c)

LM has four strategies:
– If BM waits, then wait; if BM climbs, then wait (xw)
– If BM waits, then wait; if BM climbs, then climb (xx)
– If BM waits, then climb; if BM climbs, then wait (x¬x)
– If BM waits, then climb; if BM climbs, then climb (xc)

31 BM and LM: BM moves first
[Figure: game tree. Big Monkey moves first (w or c), Little Monkey responds; payoffs (BM, LM) at the leaves: (w,w) = 0,0; (w,c) = 9,1; (c,w) = 6-2,4; (c,c) = 7-2,3.]

What should Big Monkey do? If BM waits, will the outcome be at least that of climbing, regardless of what LM does? No: 0 vs 4, 9 vs 5 …. What if we believe LM will act rationally?

32 BM and LM: BM moves first
[Figure: the same game tree, payoffs (BM, LM): 0,0; 9,1; 6-2,4; 7-2,3.]

What should Big Monkey do?
• If BM waits, LM will climb → BM gets 9
• If BM climbs, LM will wait → BM gets 4
→ BM should wait (w)
What about Little Monkey? Opposite of BM (x¬x) (even though we'll never get to the right side of the game tree unless BM errs)
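This backward-induction reasoning can be sketched generically (the dict-based tree encoding and function name are assumptions; the payoffs are the slide's, with BM's climbing cost already subtracted):

```python
def backward_induction(node, player):
    """node: a (BM, LM) payoff pair at a leaf, or a dict move -> subtree.
    player 0 = BM, 1 = LM. Returns (payoffs, mover's chosen move)."""
    if isinstance(node, tuple):            # leaf: nothing left to decide
        return node, None
    best = None
    for move, subtree in node.items():
        payoffs, _ = backward_induction(subtree, 1 - player)
        if best is None or payoffs[player] > best[0][player]:
            best = (payoffs, move)         # keep the move best for the mover
    return best

game = {
    'wait':  {'wait': (0, 0), 'climb': (9, 1)},   # BM waits, then LM moves
    'climb': {'wait': (4, 4), 'climb': (5, 3)},   # BM climbs (cost included)
}

payoffs, bm_move = backward_induction(game, player=0)
print(bm_move, payoffs)  # wait (9, 1): BM waits, LM climbs
```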
33 BM and LM: BM moves first
The game-tree representation of a game is called extensive form1, as opposed to normal form2:

        LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1
What should BM do? → wait (w)
What about Little Monkey? → Opposite of BM (x¬x)
1 game tree that explicitly shows the players' moves and resulting payoffs
2 table showing payoffs of outcomes of simultaneous 'decisions' (strategies)

34 Dominant strategies
Consider a player's strategies s1 and s2. If, regardless of the other players' strategies:
• payoff for s1 ≥ payoff for s2, then s1 (weakly) dominates s2
• payoff for s1 > payoff for s2, then s1 strictly dominates s2
A player has a dominant strategy s if s dominates all of the player's other strategies.
For Little Monkey, x¬x is a weakly dominant strategy; BM does not have a dominant strategy:

        LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

35 BM and LM: LM moves first
[Figure: game tree. Little Monkey moves first (w or c), Big Monkey responds; payoffs (LM, BM) at the leaves: (w,w) = 0,0; (w,c) = 4,4; (c,w) = 1,9; (c,c) = 3,5.]

What should Little Monkey do?
• If LM waits, BM will climb → LM gets 4
• If LM climbs, BM will wait → LM gets 1
→ LM should wait (w)
What about Big Monkey? Opposite of LM (x¬x)
36 Responses and equilibria
. Strategies w and x¬x are called best responses: – given what the other player does, this is the best thing to do. . A solution where everyone is playing a best response is called a Nash equilibrium. – No one can unilaterally change and improve things. . Every finite1 game has a Nash equilibrium – but not necessarily in terms of pure strategies!
1 finite in #players and #pure strategies; pure = not mixed (see imperfect information games)

37 BM and LM: equilibria
For each strategy of one player there is a best response of the other → multiple Nash equilibria

        LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

BM moves first → the following Nash equilibria (BM, LM):
. (w, x¬x); (w, xc); (c, xw)
Why isn’t (c, x¬x) a Nash equilibrium?
What if the monkeys have to move simultaneously?
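The equilibria claimed above can be checked mechanically from the normal form (a brute-force sketch; `x-x` stands in for x¬x, and the helper names are assumptions):

```python
from itertools import product

# Normal form of the BM-moves-first game; payoffs are (BM, LM).
BM_STRATS = ['c', 'w']
LM_STRATS = ['xc', 'xw', 'xx', 'x-x']     # 'x-x' encodes "opposite of BM"
PAYOFF = {
    ('c', 'xc'): (5, 3), ('c', 'xw'): (4, 4), ('c', 'xx'): (5, 3), ('c', 'x-x'): (4, 4),
    ('w', 'xc'): (9, 1), ('w', 'xw'): (0, 0), ('w', 'xx'): (0, 0), ('w', 'x-x'): (9, 1),
}

def is_nash(bm, lm):
    """True iff neither player can improve by unilaterally deviating."""
    u_bm, u_lm = PAYOFF[(bm, lm)]
    if any(PAYOFF[(b, lm)][0] > u_bm for b in BM_STRATS):
        return False                      # BM would deviate
    return not any(PAYOFF[(bm, l)][1] > u_lm for l in LM_STRATS)

equilibria = [s for s in product(BM_STRATS, LM_STRATS) if is_nash(*s)]
print(equilibria)  # [('c', 'xw'), ('w', 'xc'), ('w', 'x-x')]
```

It also answers the question above: at (c, x¬x) BM gets 4 but could get 9 by switching to w, so BM would deviate.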
38 Imperfect information
♥♣♠♦
39 BM and LM move together

[Figure: game tree with an information set: LM must choose without knowing BM's move.] In normal form, payoffs (BM, LM):

        LM
        w     c
BM  w   0,0   9,1
    c   4,4   5,3
Each of LM/BM has to choose before seeing the other's move…. → two obvious Nash equilibria: (c,w), (w,c)
A third Nash equilibrium, if both use a mixed strategy: "choose between c & w with p = 0.5" → each outcome has p = 0.25 → expected payoff (BM, LM) = (4.5, 2)
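The expected payoffs under the mixed strategy can be verified directly (a small sketch; the function name is an assumption):

```python
# Expected payoffs in the simultaneous BM/LM game when both players mix.
# Payoffs (BM, LM) for each (BM action, LM action) pair, from the slide:
SIM_PAYOFF = {('w', 'w'): (0, 0), ('w', 'c'): (9, 1),
              ('c', 'w'): (4, 4), ('c', 'c'): (5, 3)}

def expected_payoffs(p_bm_climb, p_lm_climb):
    """Expected (BM, LM) payoff when each climbs with the given probability."""
    e_bm = e_lm = 0.0
    for (bm, lm), (u_bm, u_lm) in SIM_PAYOFF.items():
        prob = (p_bm_climb if bm == 'c' else 1 - p_bm_climb) * \
               (p_lm_climb if lm == 'c' else 1 - p_lm_climb)
        e_bm += prob * u_bm
        e_lm += prob * u_lm
    return e_bm, e_lm

print(expected_payoffs(0.5, 0.5))  # (4.5, 2.0), as on the slide
```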
40 Choosing Strategies
. A strategy is optimal if no other strategy/outcome is preferred by all players
. In zero-sum games a pure strategy can be optimal; in non-zero-sum games a mixed strategy is required
. In the simultaneous game, it's harder to see what each monkey should do
  – Mixed strategy is optimal
. Often, other techniques can be used to prune the number of possible actions:
  – E.g. using dominance
41 Incomplete information
? ?
42 Prisoner’s Dilemma
Each player can cooperate or defect
Payoffs (Rob, Carl):

                  Carl
                  cooperate   defect
Rob  cooperate    -1,-1       -10,0
     defect       0,-10       -8,-8
43 Prisoner's Dilemma
Defecting is a (strictly) dominant strategy for Rob
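A strict-dominance check over the payoff table makes this concrete (payoffs from the slide; the helper name and encoding are assumptions):

```python
# Strict dominance in the Prisoner's Dilemma; payoffs are (Rob, Carl).
ACTIONS = ['cooperate', 'defect']
PD_PAYOFF = {('cooperate', 'cooperate'): (-1, -1), ('cooperate', 'defect'): (-10, 0),
             ('defect', 'cooperate'): (0, -10),    ('defect', 'defect'): (-8, -8)}

def strictly_dominates(s1, s2, player):
    """Does s1 strictly dominate s2 for `player` (0 = Rob, 1 = Carl)?"""
    for other in ACTIONS:                       # for every opponent action...
        a1 = (s1, other) if player == 0 else (other, s1)
        a2 = (s2, other) if player == 0 else (other, s2)
        if PD_PAYOFF[a1][player] <= PD_PAYOFF[a2][player]:
            return False                        # ...s1 must be strictly better
    return True

print(strictly_dominates('defect', 'cooperate', 0))  # True: Rob should defect
print(strictly_dominates('defect', 'cooperate', 1))  # True: Carl should too
```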
44 Prisoner's Dilemma
Defecting is also a dominant strategy for Carl
→ the result is not optimal!

45 Prisoner's Dilemma
. Even though both players would be better off cooperating, mutual defection is the dominant strategy…
. What drives this?
  – One-shot game
  – Inability to trust your opponent (incomplete information: is your opponent selfish or nice?)
  – Perfect rationality
46 Summary
. (“Board”) games are fun to work on and illustrate several important points about AI
. perfection is unattainable → must approximate
. good idea to think about what to think about