
INFOB2KI 2019-2020 Utrecht University The Netherlands

ARTIFICIAL INTELLIGENCE

Decision making: opponent based

Lecturer: Silja Renooij

These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

2 Outline

. Game theory: rules for defeating opponents
  • Deterministic turn‐taking games
    – Minimax algorithm, Alpha‐beta pruning
  • Imperfect information games
    – Simultaneous‐move games: mixed strategies
  • Incomplete information games
    – Prisoner’s dilemma

3 Game Theory

Developed to explain the optimal strategy in two‐person (nowadays n ≥ 2) interactions.

. Initially, von Neumann and Morgenstern – Zero‐sum games (your win == your opponent's loss)
. John Nash – Nonzero‐sum games
. Harsanyi, Selten – Incomplete information (Bayesian games)

4 Zero-sum games

Better term: constant‐sum game

Examples of zero‐sum games:
. 2‐player; payoffs: 1 for win, −1 for loss, 0 for draw → total payoff is 0, regardless of the outcome
. 2‐player; payoffs: 1 for win, 0 for loss, ½ for draw → total payoff is 1, regardless of the outcome
. 3‐player; payoffs: distribute 3 points over players, depending on performance

Example of non‐zero‐sum game: . 2‐player; payoffs: 3 for win, 0 for loss, 1 for draw

→ total payoff is either 3 or 2, depending on the outcome.

5 Game types

Complete information games:
. Perfect information games: upon making a move, players know the full history of the game, all moves by all players, all payoffs, etc.

. Imperfect information games: players know all outcomes/payoffs, types of other players & their strategies, but are unaware of (or unsure about) possible actions of other players
 ‐ simultaneous moves: what action will others choose?
 ‐ (temporarily) shielded attributes: who has which cards?

Complete information games can be deterministic or involve chance.

6 Game types

Incomplete information games:

Uncertainty about game being played: factors outside the rules of the game, not known to one or more players, may affect the outcome of the game

• E.g. players may not know other players' "type", their strategies, payoffs or preferences

Incomplete information games can be deterministic or involve chance.

7 Complete information games

                       Deterministic                   Chance
Perfect information    chess, checkers, go,            Backgammon, monopoly
                       othello, Tic‐tac‐toe
Imperfect information  Battleships, Minesweeper        Bridge, poker, scrabble

NB(!) the textbook says randomness is the difference between perfect and imperfect information. Other sources state imperfect == incomplete… be aware of this!

8 Deterministic two‐player, turn‐taking

Perfect information

9 Game trees

• Play alternates between two players.
• The tree represents all possibilities from the perspective of one player ('me') from the current root.

me:

you:

me:

you: Zero‐sum or Non‐zero sum?

(my rewards)

10 Minimax

. Compute value of perfect play for deterministic, perfect‐information games
 – Traverse game tree in DFS‐like manner
 – 'bubble up' values of evaluation function: maximise if my turn, minimise for opponent's turn

. Serves for selecting next move: choose move to position with highest minimax value = best achievable payoff against best play

. May also serve for finding optimal strategy = from start to finish my best move, for every move of opponent.

11 Minimax: example

NB the book uses circles and squares instead of triangles!

12 Minimax algorithm
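The slide's pseudocode is not reproduced in these notes; a minimal sketch in Python, using an assumed toy game interface (`TupleGame` and its method names are illustrative, not the book's code):

```python
def minimax(state, my_turn, game):
    """Minimax value of `state` by depth-first traversal of the game tree."""
    if game.is_terminal(state):
        return game.utility(state)          # my payoff at a leaf
    values = [minimax(game.result(state, m), not my_turn, game)
              for m in game.moves(state)]
    # 'bubble up': maximise on my turn, minimise on the opponent's turn
    return max(values) if my_turn else min(values)


class TupleGame:
    """Toy game: a state is a nested tuple; leaves (ints) are my payoffs."""
    def is_terminal(self, s): return isinstance(s, int)
    def utility(self, s): return s
    def moves(self, s): return range(len(s))
    def result(self, s, m): return s[m]
```

For example, `minimax(((3, 12, 8), (2, 4, 6), (14, 5, 2)), True, TupleGame())` returns 3: MAX picks the branch whose MIN value (3, 2, 2) is largest.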

13 Properties of minimax

. Complete? Yes (if tree is finite)

. Optimal? Yes (against an optimal opponent)

. Time complexity? O(b^m) for branching factor b and max depth m (in general, worst case, we cannot improve on this, so the O(bm) suggested in textbook AI4G must be a typo)

. Space complexity? O(bm) (if depth‐first exploration; can be reduced to O(m) with a backtracking variant of DFS which generates one successor at a time, rather than all)

For chess, with b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible

14 Solution: pruning

[figure: game tree with positions m and n]

. Idea: if m is better for me than n, we will never actually get to n in play (MAX will avoid it) → prune that leaf/subtree
. Let α be the best (= highest) value (to MAX) found so far on the current path
. Define β similarly for MIN: best (= lowest) value found so far

15 Alpha-Beta (α-β) pruning

Minimax, augmented with upper‐ and lower bounds
• Init: for all non‐leaf nodes set
  α = −∞ (lower bound on achievable score)
  β = ∞ (upper bound on achievable score)

• Upwards: update α in MAX move; update β in MIN move

16 α-β Pruning

function MIN-VALUE is similarly extended:
  if v ≤ α then return v
  β ← MIN(β, v)
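A Python sketch of the full α‐β search, folding the MAX‐VALUE/MIN‐VALUE pair into one function over a game tree given as nested tuples (an assumed toy encoding, not the book's pseudocode):

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of a tree of nested tuples (leaves are payoffs),
    pruning branches that cannot affect the final result."""
    if isinstance(node, (int, float)):       # leaf: return its utility
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:        # beta-cutoff: MIN above won't allow this value
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:           # alpha-cutoff: MAX above won't allow this value
            return v
        beta = min(beta, v)
    return v
```

On the classic example tree `((3, 12, 8), (2, 4, 6), (14, 5, 2))` it returns 3, the same value as plain minimax, while skipping leaves that cannot change the answer.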

17 α-β pruning example

[figure: example tree annotated with α/β values]

α = best score till now (lower bound) → updated in own (MAX) move
β = upper bound on achievable score → updated in opponent's (MIN) move

18 α-β pruning example


19 α-β pruning example


Prune or continue?

20 α-β pruning example


Prune or continue?

21 α-β pruning example


22 Properties of α-β

. Pruning does not affect the final result!
. Good move ordering improves effectiveness of pruning
. With "perfect ordering", time complexity = O(b^(m/2))
 → effective branching factor is √b
 → allows search depth to double for the same cost

. A simple example of the value of reasoning about which computations are relevant (a form of meta‐reasoning)

23 Practical feasibility: resource limits

Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes can be explored per move

What if we have too little time to reach terminal states (= utility function)? Standard approach combines:
. cutoff test: e.g., depth limit (perhaps add quiescence search: disregard positions that are unlikely to exhibit wild swings in value in the near future)
. evaluation function = estimated desirability of position

24 Evaluation functions

Evaluation function (cf heuristic with A*): . Returns estimate of expected utility of the game from a given position . Must agree with utility function on terminal nodes . Must not take too long

For chess, typically a linear weighted sum of features:

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

e.g., w1 = 9 with f1(s) = (# white queens) – (# black queens), etc.
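A sketch of such a weighted‐feature evaluation in Python; the feature set, weights, and position encoding are illustrative material counts, not a real chess evaluator:

```python
# Illustrative material weights (queen = 9, rook = 5, ...); assumed, not official.
WEIGHTS = {'queens': 9, 'rooks': 5, 'bishops': 3, 'knights': 3, 'pawns': 1}

def evaluate(position):
    """Eval(s) = sum_i w_i * f_i(s), where f_i(s) = (# white pieces) - (# black pieces).
    `position` is a hypothetical dict: piece name -> (white count, black count)."""
    return sum(w * (position[piece][0] - position[piece][1])
               for piece, w in WEIGHTS.items())
```

A balanced starting position evaluates to 0; being up a queen adds 9 for White, matching the w1 = 9 example above.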

25 Cutting off search

MinimaxCutoff is identical to MinimaxValue except:

1. Terminal? is replaced by Cutoff? 2. Utility is replaced by Eval

Does it work in practice? b^m = 10^6, b = 35 → m ≈ 4

4‐ply lookahead is a hopeless chess player!

– 4‐ply ≈ human novice – 8‐ply ≈ typical PC, human master – 12‐ply ≈ Deep Blue, Kasparov

26 Deterministic games in practice

. Checkers: Chinook ended the 40‐year reign of human world champion Marion Tinsley in 1994. Used a pre‐computed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.

. Chess: Deep Blue defeated human world champion Garry Kasparov in a six‐game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

. Othello: human champions refuse to compete against computers, who are too good.

. Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves. 2016/2017: AlphaGo beats the world's number 1 and 2 using deep learning.

27 Strategies and equilibria

28 Big Monkey, Little Monkey example

. Monkeys usually eat ground‐level fruit
. Occasionally they climb a tree to shake loose a coconut (1 per tree)
. A coconut yields 10 Calories
. Big Monkey expends 2 Calories climbing up the tree, shaking and climbing down
. Little Monkey expends 0 Calories on this exercise

29 BM and LM utilities

. If only BM climbs the tree: LM eats some before BM gets down → BM gets 6 C, LM gets 4 C
. If only LM climbs the tree: BM eats almost all before LM gets down → BM gets 9 C, LM gets 1 C
. If both climb the tree: BM is first to hog the coconut → BM gets 7 C, LM gets 3 C

How should the monkeys each act so as to maximize their own calorie gain?

30 BM and LM: strategies

Strategies are determined prior to ‘playing the game’

Assume BM will be allowed to move first.

BM has two (single action) strategies: – wait (w), or – climb (c)

LM has four strategies:
 – If BM waits, then wait; if BM climbs, then wait (xw)
 – If BM waits, then wait; if BM climbs, then climb (xx)
 – If BM waits, then climb; if BM climbs, then wait (x¬x)
 – If BM waits, then climb; if BM climbs, then climb (xc)

31 BM and LM: BM moves first

[game tree: Big Monkey moves first, then Little Monkey; payoffs (BM,LM): (w,w) = 0,0; (w,c) = 9,1; (c,w) = 6−2,4; (c,c) = 7−2,3]

What should Big Monkey do? If BM waits, will the outcome be at least that of climbing, regardless of what LM does? No: 0 vs 4, 9 vs 5… What if we believe LM will act rationally?

32 BM and LM: BM moves first

What should Big Monkey do?
• If BM waits, LM will climb → BM gets 9
• If BM climbs, LM will wait → BM gets 4
→ BM should wait (w)
What about Little Monkey? → Opposite of BM (x¬x)
(even though we'll never get to the right side of the game tree unless BM errs)

33 BM and LM: BM moves first

The game‐tree representation of a game is called extensive form¹, as opposed to normal form²:

            LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

What should BM do? → wait (w)
What about Little Monkey? → Opposite of BM (x¬x)

¹ game tree that explicitly shows the players' moves and resulting payoffs
² table showing payoffs of outcomes of simultaneous 'decisions' (strategies)

34 Dominant strategies

Consider a player’s strategies s1 and s2. If, regardless of the other players’ strategy:

• payoff for s1 ≥ payoff for s2, then s1 (weakly) dominates s2
• payoff for s1 > payoff for s2, then s1 strictly dominates s2

A player has a dominant strategy s if s dominates all the player's other strategies.

For Little Monkey, x¬x is a weakly dominant strategy; BM does not have a dominant strategy:

            LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

35 BM and LM: LM moves first

[game tree: Little Monkey moves first, then Big Monkey; payoffs (LM,BM): (w,w) = 0,0; (w,c) = 4,4; (c,w) = 1,9; (c,c) = 3,5]

What should Little Monkey do?
• If LM waits, BM will climb → LM gets 4
• If LM climbs, BM will wait → LM gets 1
→ LM should wait (w)
What about Big Monkey? → Opposite of LM (x¬x)

36 Responses and equilibria

. Strategies w and x¬x are called best responses:
 – given what the other player does, this is the best thing to do.
. A solution where everyone is playing a best response is called a Nash equilibrium.
 – No one can unilaterally change and improve things.
. Every finite¹ game has a Nash equilibrium – but not necessarily in terms of pure strategies!

¹ finite in #players and #pure strategies; pure = not mixed (see imperfect information games)

37 BM and LM: equilibria

For each strategy of one player there is a best response of the other → multiple Nash equilibria

            LM
        xc    xw    xx    x¬x
BM  c   5,3   4,4   5,3   4,4
    w   9,1   0,0   0,0   9,1

BM moves first → the following Nash equilibria (BM, LM):
. (w, x¬x); (w, xc); (c, xw)

Why isn’t (c, x¬x) a Nash equilibrium?
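These three equilibria can also be recovered by brute force; a sketch over the normal‐form table, using 'x~x' as an ASCII stand‐in for x¬x:

```python
# Normal form with BM moving first; payoffs (BM, LM) from the table above.
payoff = {('c', 'xc'): (5, 3), ('c', 'xw'): (4, 4), ('c', 'xx'): (5, 3), ('c', 'x~x'): (4, 4),
          ('w', 'xc'): (9, 1), ('w', 'xw'): (0, 0), ('w', 'xx'): (0, 0), ('w', 'x~x'): (9, 1)}

def pure_nash():
    """All strategy pairs where neither player gains by deviating alone."""
    bm_strats = ['c', 'w']
    lm_strats = ['xc', 'xw', 'xx', 'x~x']
    equilibria = []
    for b in bm_strats:
        for l in lm_strats:
            # best response checks: no unilateral deviation improves the payoff
            best_bm = all(payoff[(b, l)][0] >= payoff[(b2, l)][0] for b2 in bm_strats)
            best_lm = all(payoff[(b, l)][1] >= payoff[(b, l2)][1] for l2 in lm_strats)
            if best_bm and best_lm:
                equilibria.append((b, l))
    return equilibria
```

`pure_nash()` yields exactly (c, xw), (w, xc) and (w, x¬x); (c, x¬x) is absent because BM would deviate from c (payoff 4) to w (payoff 9).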

What if the monkeys have to move simultaneously?

38 Imperfect information

♥♣♠♦

39 BM and LM move together

            LM
         c     w
BM  c   5,3   4,4
    w   9,1   0,0

LM/BM has to choose before he sees BM/LM move… two obvious Nash equilibria: (c,w), (w,c)
A third Nash equilibrium, if both use a mixed strategy: "choose between c & w with p = 0.5"
→ each outcome has p = 0.25
→ Expected payoff (BM, LM) = (4.5, 2)
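The expected payoff of the 50/50 mixed strategy can be checked directly; a small sketch using the simultaneous‐move payoff matrix above:

```python
# Payoffs (BM, LM) for simultaneous moves, from the slide's table.
payoff = {('c', 'c'): (5, 3), ('c', 'w'): (4, 4),
          ('w', 'c'): (9, 1), ('w', 'w'): (0, 0)}

def expected(p_bm_climb, p_lm_climb):
    """Expected (BM, LM) payoff when each monkey climbs with the given probability."""
    total_bm = total_lm = 0.0
    for (a, b), (u_bm, u_lm) in payoff.items():
        pa = p_bm_climb if a == 'c' else 1 - p_bm_climb
        pb = p_lm_climb if b == 'c' else 1 - p_lm_climb
        total_bm += pa * pb * u_bm
        total_lm += pa * pb * u_lm
    return total_bm, total_lm
```

`expected(0.5, 0.5)` gives (4.5, 2.0): each of the four outcomes occurs with probability 0.25, so BM expects (5+4+9+0)/4 and LM expects (3+4+1+0)/4.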

40 Choosing Strategies

. A strategy is optimal if no other strategy/outcome is preferred by all players
. In zero‐sum games a pure strategy can be optimal; in non‐zero‐sum games a mixed strategy is required
. In the simultaneous‐move game, it's harder to see what each monkey should do
 – Mixed strategy is optimal
. Often, other techniques can be used to prune the number of possible actions:
 – E.g. using dominance

41 Incomplete information

? ?

42 Prisoner’s Dilemma

Each player can cooperate or defect

                Carl
(Rob, Carl)   cooperate   defect
Rob cooperate   −1,−1     −10,0
    defect       0,−10    −8,−8

43 Prisoner's Dilemma

Defecting is a (strictly) dominant strategy for Rob

44 Prisoner's Dilemma

Defecting is also a dominant strategy for Carl → the result is not optimal!

45 Prisoner's Dilemma

. Even though both players would be better off cooperating, mutual defection is the dominant strategy…

. What drives this?
 – One‐shot game
 – Inability to trust your opponent (incomplete information: is your opponent selfish or nice?)
 – Perfect rationality
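The dominance argument can be verified mechanically; a sketch that checks strict dominance for each player in the dilemma's payoff table:

```python
# Payoffs (Rob, Carl); 'c' = cooperate, 'd' = defect.
payoff = {('c', 'c'): (-1, -1), ('c', 'd'): (-10, 0),
          ('d', 'c'): (0, -10), ('d', 'd'): (-8, -8)}

def strictly_dominates(player, s1, s2):
    """True if strategy s1 strictly beats s2 for `player` (0 = Rob, 1 = Carl)
    against every strategy of the opponent."""
    for other in ('c', 'd'):
        a1 = (s1, other) if player == 0 else (other, s1)
        a2 = (s2, other) if player == 0 else (other, s2)
        if payoff[a1][player] <= payoff[a2][player]:
            return False
    return True
```

`strictly_dominates(0, 'd', 'c')` and `strictly_dominates(1, 'd', 'c')` are both True, yet mutual defection yields (−8, −8), worse for both players than mutual cooperation's (−1, −1).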

46 Summary

. (“Board”) games are fun to work on and illustrate several important points about AI

. perfection is unattainable → must approximate

. good idea to think about what to think about
