Part 10 Game Theory (Part IV)
Brief history

- 1846 (Babbage): machine to play tic-tac-toe
- 1928 (von Neumann): minimax theorem
- 1944 (von Neumann & Morgenstern): backward-induction algorithm (produces perfect play)
- 1950 (Shannon): minimax algorithm (finite horizon, approximate evaluation)
- 1951 (Turing): program (on paper) for playing chess
- 1952–7 (Samuel): checkers program, capable of beating its creator
- 1956 (McCarthy): pruning to allow deeper search
- 1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; it could examine about 350 positions/minute
- 1967 (Greenblatt): first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses
- 1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
- 1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
- 1997 (Hsu et al.): Deep Blue won a 6-game chess match against world chess champion Garry Kasparov
- 2007 (Schaeffer et al.): checkers solved: with perfect play, it's a draw. This took 10^14 calculations over 18 years.

Quick review

A strategy tells what an agent will do in every possible situation. Strategies may be pure (deterministic) or mixed (probabilistic).

Finite perfect-information zero-sum games:
- Finite: finitely many agents, actions, and states.
- Perfect information: every agent knows the current state, all of the actions, and what they do. No simultaneous actions (agents move one at a time).
- Constant-sum: regardless of how the game ends, ∑_{i=1}^n u_i = k for some constant k. For every such game there is an equivalent game in which k = 0, so constant-sum games are usually called zero-sum games.

Examples:
- Deterministic: chess, checkers, go, othello (reversi), connect-four.
- Stochastic: backgammon, monopoly.

For now, we will consider only deterministic games.

In a finite two-person perfect-information zero-sum game G:
- N = {Max, Min}.
- Max's expected utility is u_Max(s, t); from now on, we will just call it u(s, t).
- Since G is zero-sum, u_Min(s, t) = −u(s, t).
- Max wants to maximize u and Min wants to minimize it.

Theorem (Minimax theorem, von Neumann 1928). Let G be a two-person finite zero-sum game. Then there are strategies s* and t*, and a number u* called G's minimax value, such that:
- If Min uses t*, Max's expected utility is ≤ u*, i.e., max_s u(s, t*) = u*.
- If Max uses s*, Max's expected utility is ≥ u*, i.e., min_t u(s*, t) = u*.

Corollary. u(s*, t*) = u*.

Corollary. If G is a perfect-information game, then there are pure strategies s* and t* that satisfy the theorem.

Strategies on game trees

To construct a pure strategy for Max:
1. At each node where it's Max's move, choose one branch.
2. At each node where it's Min's move, include all branches.

To construct a pure strategy for Min, swap the roles:
1. At each node where it's Min's move, choose one branch.
2. At each node where it's Max's move, include all branches.

Let b be the tree's branching factor (the maximum number of children of any node), and let m be the tree's height (the maximum depth of any node). Then the number of pure strategies for Max is at most b^⌈m/2⌉, with equality if every node of height < m has b children; the same bound holds for Min's pure strategies. The construction is sketched in code below.
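To make the construction concrete, here is a minimal Python sketch. The GameNode representation, the function names, and the example tree are illustrative assumptions of this sketch, not something the slides define:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX, MIN = "Max", "Min"

@dataclass
class GameNode:
    player: Optional[str] = None     # "Max", "Min", or None at a terminal node
    utility: Optional[float] = None  # payoff to Max, defined at terminal nodes
    children: Dict[str, "GameNode"] = field(default_factory=dict)  # action -> subtree

def pure_strategy_for_max(node, choose):
    """Extract one pure strategy for Max: commit to a single branch at each
    Max node, keep every branch at each Min node."""
    if not node.children:                        # terminal node: nothing to decide
        return GameNode(utility=node.utility)
    if node.player == MAX:
        a = choose(node)                         # choose one branch
        return GameNode(MAX, children={a: pure_strategy_for_max(node.children[a], choose)})
    return GameNode(MIN, children={              # include all of Min's branches
        a: pure_strategy_for_max(c, choose) for a, c in node.children.items()})

# A height-2 example (b = 2, m = 2): Max moves at the root, Min replies.
leaf = lambda u: GameNode(utility=u)
tree = GameNode(MAX, children={
    "L": GameNode(MIN, children={"l": leaf(3), "r": leaf(12)}),
    "R": GameNode(MIN, children={"l": leaf(2), "r": leaf(8)}),
})
strategy = pure_strategy_for_max(tree, choose=lambda n: next(iter(n.children)))
```

In this example tree Max has b^⌈m/2⌉ = 2^1 = 2 pure strategies (commit to "L" or to "R"), matching the bound above.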
Finding the best strategy

The brute-force algorithm to find Max's and Min's best strategies: construct the sets S and T of all of Max's and Min's pure strategies, then choose

  s* = arg max_{s ∈ S} min_{t ∈ T} u(s, t)
  t* = arg min_{t ∈ T} max_{s ∈ S} u(s, t)

Complexity analysis:
- It constructs and stores O(b^⌈m/2⌉ + b^⌈m/2⌉) = O(b^⌈m/2⌉) strategies.
- Each strategy is a tree that has O(b^⌈m/2⌉) nodes.
- Thus, the space complexity is O(b^⌈m/2⌉ · b^⌈m/2⌉) = O(b^m).
- The time complexity is slightly worse.

But there is an easier way to find best strategies.

Backward induction

For two-person perfect-information zero-sum games:

  function backward-induction(h) returns u*(h)
    if h ∈ Z then return u(h)
    if ρ(h) = Max then return max_{a ∈ A} backward-induction(σ(h, a))
    else return min_{a ∈ A} backward-induction(σ(h, a))

To get the action to be performed at h, return the arg max or arg min instead of the max or min.

Complexity analysis:
- Space complexity: O(bm).
- Time complexity: O(b^m).

For example, in chess:
- b ≈ 35 and m ≈ 100 for "reasonable" games;
- that gives 35^100 ≈ 10^150 nodes;
- the number of particles in the universe is ≈ 10^87;
- it is impossible to explore them all!

Minimax algorithm

A modified version of backward induction:
- ℓ is an upper bound on the search depth.
- e(h) is a static evaluation function that returns an estimate of u*(h).
- Whenever we reach a non-terminal node h at depth ℓ, return e(h).

If ℓ = ∞, then e is never called, and minimax returns u*(h).

Algorithm (Minimax; Shannon, 1950):

  function minimax(h, ℓ) returns an estimate of u*(h)
    if h ∈ Z then return u(h)
    if ℓ = 0 then return e(h)
    if ρ(h) = Max then return max_{a ∈ A} minimax(σ(h, a), ℓ − 1)
    else return min_{a ∈ A} minimax(σ(h, a), ℓ − 1)

Evaluation functions

e(h) is often a weighted sum of features. E.g., in chess:

  1 · (#white pawns − #black pawns) + 3 · (#white knights − #black knights) + ...

The exact values of e(h) do not matter: the behaviour of the algorithm is preserved under any monotonic transformation of e. Only the order matters; payoff acts as an ordinal utility.

Pruning

Backward induction and minimax both look at nodes that do not need to be examined.

[Figure: a depth-2 game tree. Max node a has Min children b, f, and h. Node b's leaves c, d, e have utilities 3, 12, 8, so u(b) = 3 and u(a) ≥ 3. Node f's first leaf g has utility 2, so u(f) ≤ 2 and its remaining leaves are marked "?". Node h's leaves i, j, k have utilities 14, 5, 2.]

Max never goes to node f, because Max gets a higher utility by going to node b. Since f is worse than b, f's remaining children cannot affect the minimax value of node a, so they need not be examined. Node h is different: after seeing the leaf 14 we do not know whether h is better or worse than the best option at a so far (u(h) ≤ 14); after seeing 5 we still do not know (u(h) ≤ 5); only after seeing the last leaf do we learn that h, with u(h) = 2, is worse.

Alpha-beta algorithm

  function alpha-beta(h, ℓ, α, β) returns an estimate of u*(h)
    if h ∈ Z then return u(h)
    if ℓ = 0 then return e(h)
    if ρ(h) = Max then
      v ← −∞
      forall a ∈ χ(h) do
        v ← max(v, alpha-beta(σ(h, a), ℓ − 1, α, β))
        if v ≥ β then return v
        α ← max(α, v)
    else
      v ← +∞
      forall a ∈ χ(h) do
        v ← min(v, alpha-beta(σ(h, a), ℓ − 1, α, β))
        if v ≤ α then return v
        β ← min(β, v)
    return v

Example

[Figure: a worked run of alpha-beta, showing the (α, β) window at each node as it is narrowed during the search, e.g., from (−∞, ∞) at the root to windows such as (7, ∞) and (7, 8) deeper in the tree, with an α-cutoff at one Min node and a β-cutoff at one Max node.]

A Python rendering of the algorithm follows.
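As a concrete rendering of the pseudocode above, here is a hypothetical Python version built on the GameNode sketch from earlier; the function and parameter names (in particular `evaluate`, standing in for e(h)) are assumptions of this sketch:

```python
import math

def alpha_beta(node, depth, alpha=-math.inf, beta=math.inf, evaluate=None):
    """Depth-limited alpha-beta search; returns an estimate of u*(node).

    With depth=math.inf, `evaluate` is never called and the exact minimax
    value is returned, i.e., the search degenerates to backward induction."""
    if not node.children:                 # terminal node h in Z: exact utility
        return node.utility
    if depth == 0:                        # depth bound reached: static estimate e(h)
        return evaluate(node)
    if node.player == MAX:
        v = -math.inf
        for child in node.children.values():
            v = max(v, alpha_beta(child, depth - 1, alpha, beta, evaluate))
            if v >= beta:                 # beta-cutoff: Min would never allow this node
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in node.children.values():
            v = min(v, alpha_beta(child, depth - 1, alpha, beta, evaluate))
            if v <= alpha:                # alpha-cutoff: Max has a better option elsewhere
                return v
            beta = min(beta, v)
    return v
```

On the height-2 example tree from the earlier sketch, alpha_beta(tree, math.inf) returns 3: the "L" subtree yields min(3, 12) = 3, and in the "R" subtree the first leaf 2 triggers an α-cutoff, so the leaf 8 is never examined.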
Properties of alpha-beta pruning

Alpha-beta is a simple example of reasoning about which computations are relevant (a form of metareasoning):
- If α ≤ minimax(h, ℓ) ≤ β, then alpha-beta(h, ℓ, α, β) returns minimax(h, ℓ).
- If minimax(h, ℓ) ≤ α, then alpha-beta(h, ℓ, α, β) returns a value ≤ α.
- If minimax(h, ℓ) ≥ β, then alpha-beta(h, ℓ, α, β) returns a value ≥ β.

Consequently:
- If α = −∞ and β = ∞, then alpha-beta(h, ℓ, α, β) returns minimax(h, ℓ).
- If α = −∞, β = ∞, and ℓ = ∞, then alpha-beta(h, ℓ, α, β) returns u*(h).

Good move ordering enables pruning more nodes. The best case is when:
- at nodes where it's Max's move, the largest-valued children come first;
- at nodes where it's Min's move, the smallest-valued children come first.

In this case, alpha-beta's time complexity is O(b^{m/2}), which doubles the solvable depth. The worst case is the reverse:
- at nodes where it's Max's move, the smallest-valued children come first;
- at nodes where it's Min's move, the largest-valued children come first.

In this case, alpha-beta will visit every node of depth ≤ ℓ, so its time complexity is the same as minimax's: O(b^m).

Discussion

Deeper lookahead (i.e., a larger depth bound ℓ) usually gives better decisions. Exceptions do exist: there are "pathological" games in which deeper lookahead gives worse decisions, but such games are rare.

Suppose we have 100 seconds per move and can explore 10^4 nodes/second:
- then we can explore 10^6 ≈ 35^{8/2} nodes per move;
- so alpha-beta reaches depth 8;
- that makes a pretty good chess program.

Some modifications that can further improve accuracy or computation time:
- node ordering (see below)
- quiescence search
- biasing
- transposition tables
- thinking on the opponent's time

Node ordering

Recall that we have the best case if:
- at nodes where it's Max's move, the largest-valued children come first;
- at nodes where it's Min's move, the smallest-valued children come first.

In the best case, the time complexity is O(b^{m/2}), which doubles the solvable depth. The exact values are of course not known in advance, so in practice children are ordered by a cheap estimate such as the static evaluation function, as in the sketch below.
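A minimal sketch of this idea, again over the GameNode representation and `evaluate` function assumed earlier (ordering by static evaluations is one common heuristic, not something the slides prescribe):

```python
def ordered_children(node, evaluate):
    """Yield children best-first for the player to move, so that likely
    cutoffs are discovered early and alpha-beta prunes more of the tree."""
    def estimate(child):
        # Terminal children have exact utilities; estimate the rest with e(h).
        return child.utility if not child.children else evaluate(child)
    # Max wants largest-valued children first, Min smallest-valued first.
    return sorted(node.children.values(), key=estimate,
                  reverse=(node.player == MAX))
```

Inside alpha_beta, the loop over node.children.values() would then simply become a loop over ordered_children(node, evaluate).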