Part 10 Game Theory (Part IV)

Brief history

- 1846 (Babbage): machine to play tic-tac-toe
- 1928 (von Neumann): minimax theorem
- 1944 (von Neumann & Morgenstern): backward-induction algorithm (produces perfect play)
- 1950 (Shannon): minimax algorithm (finite horizon, approximate evaluation)
- 1951 (Turing): program (on paper) for playing chess
- 1952–7 (Samuel): checkers program, capable of beating its creator
- 1956 (McCarthy): pruning to allow deeper search
- 1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; could examine about 350 positions/minute
- 1967 (Greenblatt): first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses
- 1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
- 1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
- 1997 (Hsu et al.): Deep Blue won a 6-game chess match against world chess champion Garry Kasparov
- 2007 (Schaeffer et al.): checkers solved: with perfect play, it's a draw. This took 10^14 calculations over 18 years.

Quick review

A strategy tells what an agent will do in every possible situation. Strategies may be pure (deterministic) or mixed (probabilistic).

Finite perfect-information constant-sum games:
- Finite: finitely many agents, actions, and states.
- Perfect information: every agent knows the current state, all of the actions, and what they do. No simultaneous actions (agents move one at a time).
- Constant-sum: regardless of how the game ends, Σ_{i=1}^n u_i = k for some constant k. For every such game there is an equivalent game in which k = 0; thus constant-sum games are usually called zero-sum games.

Examples:
- Deterministic: chess, checkers, go, othello (reversi), connect-four.
- Stochastic: backgammon, monopoly.
For now, we will consider only deterministic games.

Finite two-person perfect-information zero-sum games:
- N = {Max, Min}.
- Max's expected utility is u_Max(s, t).
- From now on, we will just call this u(s, t).
- Since G is zero-sum, u_Min(s, t) = −u(s, t).
- Max wants to maximize u and Min wants to minimize it.

Theorem (Minimax theorem, von Neumann 1928). Let G be a two-person finite zero-sum game. Then there are strategies s* and t*, and a number u*, called G's minimax value, such that:
- If Min uses t*, Max's expected utility is ≤ u*, i.e., max_{s} u(s, t*) = u*.
- If Max uses s*, Max's expected utility is ≥ u*, i.e., min_{t} u(s*, t) = u*.

Corollary. u(s*, t*) = u*.

Corollary. If G is a perfect-information game, then there are pure strategies s* and t* that satisfy the theorem.

Strategies on game trees

To construct a pure strategy for Max:
1. At each node where it's Max's move, choose one branch.
2. At each node where it's Min's move, include all branches.

Let b be the tree's branching factor (maximum number of children of any node), and let m be the tree's height (maximum depth of any node). The number of pure strategies for Max is at most b^⌈m/2⌉ (with equality if every node of height < m has b children).

To construct a pure strategy for Min:
1. At each node where it's Min's move, choose one branch.
2. At each node where it's Max's move, include all branches.

Likewise, the number of pure strategies for Min is at most b^⌈m/2⌉.

Finding the best strategy

The brute-force algorithm to find Max's and Min's best strategies: construct the sets S and T of all of Max's and Min's pure strategies, then choose

    s* = arg max_{s ∈ S} min_{t ∈ T} u(s, t)
    t* = arg min_{t ∈ T} max_{s ∈ S} u(s, t)

Complexity analysis:
- Constructs and stores O(b^⌈m/2⌉ + b^⌈m/2⌉) = O(b^⌈m/2⌉) strategies.
- Each strategy is a tree that has O(b^⌈m/2⌉) nodes.
- Thus, space complexity is O(b^⌈m/2⌉ · b^⌈m/2⌉) = O(b^m).
- Time complexity is slightly worse.

But there is an easier way to find best strategies.
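As a concrete toy illustration of the brute-force choice rule above, suppose u is given as an explicit payoff matrix, with rows indexing Max's pure strategies S and columns indexing Min's pure strategies T. The matrix values below are invented for illustration:

```python
# Brute-force maximin/minimax choice on a toy zero-sum game.
# Rows: Max's pure strategies S; columns: Min's pure strategies T.
# The payoff values are made up for illustration.
u = [
    [4, 3, 8],
    [9, 5, 6],
    [2, 1, 7],
]

# s* = arg max_{s in S} min_{t in T} u(s, t)
s_star = max(range(len(u)), key=lambda s: min(u[s]))

# t* = arg min_{t in T} max_{s in S} u(s, t)
t_star = min(range(len(u[0])), key=lambda t: max(row[t] for row in u))

print(s_star, t_star, u[s_star][t_star])  # 1 1 5
```

This particular matrix happens to have a saddle point, so maximin = minimax = 5, matching the corollary u(s*, t*) = u*. The blow-up described above comes from the fact that for game trees, S and T are exponentially large sets of strategy trees rather than small index sets.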
Backward induction

For two-person perfect-information zero-sum games:

    function backward-induction(h) returns u(h)
        if h ∈ Z then return u(h)
        if ρ(h) = Max then return max_{a ∈ A} backward-induction(σ(h, a))
        else return min_{a ∈ A} backward-induction(σ(h, a))

To get the action to be performed at h, return the action (the arg max or arg min) instead.

Complexity analysis:
- Space complexity: O(bm).
- Time complexity: O(b^m).

For example, chess:
- b ≈ 35 and m ≈ 100 ("reasonable" games);
- 35^100 ≈ 10^150 nodes;
- number of particles in the universe: ≈ 10^87;
- impossible to explore them all!

Minimax algorithm

A modified version of backward induction:
- ℓ is an upper bound on the search depth.
- e(h) is a static evaluation function that returns an estimate of u*(h).
- Whenever we reach a non-terminal node h of depth ℓ, return e(h).
If ℓ = ∞, then the function e is never called, and minimax returns u*(h).

Algorithm (Minimax – Shannon, 1950):

    function minimax(h, ℓ) returns an estimate of u(h)
        if h ∈ Z then return u(h)
        if ℓ = 0 then return e(h)
        if ρ(h) = Max then return max_{a ∈ A} minimax(σ(h, a), ℓ − 1)
        else return min_{a ∈ A} minimax(σ(h, a), ℓ − 1)

Evaluation functions

e(h) is often a weighted sum of features. E.g., in chess:

    1·(#white pawns − #black pawns) + 3·(#white knights − #black knights) + ...

The exact values of e(h) do not matter: the behaviour of the algorithm is preserved under any monotonic transformation of e. Only the order matters; payoff acts as an ordinal utility.

Pruning

Backward induction and minimax both look at nodes that do not need to be examined. (Figure: a game tree with Max root a; Min children b, f, h; leaf values 3, 12, 8 under b; 2 under f; 14, 5, 2 under h.) Max never goes to node f, because Max gets a higher utility by going to node b: as soon as one child of f shows that f is worth at most 2 < 3, the remaining children of f cannot affect the minimax value of node a. For node h, the first children (14, then 5) do not yet tell us whether h is better or worse than a; only after the last child do we learn that h is worse.
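The minimax pseudocode above can be turned into a short runnable sketch. The tree encoding (nested tuples, with leaves as utilities) and the averaging evaluation function e are my own toy choices, not part of the slides:

```python
# A runnable sketch of depth-limited minimax. A game tree is encoded as
# ("Max"|"Min", [children]) for internal nodes; a leaf is its utility u(h).
import math

def e(h):
    """Toy static evaluation function (an assumption for this sketch):
    estimate a cut-off subtree by the average of its leaf values."""
    if isinstance(h, (int, float)):
        return h
    _, children = h
    vals = [e(c) for c in children]
    return sum(vals) / len(vals)

def minimax(h, ell):
    if isinstance(h, (int, float)):  # h in Z: terminal node, return u(h)
        return h
    if ell == 0:                     # depth bound reached: estimate with e(h)
        return e(h)
    player, children = h             # rho(h), and sigma(h, a) for each action a
    vals = [minimax(c, ell - 1) for c in children]
    return max(vals) if player == "Max" else min(vals)

# Max picks the larger of Min's best replies: min(3,12,8)=3, min(2,4,6)=2 -> 3
tree = ("Max", [("Min", [3, 12, 8]), ("Min", [2, 4, 6])])
print(minimax(tree, math.inf))  # 3
```

With ℓ = ∞ (here `math.inf`, since `inf − 1 == inf`), e is never called and the function reduces to backward induction, returning u*(h).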
Alpha-beta algorithm

    function alpha-beta(h, ℓ, α, β) returns an estimate of u(h)
        if h ∈ Z then return u(h)
        if ℓ = 0 then return e(h)
        if ρ(h) = Max then
            v ← −∞
            forall a ∈ χ(h) do
                v ← max(v, alpha-beta(σ(h, a), ℓ − 1, α, β))
                if v ≥ β then return v
                α ← max(α, v)
        else
            v ← ∞
            forall a ∈ χ(h) do
                v ← min(v, alpha-beta(σ(h, a), ℓ − 1, α, β))
                if v ≤ α then return v
                β ← min(β, v)
        return v

Example

(Figure: a worked run of alpha-beta on a small tree rooted at Max node a, showing how the (α, β) window narrows as the search descends, with one α-cutoff and one β-cutoff pruning subtrees.)

Properties of alpha-beta pruning

Alpha-beta is a simple example of reasoning about which computations are relevant (a form of metareasoning).
- If α ≤ minimax(h, ℓ) ≤ β, then alpha-beta(h, ℓ, α, β) returns minimax(h, ℓ).
- If minimax(h, ℓ) ≤ α, then alpha-beta(h, ℓ, α, β) returns a value ≤ α.
- If minimax(h, ℓ) ≥ β, then alpha-beta(h, ℓ, α, β) returns a value ≥ β.
Consequently:
- If α = −∞ and β = ∞, then alpha-beta(h, ℓ, α, β) returns minimax(h, ℓ).
- If α = −∞, β = ∞, and ℓ = ∞, then alpha-beta(h, ℓ, α, β) returns u*(h).

Good move ordering can enable pruning more nodes. The best case is if:
- at nodes where it's Max's move, children are largest-value first;
- at nodes where it's Min's move, children are smallest-value first.
In this case, alpha-beta's time complexity is O(b^(m/2)), which doubles the solvable depth.

The worst case is the reverse:
- at nodes where it's Max's move, children are smallest-value first;
- at nodes where it's Min's move, children are largest-value first.
In this case, alpha-beta will visit every node of depth ≤ ℓ; hence its time complexity is the same as minimax's: O(b^m).

Discussion

Deeper lookahead (i.e., a larger depth bound ℓ) usually gives better decisions. Exceptions do exist: "pathological" games in which deeper lookahead gives worse decisions. But such games are rare.
Suppose we have 100 seconds, and we can explore 10^4 nodes/second:
- Thus, we can explore 10^6 ≈ 35^(8/2) nodes per move.
- Alpha-beta reaches depth 8.
- That gives a pretty good chess program.

Some modifications that can further improve accuracy or computation time:
- node ordering (see below)
- quiescence search
- biasing
- transposition tables
- thinking on the opponent's time

Node ordering

Recall that we have the best case if:
- at nodes where it's Max's move, children are largest-value first;
- at nodes where it's Min's move, children are smallest-value first.
In the best case, the time complexity is O(b^(m/2)), which doubles the solvable depth.
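To make the ordering effect concrete, here is a runnable sketch of the alpha-beta pseudocode with a visit counter. The tuple tree encoding is my own; the leaf values follow the classic 13-node example tree from the pruning discussion:

```python
# Alpha-beta with a node counter, to show how move ordering affects
# how much of the tree is visited. Encoding: ("Max"|"Min", [children]);
# a leaf is its utility.
import math

def alpha_beta(h, ell, alpha, beta, counter):
    counter[0] += 1
    if isinstance(h, (int, float)):  # terminal: return u(h)
        return h
    if ell == 0:
        return 0.0                   # stand-in for e(h); never reached with ell = inf
    player, children = h
    if player == "Max":
        v = -math.inf
        for c in children:
            v = max(v, alpha_beta(c, ell - 1, alpha, beta, counter))
            if v >= beta:            # beta-cutoff: Min would never allow this line
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for c in children:
            v = min(v, alpha_beta(c, ell - 1, alpha, beta, counter))
            if v <= alpha:           # alpha-cutoff: Max already has a better option
                return v
            beta = min(beta, v)
    return v

# 13-node example: root a with Min children b, f, h.
tree = ("Max", [("Min", [3, 12, 8]), ("Min", [2, 4, 6]), ("Min", [14, 5, 2])])
n = [0]
print(alpha_beta(tree, math.inf, -math.inf, math.inf, n), n[0])  # 3 11

# Re-ordering h's children smallest-value first triggers its alpha-cutoff
# on the first child, so even fewer of the 13 nodes are visited.
ordered = ("Max", [("Min", [3, 12, 8]), ("Min", [2, 4, 6]), ("Min", [2, 14, 5])])
n = [0]
print(alpha_beta(ordered, math.inf, -math.inf, math.inf, n), n[0])  # 3 9
```

Both runs return the same minimax value, 3, as the properties above guarantee; only the number of visited nodes changes (11 vs. 9, against 13 for plain minimax), which is exactly the point of node ordering.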
