
CSE 473: Artificial Intelligence
Autumn 2015

Adversarial Search

Steve Tanimoto

With slides from: Dieter Fox, Dan Weld, Dan Klein, Stuart Russell, Andrew Moore, Luke Zettlemoyer

Game Playing State-of-the-Art

. Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
. Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
. Othello: Human champions refuse to compete against computers, which are too good.
. Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.
. Pacman: unknown

Game Playing

. Many different kinds of games!

. Choices:
  . Deterministic or stochastic?
  . One, two, or more players?
  . Perfect information (can you see the state)?

. Want algorithms for calculating a strategy (policy) which recommends a move in each state

Deterministic Games

. Many possible formalizations, one is:
  . States: S (start at s0)
  . Players: P = {1, ..., N} (usually take turns)
  . Actions: A (may depend on player / state)
  . Transition function: S × A → S
  . Terminal test: S → {t, f}
  . Terminal utilities: S × P → R

. Solution for a player is a policy: S → A (sketched in code after the next slide)

Zero-Sum Games

. Zero-sum games:
  . Agents have opposite utilities (values on outcomes)
  . Lets us think of a single value that one maximizes and the other minimizes
  . Adversarial, pure competition

. General games:
  . Agents have independent utilities (values on outcomes)
  . Cooperation, indifference, competition, & more are possible
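The formalization above maps directly onto code. Here is a minimal sketch in Python; the class and method names (Game, actions, result, and so on) are illustrative choices, not part of the slides.

# A minimal sketch of the deterministic-game formalization above.
# All names here are illustrative, not from the slides.

class Game:
    def initial_state(self):           # s0
        raise NotImplementedError
    def player(self, state):           # whose turn it is, in P = {1, ..., N}
        raise NotImplementedError
    def actions(self, state):          # A (may depend on player / state)
        raise NotImplementedError
    def result(self, state, action):   # transition function: S × A → S
        raise NotImplementedError
    def is_terminal(self, state):      # terminal test: S → {t, f}
        raise NotImplementedError
    def utility(self, state, player):  # terminal utilities: S × P → R
        raise NotImplementedError

# Zero-sum, two-player case: utility(s, 1) == -utility(s, 2) for every
# terminal s, so a single value suffices: one player maximizes it and
# the other minimizes it.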


Single-Agent Trees

[Figure: a single-agent search tree; node values shown include 8, 20, 26, and 46, with each non-terminal taking the max of its children]

Value of a State

. Value of a state: the best achievable outcome (utility) from that state
. Non-terminal states: V(s) = max over s' ∈ children(s) of V(s')
. Terminal states: V(s) = known

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
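The value recursion reads off directly as code. A minimal Python sketch, using a nested list as the tree (an illustrative encoding, not from the slides) where a leaf is a terminal state with a known utility:

def value(state):
    if isinstance(state, (int, float)):      # terminal state: V(s) known
        return state
    return max(value(s) for s in state)      # non-terminal: best child value

print(value([[8, 20], [26, 46]]))  # -> 46, the best achievable outcome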

Adversarial Game Trees

[Figure: an adversarial game tree; opponent nodes take values -8, -5, -10, and +8 from terminal utilities including -20, -8, ..., -18, -5, ..., -10, and +4]

Minimax Values

. States under Agent's control: V(s) = max over s' ∈ successors(s) of V(s')
. States under Opponent's control: V(s') = min over s ∈ successors(s') of V(s)
. Terminal states: V(s) = known

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu

Tic-tac-toe Game Tree

[Figure: the game tree for tic-tac-toe, from the empty board down to terminal positions]

Adversarial Search (Minimax)

. Deterministic, zero-sum games:
  . Tic-tac-toe, chess, checkers
  . One player maximizes result
  . The other minimizes result

. Minimax search:
  . A state-space search tree
  . Players alternate turns
  . Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: minimax values computed recursively. Terminal values (8, 2, 5, 6) are part of the game; the min nodes take values 2 and 5, and the max root takes value 5.]

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu

2 10/5/2015

Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
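As a concrete, runnable version of the dispatch pseudocode, here is a minimal Python sketch. It encodes the game tree from the minimax slide as nested lists (an illustrative representation, not from the slides), with leaves as terminal utilities.

import math

def value(node, next_agent):
    # Dispatch: terminal nodes return their utility; otherwise recurse
    # as MAX or MIN depending on whose turn it is.
    if isinstance(node, (int, float)):
        return node
    return max_value(node) if next_agent == "MAX" else min_value(node)

def max_value(node):
    v = -math.inf
    for successor in node:
        v = max(v, value(successor, "MIN"))
    return v

def min_value(node):
    v = math.inf
    for successor in node:
        v = min(v, value(successor, "MAX"))
    return v

# Terminal values 8, 2, 5, 6: the MIN nodes yield min(8, 2) = 2 and
# min(5, 6) = 5, and the MAX root picks the better of the two.
tree = [[8, 2], [5, 6]]
print(value(tree, "MAX"))  # -> 5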

Concrete Minimax Example

[Figure: a two-level minimax tree (max over min) with terminal values 10, 10, 9, 100]

Minimax Properties

. Optimal?
  . Yes, against a perfect player. Otherwise?
. Time complexity?
  . O(b^m)
. Space complexity?
  . O(bm)
. For chess, b ≈ 35, m ≈ 100
  . Exact solution is completely infeasible: 35^100 ≈ 10^154 nodes
  . But, do we need to explore the whole tree?

Pruning Example

[Figure: a minimax tree mid-search; the first min child has value 3, and the second is bounded to [-∞, 2] once a leaf of 2 is seen, so its remaining children need not be explored]

α-β Pruning

. General configuration
  . α is the best value that Player (MAX) can get at any choice point along the current path
  . If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
  . Define β similarly for MIN (the Opponent)

[Figure: alternating Player / Opponent layers on the path from the root down to node n]

Progress of search…


Alpha-Beta Pruning

[Slides 19-34: a step-by-step worked example of alpha-beta pruning on a game tree; figures only. © D. Weld, D. Fox]

Alpha-Beta Implementation

α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

Alpha-Beta Pruning Properties

. This pruning has no effect on the final result at the root
. Values of intermediate nodes might be wrong!
  . but, they are bounds
. Good child ordering improves effectiveness of pruning
. With "perfect ordering":
  . Time complexity drops to O(b^(m/2))
  . Doubles solvable depth!
  . Full search of, e.g., chess, is still hopeless…

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
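A runnable Python sketch of the same pseudocode, again over a nested-list tree (an illustrative encoding, not from the slides):

import math

def value(node, next_agent, alpha=-math.inf, beta=math.inf):
    # alpha is MAX's best option on the path to the root;
    # beta is MIN's best option on that path.
    if isinstance(node, (int, float)):
        return node
    if next_agent == "MAX":
        v = -math.inf
        for successor in node:
            v = max(v, value(successor, "MIN", alpha, beta))
            if v >= beta:        # MIN above would never allow this: prune
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for successor in node:
            v = min(v, value(successor, "MAX", alpha, beta))
            if v <= alpha:       # MAX above would never allow this: prune
                return v
            beta = min(beta, v)
    return v

# The first MIN child settles at 3 (so alpha = 3 at the root); the second
# MIN child is abandoned as soon as it sees the leaf 2 (a [-∞, 2] bound,
# as in the Pruning Example slide), and the leaves 4 and 6 are never examined.
print(value([[3, 12, 8], [2, 4, 6], [14, 5, 2]], "MAX"))  # -> 3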


Resource Limits

. Cannot search to leaves

. Depth-limited search
  . Instead, search a limited depth of tree
  . Replace terminal utilities with an eval function for non-terminal positions
  . Guarantee of optimal play is gone

. Example:
  . Suppose we have 100 seconds, can explore 10K nodes / sec
  . So can check 1M nodes per move
  . α-β reaches about depth 8 (a decent chess program)

[Figure: a depth-limited tree with max and min levels; estimated values -1, -2, 4, 9 at the cutoff, unexpanded subtrees marked "?"]

Evaluation Functions

. Function which scores non-terminals
. Ideal function: returns the utility of the position
. In practice: typically a weighted linear sum of features:
  . Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
  . e.g. f1(s) = (num white queens – num black queens), etc.
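Putting the two slides together: depth-limited minimax is a small change to the earlier sketch, cutting off at a depth bound and applying an evaluation function instead of recursing. A minimal Python sketch follows; the averaging eval function is a stand-in chosen only to keep the example self-contained, not from the slides (real programs use weighted feature sums as above).

def dl_value(node, depth, is_max):
    # Depth-limited minimax over a nested-list tree: at depth 0 the
    # true value is replaced by an eval-function estimate.
    if isinstance(node, (int, float)):      # true terminal: exact utility
        return node
    if depth == 0:
        return evaluate(node)               # cutoff: estimate, not exact
    vals = [dl_value(c, depth - 1, not is_max) for c in node]
    return max(vals) if is_max else min(vals)

def evaluate(node):
    # Stand-in eval function: score an unexpanded subtree by the average
    # of its leaf utilities (illustrative only).
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, (int, float)):
            leaves.append(n)
        else:
            stack.extend(n)
    return sum(leaves) / len(leaves)

tree = [[8, 2], [5, 6]]
print(dl_value(tree, 2, True))  # -> 5: deep enough, exact minimax value
print(dl_value(tree, 1, True))  # -> 5.5: cutoff estimate; the guarantee
                                #    of optimal play is gone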

Which algorithm?

. α-β, depth 6, simple eval. fun.

Which algorithm?

. α-β, depth 4, better eval. fun.
