
CSE 473: Artificial Intelligence
Autumn 2015

Adversarial Search

Steve Tanimoto

With slides from: Dieter Fox, Dan Weld, Dan Klein, Stuart Russell, Andrew Moore, Luke Zettlemoyer

Game Playing State-of-the-Art

. Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
. Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
. Othello: Human champions refuse to compete against computers, which are too good.
. Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.
. Pacman: unknown

Game Playing

. Many different kinds of games!

. Choices:
  . Deterministic or stochastic?
  . One, two, or more players?
  . Perfect information (can you see the state)?

. Want algorithms for calculating a strategy (policy) which recommends a move in each state

Deterministic Games

. Many possible formalizations, one is:
  . States: S (start at s0)
  . Players: P = {1, ..., N} (usually take turns)
  . Actions: A (may depend on player / state)
  . Transition function: S × A → S
  . Terminal test: S → {t, f}
  . Terminal utilities: S × P → R

. Solution for a player is a policy: S → A (sketched in code after the next slide)

Zero-Sum Games

. Zero-sum games:
  . Agents have opposite utilities (values on outcomes)
  . Lets us think of a single value that one maximizes and the other minimizes
  . Adversarial, pure competition

. General games:
  . Agents have independent utilities (values on outcomes)
  . Cooperation, indifference, competition, & more are possible
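The formalization above maps directly onto code. Here is a minimal sketch in Python; the class and method names (Game, actions, result, and so on) are illustrative choices, not part of the slides.

# A minimal sketch of the deterministic-game formalization above.
# All names here are illustrative, not from the slides.

class Game:
    def initial_state(self):           # s0
        raise NotImplementedError
    def player(self, state):           # whose turn it is, in P = {1, ..., N}
        raise NotImplementedError
    def actions(self, state):          # A (may depend on player / state)
        raise NotImplementedError
    def result(self, state, action):   # transition function: S × A → S
        raise NotImplementedError
    def is_terminal(self, state):      # terminal test: S → {t, f}
        raise NotImplementedError
    def utility(self, state, player):  # terminal utilities: S × P → R
        raise NotImplementedError

# Zero-sum, two-player case: utility(s, 1) == -utility(s, 2) for every
# terminal s, so a single value suffices: one player maximizes it and
# the other minimizes it.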


Single-Agent Trees

[Figure: a single-agent search tree; node values shown include 8, 20, 26, and 46, with each non-terminal taking the max of its children]

Value of a State

. Value of a state: the best achievable outcome (utility) from that state
. Non-terminal states: V(s) = max over s' ∈ children(s) of V(s')
. Terminal states: V(s) = known

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
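The value recursion reads off directly as code. A minimal Python sketch, using a nested list as the tree (an illustrative encoding, not from the slides) where a leaf is a terminal state with a known utility:

def value(state):
    if isinstance(state, (int, float)):      # terminal state: V(s) known
        return state
    return max(value(s) for s in state)      # non-terminal: best child value

print(value([[8, 20], [26, 46]]))  # -> 46, the best achievable outcome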

Adversarial Game Trees

[Figure: an adversarial game tree; opponent nodes take values -8, -5, -10, and +8 from terminal utilities including -20, -8, ..., -18, -5, ..., -10, and +4]

Minimax Values

. States under Agent's control: V(s) = max over s' ∈ successors(s) of V(s')
. States under Opponent's control: V(s') = min over s ∈ successors(s') of V(s)
. Terminal states: V(s) = known

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu

Tic-tac-toe Game Tree

[Figure: the game tree for tic-tac-toe, from the empty board down to terminal positions]

Adversarial Search (Minimax)

. Deterministic, zero-sum games:
  . Tic-tac-toe, chess, checkers
  . One player maximizes result
  . The other minimizes result

. Minimax search:
  . A state-space search tree
  . Players alternate turns
  . Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: minimax values computed recursively. Terminal values (8, 2, 5, 6) are part of the game; the min nodes take values 2 and 5, and the max root takes value 5.]

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu

2 10/5/2015

Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
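As a concrete, runnable version of the dispatch pseudocode, here is a minimal Python sketch. It encodes the game tree from the minimax slide as nested lists (an illustrative representation, not from the slides), with leaves as terminal utilities.

import math

def value(node, next_agent):
    # Dispatch: terminal nodes return their utility; otherwise recurse
    # as MAX or MIN depending on whose turn it is.
    if isinstance(node, (int, float)):
        return node
    return max_value(node) if next_agent == "MAX" else min_value(node)

def max_value(node):
    v = -math.inf
    for successor in node:
        v = max(v, value(successor, "MIN"))
    return v

def min_value(node):
    v = math.inf
    for successor in node:
        v = min(v, value(successor, "MAX"))
    return v

# Terminal values 8, 2, 5, 6: the MIN nodes yield min(8, 2) = 2 and
# min(5, 6) = 5, and the MAX root picks the better of the two.
tree = [[8, 2], [5, 6]]
print(value(tree, "MAX"))  # -> 5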

Concrete Minimax Example

[Figure: a two-level minimax tree (max over min) with terminal values 10, 10, 9, 100]

Minimax Properties

. Optimal?
  . Yes, against a perfect player. Otherwise?
. Time complexity?
  . O(b^m)
. Space complexity?
  . O(bm)
. For chess, b ≈ 35, m ≈ 100
  . Exact solution is completely infeasible: 35^100 ≈ 10^154 nodes
  . But, do we need to explore the whole tree?

Pruning Example

[Figure: a minimax tree mid-search; the first min child has value 3, and the second is bounded to [-∞, 2] once a leaf of 2 is seen, so its remaining children need not be explored]

α-β Pruning

. General configuration
  . α is the best value that Player (MAX) can get at any choice point along the current path
  . If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
  . Define β similarly for MIN (the Opponent)

[Figure: alternating Player / Opponent layers on the path from the root down to node n]

Progress of search…


Alpha-Beta Pruning

[Slides 19-34: a step-by-step worked example of alpha-beta pruning on a game tree; figures only. © D. Weld, D. Fox]

Alpha-Beta Implementation

α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

Alpha-Beta Pruning Properties

. This pruning has no effect on the final result at the root
. Values of intermediate nodes might be wrong!
  . but, they are bounds
. Good child ordering improves effectiveness of pruning
. With "perfect ordering":
  . Time complexity drops to O(b^(m/2))
  . Doubles solvable depth!
  . Full search of, e.g., chess, is still hopeless…

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
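A runnable Python sketch of the same pseudocode, again over a nested-list tree (an illustrative encoding, not from the slides):

import math

def value(node, next_agent, alpha=-math.inf, beta=math.inf):
    # alpha is MAX's best option on the path to the root;
    # beta is MIN's best option on that path.
    if isinstance(node, (int, float)):
        return node
    if next_agent == "MAX":
        v = -math.inf
        for successor in node:
            v = max(v, value(successor, "MIN", alpha, beta))
            if v >= beta:        # MIN above would never allow this: prune
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for successor in node:
            v = min(v, value(successor, "MAX", alpha, beta))
            if v <= alpha:       # MAX above would never allow this: prune
                return v
            beta = min(beta, v)
    return v

# The first MIN child settles at 3 (so alpha = 3 at the root); the second
# MIN child is abandoned as soon as it sees the leaf 2 (a [-∞, 2] bound,
# as in the Pruning Example slide), and the leaves 4 and 6 are never examined.
print(value([[3, 12, 8], [2, 4, 6], [14, 5, 2]], "MAX"))  # -> 3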


Resource Limits

. Cannot search to leaves

. Depth-limited search
  . Instead, search a limited depth of tree
  . Replace terminal utilities with an eval function for non-terminal positions
  . Guarantee of optimal play is gone

. Example:
  . Suppose we have 100 seconds, can explore 10K nodes / sec
  . So can check 1M nodes per move
  . α-β reaches about depth 8 (a decent chess program)

[Figure: a depth-limited tree with max and min levels; estimated values -1, -2, 4, 9 at the cutoff, unexpanded subtrees marked "?"]

Evaluation Functions

. Function which scores non-terminals
. Ideal function: returns the utility of the position
. In practice: typically a weighted linear sum of features:
  . Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
  . e.g. f1(s) = (num white queens – num black queens), etc.
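Putting the two slides together: depth-limited minimax is a small change to the earlier sketch, cutting off at a depth bound and applying an evaluation function instead of recursing. A minimal Python sketch follows; the averaging eval function is a stand-in chosen only to keep the example self-contained, not from the slides (real programs use weighted feature sums as above).

def dl_value(node, depth, is_max):
    # Depth-limited minimax over a nested-list tree: at depth 0 the
    # true value is replaced by an eval-function estimate.
    if isinstance(node, (int, float)):      # true terminal: exact utility
        return node
    if depth == 0:
        return evaluate(node)               # cutoff: estimate, not exact
    vals = [dl_value(c, depth - 1, not is_max) for c in node]
    return max(vals) if is_max else min(vals)

def evaluate(node):
    # Stand-in eval function: score an unexpanded subtree by the average
    # of its leaf utilities (illustrative only).
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, (int, float)):
            leaves.append(n)
        else:
            stack.extend(n)
    return sum(leaves) / len(leaves)

tree = [[8, 2], [5, 6]]
print(dl_value(tree, 2, True))  # -> 5: deep enough, exact minimax value
print(dl_value(tree, 1, True))  # -> 5.5: cutoff estimate; the guarantee
                                #    of optimal play is gone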

Which algorithm?

. α-β, depth 6, simple eval. fun.

Which algorithm?

. α-β, depth 4, better eval. fun.
