
CSE 473: Artificial Intelligence
Spring 2015

Adversarial Search

Dieter Fox
Based on slides adapted from Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell or Andrew Moore

Game Playing State-of-the-Art
§ Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
§ Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
§ Othello: Human champions refuse to compete against computers, which are too good.
§ Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.
§ Pacman: unknown

Adversarial Search

Game Playing
§ Many different kinds of games!
§ Choices:
    § Deterministic or stochastic?
    § One, two, or more players?
    § Perfect information (can you see the state)?
§ Want algorithms for calculating a strategy (policy) which recommends a move in each state

Deterministic Games
§ Many possible formalizations, one is (a minimal code sketch follows below):
    § States: S (start at s0)
    § Players: P = {1...N} (usually take turns)
    § Actions: A (may depend on player / state)
    § Transition Function: S x A → S
    § Terminal Test: S → {t, f}
    § Terminal Utilities: S x P → R
§ Solution for a player is a policy: S → A

Zero-Sum Games
§ Zero-Sum Games
    § Agents have opposite utilities (values on outcomes)
    § Lets us think of a single value that one maximizes and the other minimizes
    § Adversarial, pure competition
§ General Games
    § Agents have independent utilities (values on outcomes)
    § Cooperation, indifference, competition, & more are possible
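As a concrete reference for the Deterministic Games formalization above, here is a minimal Python sketch. The class name Game and its method names are illustrative assumptions, not part of the slides.

    from typing import Generic, List, TypeVar

    S = TypeVar("S")   # state
    A = TypeVar("A")   # action

    class Game(Generic[S, A]):
        """One possible encoding of the formalization: states S, players P = {1..N},
        actions A, transition S x A -> S, terminal test, and terminal utilities."""
        def initial_state(self) -> S: ...              # s0
        def player(self, s: S) -> int: ...             # whose turn it is in s
        def actions(self, s: S) -> List[A]: ...        # legal actions (may depend on player / state)
        def result(self, s: S, a: A) -> S: ...         # transition function S x A -> S
        def is_terminal(self, s: S) -> bool: ...       # terminal test S -> {t, f}
        def utility(self, s: S, p: int) -> float: ...  # terminal utility S x P -> R

In this encoding, a solution (policy) for a player is just a function from states to actions, S → A.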


Single-Agent Trees

Value of a State
§ Value of a state: the best achievable outcome (utility) from that state
§ Non-Terminal States: value determined by the best of the successors
§ Terminal States: value is a known utility
[Figure: single-agent search tree; node value 8 at the top, leaf utilities 2, 0, …, 2, 6, …, 4, 6]

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
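A tiny runnable sketch of this recursion, assuming (purely for illustration) that the tree is encoded as nested tuples whose leaves are terminal utilities:

    def value(node):
        if isinstance(node, (int, float)):           # terminal state: utility is given
            return node
        return max(value(child) for child in node)   # non-terminal: best achievable successor value

    # e.g. a small tree with leaf utilities 2, 0, 2, 6, 4, 6
    print(value(((2, 0), (2, 6), (4, 6))))           # -> 6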

Adversarial Game Trees

Values
§ States Under Agent's Control: the agent chooses the maximizing action
§ States Under Opponent's Control: the opponent chooses the minimizing action
§ Terminal States: utilities are fixed by the game
[Figure: adversarial game tree with opponent-node values -8, -5, -10, +8 and terminal values -20, -8, …, -18, -5, …, -10, +4, -20, +8]

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
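Written out, the recursion these two slides sketch is the standard minimax value (spelled out here for reference in the slides' notation, not taken verbatim from the figure):

    V(s) = max over successors s' of V(s')    if the agent (MAX) moves in s
    V(s) = min over successors s' of V(s')    if the opponent (MIN) moves in s
    V(s) = known utility                      if s is terminal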

Tic-tac-toe Game Tree

Adversarial Search (Minimax)
§ Deterministic, zero-sum games:
    § Tic-tac-toe, chess, checkers
    § One player maximizes result
    § The other minimizes result
§ Minimax search:
    § A state-space search tree
    § Players alternate turns
    § Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: minimax values computed recursively — max-node value 5, min-node values 2 and 5, terminal values 8, 2, 5, 6; terminal values are part of the game]

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu


Minimax Implementation

    def max-value(state):
        initialize v = -∞
        for each successor of state:
            v = max(v, min-value(successor))
        return v

    def min-value(state):
        initialize v = +∞
        for each successor of state:
            v = min(v, max-value(successor))
        return v

Minimax Implementation (Dispatch)

    def value(state):
        if the state is a terminal state: return the state's utility
        if the next agent is MAX: return max-value(state)
        if the next agent is MIN: return min-value(state)

    def max-value(state):
        initialize v = -∞
        for each successor of state:
            v = max(v, value(successor))
        return v

    def min-value(state):
        initialize v = +∞
        for each successor of state:
            v = min(v, value(successor))
        return v

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
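A runnable Python version of the dispatch pseudocode, again assuming an illustrative nested-tuple encoding: an internal node is a pair (agent, successors) with agent "MAX" or "MIN", and a leaf is its terminal utility.

    def value(state):
        if isinstance(state, (int, float)):   # terminal state: return its utility
            return state
        agent, successors = state
        return max_value(successors) if agent == "MAX" else min_value(successors)

    def max_value(successors):
        v = float("-inf")
        for successor in successors:
            v = max(v, value(successor))
        return v

    def min_value(successors):
        v = float("inf")
        for successor in successors:
            v = min(v, value(successor))
        return v

    # The tree from the minimax slide: terminal values 8, 2, 5, 6 give minimax value 5 at the root
    tree = ("MAX", (("MIN", (8, 2)), ("MIN", (5, 6))))
    print(value(tree))                        # -> 5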

Concrete Minimax Example
[Figure: worked minimax tree with alternating max and min layers]

Minimax Properties
§ Optimal? Yes, against a perfect player. Otherwise?
[Figure: example tree with max and min layers, terminal values 10, 10, 9, 100]
§ Time complexity: O(b^m)
§ Space complexity: O(bm)
§ For chess, b ≈ 35, m ≈ 100
    § Exact solution is completely infeasible
    § But, do we need to explore the whole tree?
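A one-line sanity check of why exact minimax for chess is infeasible: with b ≈ 35 and m ≈ 100, the full game tree has on the order of 35^100 ≈ 10^154 nodes.

    import math
    b, m = 35, 100
    print(m * math.log10(b))   # ≈ 154.4, i.e. about 10^154 positions to examine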

Pruning Example
[Figure: minimax tree mid-search — left min node has value 3, the next min node is already bounded to [-∞, 2], remaining subtree still ?; progress of search…]

α-β Pruning
§ General configuration
    § α is the best value that MAX can get at any choice point along the current path
    § If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
    § Define β similarly for MIN
[Figure: alternating Player / Opponent layers, with α at MAX's choice point and node n deeper in the tree]


Alpha-Beta Pruning

[Slides 19–34: step-by-step alpha-beta pruning walkthrough, figures only]

Alpha-Beta Implementation

α: MAX's best option on path to root
β: MIN's best option on path to root

    def max-value(state, α, β):
        initialize v = -∞
        for each successor of state:
            v = max(v, value(successor, α, β))
            if v ≥ β return v
            α = max(α, v)
        return v

    def min-value(state, α, β):
        initialize v = +∞
        for each successor of state:
            v = min(v, value(successor, α, β))
            if v ≤ α return v
            β = min(β, v)
        return v

Alpha-Beta Pruning Properties
§ This pruning has no effect on the final result at the root
§ Values of intermediate nodes might be wrong!
    § but, they are bounds
§ Good child ordering improves effectiveness of pruning
§ With "perfect ordering":
    § Time complexity drops to O(b^(m/2))
    § Doubles solvable depth!
    § Full search of, e.g., chess, is still hopeless…

Slide from Dan Klein & Pieter Abbeel - ai.berkeley.edu
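A runnable Python sketch of the alpha-beta pseudocode above, using the same illustrative nested-tuple tree encoding as the earlier minimax example. As the properties slide warns, values returned from pruned subtrees are only bounds, but the value at the root is unchanged.

    def value(state, alpha, beta):
        if isinstance(state, (int, float)):   # terminal state: return its utility
            return state
        agent, successors = state
        return (max_value(successors, alpha, beta) if agent == "MAX"
                else min_value(successors, alpha, beta))

    def max_value(successors, alpha, beta):
        v = float("-inf")
        for successor in successors:
            v = max(v, value(successor, alpha, beta))
            if v >= beta:                     # MIN above would never allow this: prune
                return v
            alpha = max(alpha, v)
        return v

    def min_value(successors, alpha, beta):
        v = float("inf")
        for successor in successors:
            v = min(v, value(successor, alpha, beta))
            if v <= alpha:                    # MAX above would never allow this: prune
                return v
            beta = min(beta, v)
        return v

    tree = ("MAX", (("MIN", (8, 2)), ("MIN", (5, 6))))
    print(value(tree, float("-inf"), float("inf")))   # -> 5, same root value as plain minimax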
