Part 10 (part IV)

306 Brief history

- 1846 (Babbage): machine to play tic-tac-toe
- 1928 (von Neumann): minimax theorem
- 1944 (von Neumann & Morgenstern): backward-induction algorithm (produces perfect play)
- 1950 (Shannon): minimax algorithm with a finite horizon and approximate evaluation
- 1951 (Turing): program (on paper) for playing chess
- 1952-7 (Samuel): checkers program, capable of beating its creator
- 1956 (McCarthy): pruning to allow deeper search
- 1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; it could examine about 350 positions/minute

307 Brief history

- 1967 (Greenblatt): first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses
- 1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
- 1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
- 1997 (Hsu et al.): Deep Blue won a 6-game chess match against world chess champion Garry Kasparov
- 2007 (Schaeffer et al.): checkers solved: with perfect play, it's a draw. This took $10^{14}$ calculations over 18 years

308 Quick review

A strategy tells what an agent will do in every possible situation. Strategies may be pure (deterministic) or mixed (probabilistic).

Finite perfect-information zero-sum games:
- Finite: finitely many agents, actions, and states.
- Perfect information: every agent knows the current state, all of the actions, and what they do. No simultaneous actions (agents move one at a time).
- Constant-sum: regardless of how the game ends, $\sum_{i=1}^{n} u_i = k$ for some constant $k$. For every such game there is an equivalent game in which $k = 0$, so constant-sum games are usually called zero-sum games (see the worked example below).

Examples:
- Deterministic: chess, checkers, go, othello (reversi), connect-four.
- Stochastic: backgammon, ...
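As a worked sanity check (the numbers are illustrative, not from the slides), the conversion from constant-sum to zero-sum is just a constant shift of every payoff:

If $\sum_{i=1}^{n} u_i(z) = k$ for every outcome $z$, define $u_i'(z) = u_i(z) - k/n$. Then for every outcome
\[ \sum_{i=1}^{n} u_i'(z) = \sum_{i=1}^{n} u_i(z) - n \cdot \frac{k}{n} = k - k = 0, \]
and each player's ranking of outcomes is unchanged, so the two games have the same optimal strategies. For instance, with $n = 2$ and $k = 1$, subtracting $\frac{1}{2}$ from each payoff yields an equivalent zero-sum game.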

For now, we will consider only deterministic games.

309 Quick review

Finite two-person perfect-information zero-sum games:
- $N = \{\text{Max}, \text{Min}\}$.
- Max's expected utility is $u_{\text{Max}}(s, t)$. From now on, we will just call this $u(s, t)$.
- Since $G$ is zero-sum, $u_{\text{Min}}(s, t) = -u(s, t)$.
- Max wants to maximize $u$ and Min wants to minimize it.

310 Quick review

Theorem (Minimax theorem, von Neumann 1928). Let $G$ be a two-person finite zero-sum game. Then there are strategies $s^*$ and $t^*$, and a number $u^*$ called $G$'s minimax value, such that:
- If Min uses $t^*$, Max's expected utility is $\leq u^*$, i.e., $\max_{s} u(s, t^*) = u^*$.
- If Max uses $s^*$, Max's expected utility is $\geq u^*$, i.e., $\min_{t} u(s^*, t) = u^*$.

Corollary 1. $u(s^*, t^*) = u^*$.

Corollary 2. If $G$ is a perfect-information game, then there are pure strategies $s^*$ and $t^*$ that satisfy the theorem.
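As an illustrative example (not from the slides), consider the $2 \times 2$ zero-sum matrix game in which Max picks a row, Min picks a column, and the entry is Max's payoff:
\[ u = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix} \]
In pure strategies, $\max_s \min_t u(s,t) = 1$ while $\min_t \max_s u(s,t) = 2$, so no pure pair satisfies the theorem. But if Max plays each row with probability $1/2$ and Min plays the columns with probabilities $(1/4, 3/4)$, every reply yields expected payoff $3/2$; hence $u^* = 3/2$. This is a simultaneous-move game, so mixed strategies are needed; the second corollary says that perfect-information games never require them.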

311 Strategies on game trees

To construct a pure strategy for Max:
1. At each node where it's Max's move, choose one branch.
2. At each node where it's Min's move, include all branches.
Let $b$ be the tree's branching factor (the maximum number of children of any node), and let $m$ be the tree's height (the maximum depth of any node). The number of pure strategies for Max is bounded by $b^{\lceil m/2 \rceil}$ (with equality if every node of height $< m$ has $b$ children).

312 Strategies on game trees

To construct a pure strategy for Min:
1. At each node where it's Min's move, choose one branch.
2. At each node where it's Max's move, include all branches.
The number of pure strategies for Min is likewise bounded by $b^{\lceil m/2 \rceil}$ (with equality if every node of height $< m$ has $b$ children).

313 Finding the best strategy

The brute-force algorithm to find Max's and Min's best strategies: construct the sets $S$ and $T$ of all of Max's and Min's pure strategies, then choose:

\[ s^* = \arg\max_{s \in S} \min_{t \in T} u(s, t) \qquad t^* = \arg\min_{t \in T} \max_{s \in S} u(s, t) \]

Complexity analysis:
- Constructs and stores $O(b^{m/2} + b^{m/2}) = O(b^{m/2})$ strategies.
- Each strategy is a tree that has $O(b^{m/2})$ nodes.
- Thus, space complexity is $O(b^{m/2} \cdot b^{m/2}) = O(b^m)$.
- Time complexity is slightly worse.
But there is an easier way to find best strategies.

314 Backward induction

For two-person perfect-information zero-sum games.

Algorithm
  function backward-induction(h) returns u(h)
    if h ∈ Z then return u(h)
    if ρ(h) = Max then return max_{a ∈ A} backward-induction(σ(h, a))
    else return min_{a ∈ A} backward-induction(σ(h, a))

To get the action to be performed at h, return the action (the arg max or arg min) instead.

Complexity analysis:
- Space complexity: $O(bm)$.
- Time complexity: $O(b^m)$.

For example, chess:
- $b \approx 35$ and $m \approx 100$ ("reasonable" games);
- $35^{100} \approx 10^{150}$ nodes;
- number of particles in the universe: $\approx 10^{87}$;
- impossible to explore them all!
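Below is a minimal executable sketch of backward induction (not from the slides), under an assumed toy representation: a leaf is a number giving Max's utility, and an internal node is a pair (player, children).

    # Hypothetical toy representation (an assumption, not the slides' notation):
    #   leaf = Max's utility (a number); internal node = ("MAX", children) or ("MIN", children).
    def backward_induction(h):
        if isinstance(h, (int, float)):       # terminal history: return u(h)
            return h
        player, children = h
        values = [backward_induction(c) for c in children]
        return max(values) if player == "MAX" else min(values)

    # Example: Max moves, then Min; perfect play gives value 3.
    tree = ("MAX", (("MIN", (3, 12, 8)),
                    ("MIN", (2, 4, 6)),
                    ("MIN", (14, 5, 2))))
    print(backward_induction(tree))           # -> 3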

315 Minimax algorithm

Modified version of backward induction:
- ℓ is an upper bound on the search depth.
- e(h) is a static evaluation function that returns an estimate of u*(h).
- Whenever we reach a non-terminal node h at depth ℓ, return e(h).
If ℓ = ∞, then e is never called, and minimax returns u*(h).

Algorithm (Minimax, Shannon 1950)
  function minimax(h, ℓ) returns an estimate of u(h)
    if h ∈ Z then return u(h)
    if ℓ = 0 then return e(h)
    if ρ(h) = Max then return max_{a ∈ A} minimax(σ(h, a), ℓ − 1)
    else return min_{a ∈ A} minimax(σ(h, a), ℓ − 1)
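A hedged Python sketch of this depth-limited version follows, using the same assumed toy tree representation as the backward-induction sketch; the evaluator passed in is a stand-in, not the slides' chess evaluation.

    def minimax(h, depth, e):
        """Return an estimate of u(h); e is a static evaluation function."""
        if isinstance(h, (int, float)):    # terminal history: exact utility
            return h
        if depth == 0:                     # depth bound reached: estimate with e
            return e(h)
        player, children = h
        values = [minimax(c, depth - 1, e) for c in children]
        return max(values) if player == "MAX" else min(values)

    # With depth bound 1 the Min nodes are cut off and evaluated by e;
    # with depth bound 2 the search is exact.
    tree = ("MAX", (("MIN", (3, 12, 8)), ("MIN", (2, 4, 6))))
    print(minimax(tree, 1, lambda h: 0))   # -> 0 (estimate only)
    print(minimax(tree, 2, lambda h: 0))   # -> 3 (same as backward induction)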

316 Evaluation functions

e(h) is often a weighted sum of features. E.g., in chess:

1 × (#white pawns − #black pawns) + 3 × (#white knights − #black knights) + ...

The exact values of e(h) do not matter: the behaviour of the algorithm is preserved under any monotonic transformation of e. Only the order matters; the payoff acts as an ordinal utility.
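A minimal sketch of such a weighted-feature evaluator (the board encoding and the weights are illustrative assumptions, not a real engine's values):

    # Assumed toy encoding of a position: counts of each piece type per side.
    MATERIAL_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

    def evaluate(position):
        """Weighted sum of material features, from White's (Max's) point of view."""
        score = 0
        for piece, w in MATERIAL_WEIGHTS.items():
            score += w * (position["white"].get(piece, 0) - position["black"].get(piece, 0))
        return score

    # Example: White is up a knight but down a pawn -> 1*(7-8) + 3*(2-1) = 2.
    pos = {"white": {"pawn": 7, "knight": 2}, "black": {"pawn": 8, "knight": 1}}
    print(evaluate(pos))   # -> 2

Since only the order of e-values matters, any monotonic rescaling of these weights would leave minimax's choices unchanged.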

317 Pruning

Backward-induction and minimax both look at nodes that do not need to be examined.

[Figure: game tree with Max root a and Min children b, f, h. Leaves c, d, e under b have values 3, 12, 8; the first leaf g under f has value 2 and f's remaining leaves are unexamined (?); leaves i, j, k under h have values 14, 5, 2. The annotations show a ≥ 3, f ≤ 2, and h ≤ 14, then ≤ 5, then ≤ 2.]

Max never goes to node f, because Max gets a higher utility by going to node b. Since f is worse than b, it cannot affect the minimax value of node a. For node h: after its first leaf we do not know whether h is better or worse than a's current value; after the second leaf we still do not know; after the third, h is worse, so it too cannot change Max's choice.

318 Alpha-beta algorithm

Algorithm
  function alpha-beta(h, ℓ, α, β) returns an estimate of u(h)
    if h ∈ Z then return u(h)
    if ℓ = 0 then return e(h)
    if ρ(h) = Max then
      v ← −∞
      forall a ∈ χ(h) do
        v ← max(v, alpha-beta(σ(h, a), ℓ − 1, α, β))
        if v ≥ β then return v
        α ← max(α, v)
    else
      v ← ∞
      forall a ∈ χ(h) do
        v ← min(v, alpha-beta(σ(h, a), ℓ − 1, α, β))
        if v ≤ α then return v
        β ← min(β, v)
    return v
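The following is a hedged Python rendering of this pseudocode on the same assumed toy tree representation as the earlier sketches; it mirrors the α/β updates and cutoffs but is not a tuned implementation.

    import math

    def alpha_beta(h, depth, alpha, beta, e):
        if isinstance(h, (int, float)):        # terminal history
            return h
        if depth == 0:
            return e(h)
        player, children = h
        if player == "MAX":
            v = -math.inf
            for c in children:
                v = max(v, alpha_beta(c, depth - 1, alpha, beta, e))
                if v >= beta:                  # beta cutoff: Min avoids this node anyway
                    return v
                alpha = max(alpha, v)
        else:
            v = math.inf
            for c in children:
                v = min(v, alpha_beta(c, depth - 1, alpha, beta, e))
                if v <= alpha:                 # alpha cutoff: Max avoids this node anyway
                    return v
                beta = min(beta, v)
        return v

    # On the pruning example's tree, the second Min node is cut off as soon
    # as its first leaf (2) is seen, since Max already has 3 available.
    tree = ("MAX", (("MIN", (3, 12, 8)), ("MIN", (2, 4, 6)), ("MIN", (14, 5, 2))))
    print(alpha_beta(tree, math.inf, -math.inf, math.inf, lambda h: 0))   # -> 3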

319 Example

[Figure: step-by-step run of alpha-beta on an example game tree, showing the (α, β) window at each node, an α-cutoff at a Min node, and a β-cutoff at a Max node.]

320 Properties of Alpha-beta pruning

Alpha-beta is a simple example of reasoning about which computations are relevant (a form of metareasoning).
- If α ≤ minimax(h, ℓ) ≤ β, then alpha-beta(h, ℓ, α, β) returns minimax(h, ℓ).
- If minimax(h, ℓ) ≤ α, then alpha-beta(h, ℓ, α, β) returns a value ≤ α.
- If minimax(h, ℓ) ≥ β, then alpha-beta(h, ℓ, α, β) returns a value ≥ β.
Consequently:
- If α = −∞ and β = ∞, then alpha-beta(h, ℓ, α, β) returns minimax(h, ℓ).
- If α = −∞, β = ∞, and ℓ = ∞, then alpha-beta(h, ℓ, α, β) returns u*(h).

321 Properties of Alpha-beta pruning

Good move ordering can enable pruning more nodes. The best case is when:
- at nodes where it's Max's move, children are visited largest-value first;
- at nodes where it's Min's move, children are visited smallest-value first.
In this case, Alpha-beta's time complexity is $O(b^{m/2})$, which doubles the solvable depth. The worst case is the reverse:
- at nodes where it's Max's move, children are visited smallest-value first;
- at nodes where it's Min's move, children are visited largest-value first.
In this case, Alpha-beta will visit every node of depth ≤ ℓ, so its time complexity is the same as minimax's: $O(b^m)$.

322 Discussion

Deeper lookahead (i.e., a larger depth bound ℓ) usually gives better decisions. Exceptions do exist:
- "Pathological" games exist in which deeper lookahead gives worse decisions.
- But such games are rare.
Suppose we have 100 seconds per move and can explore $10^4$ nodes/second:
- Then we can explore $10^6 \approx 35^{8/2}$ nodes per move.
- Alpha-beta reaches depth 8: a pretty good chess program.
Some modifications that can further improve accuracy or computation time:
- node ordering (see next slide)
- quiescence search
- biasing
- transposition tables
- thinking on the opponent's time

323 Node ordering

Recall that we have the best case if:
- at nodes where it's Max's move, children are visited largest-value first;
- at nodes where it's Min's move, children are visited smallest-value first.
In the best case, the time complexity is $O(b^{m/2})$, which doubles the solvable depth. How to get closer to the best case (see the sketch below):
- Every time you expand a state, apply the evaluation function e to its children.
- When it's Max's move, sort the children with the largest e values first.
- When it's Min's move, sort the children with the smallest e values first.
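A small sketch of that ordering step, under the same toy assumptions as the earlier code; only the sorting is shown, not a full alpha-beta search.

    def order_children(children, player, e):
        """Sort children so the heuristically best moves are searched first."""
        # Max prefers large evaluations, Min prefers small ones.
        return sorted(children, key=e, reverse=(player == "MAX"))

    # Example with leaf children evaluated by their own values:
    print(order_children([3, 12, 8], "MAX", e=lambda c: c))   # -> [12, 8, 3]
    print(order_children([3, 12, 8], "MIN", e=lambda c: c))   # -> [3, 8, 12]

    # Inside alpha-beta, iterate over order_children(children, player, e)
    # instead of children, so that cutoffs tend to happen as early as possible.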

324 Quiescence search and biasing

In a game like checkers or chess, where the evaluation is based largely on material, the evaluation is likely to be inaccurate if there are pending captures. So search deeper, to reach a position where there are no pending captures; evaluations will be more accurate there. But that creates another problem:
- You're searching some paths to an even depth and others to an odd depth.
- Paths that end just after your opponent's move will look worse than paths that end just after your move.
- To compensate, add or subtract a number called the "biasing factor" (see the sketch below).
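A hedged sketch of the quiescence idea; is_quiet(), capture_moves(), result() and the biasing constant are hypothetical helpers, since the slides do not fix a particular game interface, and the sign and size of the bias are illustrative only.

    # Hypothetical game interface (assumptions, not from the slides):
    #   is_quiet(h)      -> True if no captures are pending at h
    #   capture_moves(h) -> the capture moves available at h
    #   result(h, a)     -> the position reached by playing a in h
    #   e(h)             -> static evaluation from Max's point of view
    BIAS = 0.5   # illustrative biasing factor for the odd/even-depth asymmetry

    def quiescence(h, player, e, is_quiet, capture_moves, result):
        """Keep searching capture moves until the position is quiet, then evaluate."""
        captures = capture_moves(h)
        if is_quiet(h) or not captures:
            # Compensate positions reached just after the opponent's (Min's) move,
            # which tend to look artificially worse for Max.
            return e(h) + (BIAS if player == "MAX" else 0.0)
        nxt = "MIN" if player == "MAX" else "MAX"
        values = [quiescence(result(h, a), nxt, e, is_quiet, capture_moves, result)
                  for a in captures]
        return max(values) if player == "MAX" else min(values)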

325 Transposition tables

Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree). Idea:
- When you compute h's minimax value, store it in a hash table.
- If you visit h again, retrieve its value rather than computing it again.
The hash table is called a transposition table. Problem: there are far too many states to store all of them. In that case (see the sketch below):
- Store some of the states, rather than all of them.
- Try to store the ones that you're most likely to need.
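A minimal sketch of memoizing minimax values in a transposition table, assuming positions are hashable (e.g., the nested-tuple toy trees used earlier); a real engine would also bound the table's size and keep only the entries it is most likely to need, as the slide suggests.

    def minimax_tt(h, depth, e, table):
        """Depth-limited minimax with a transposition table (a plain dict here)."""
        key = (h, depth)
        if key in table:                   # position already searched to this depth
            return table[key]
        if isinstance(h, (int, float)):
            value = h
        elif depth == 0:
            value = e(h)
        else:
            player, children = h
            values = [minimax_tt(c, depth - 1, e, table) for c in children]
            value = max(values) if player == "MAX" else min(values)
        table[key] = value
        return value

    # Usage: when the same position is reached by a different move order
    # (a "transposition"), its stored value is reused instead of recomputed.
    table = {}
    tree = ("MAX", (("MIN", (3, 12, 8)), ("MIN", (2, 4, 6))))
    print(minimax_tt(tree, 2, lambda h: 0, table))   # -> 3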

326 Thinking on the opponent’s time

The current state is a. Use alpha-beta to estimate the minimax values of its children b and c, and move to the larger one, say c. Now consider your estimates for c's children f and g. Your opponent is likely to move to f, because its value is smaller. Do a minimax search below f while waiting for the opponent's move. If the opponent does move to f, then you have already done a lot of the work needed to figure out your next move.

327 Game-tree search in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Checkers was solved in April 2007: from the standard starting position, both players can guarantee a draw with perfect play. This took $10^{14}$ calculations over 18 years. Checkers has a search space of size $5 \times 10^{20}$.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used a very sophisticated evaluation function, and used undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, which are too good.

Go: until recently, human champions did not compete against computers, because the computers were too bad. But that has changed...

328 Game-tree search in Go

A game tree's size grows exponentially with both its depth and its branching factor. Go is huge:
- branching factor ≈ 200;
- game length ≈ 250 to 300 moves;
- number of paths in the game tree ≈ $10^{525}$ to $10^{620}$.
Much too big for a normal game-tree search. For comparison:
- number of atoms in the universe: ≈ $10^{80}$;
- number of particles in the universe: ≈ $10^{87}$.

329 Game-tree search in Go

In the past several years, Go programs have gotten much better. The main reason: Monte Carlo roll-outs. Basic idea: do a minimax search of a randomly selected subtree. At each node that the algorithm visits (see the sketch below):
- it randomly selects some of the children (there are heuristics for deciding how many);
- it calls itself recursively on these and ignores the others.
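A hedged sketch of that basic idea (minimax over a randomly chosen subtree), on the same toy representation as before; the fraction of children kept is an arbitrary illustrative heuristic, and modern Go programs use far more sophisticated selection (e.g., Monte Carlo tree search with bandit-style child selection).

    import random

    def random_subtree_minimax(h, keep_fraction=0.5, rng=random):
        """Minimax restricted to a randomly selected subtree of the game tree."""
        if isinstance(h, (int, float)):
            return h
        player, children = h
        # Illustrative heuristic for how many children to examine: about half, at least one.
        k = max(1, int(len(children) * keep_fraction))
        sampled = rng.sample(list(children), k)
        values = [random_subtree_minimax(c, keep_fraction, rng) for c in sampled]
        return max(values) if player == "MAX" else min(values)

    # Repeating the randomized search many times and averaging the results
    # gives an estimate of each move's value without exploring the whole tree.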

330 Forward pruning in Chess

Back in the 1970s, some similar ideas were tried in chess. The approach was called forward pruning. Main difference: select the children heuristically rather than randomly. It did not work as well as brute-force alpha-beta, so people abandoned it.

331 Summary

If a game is two-player zero-sum, then maximin and minimax are the same. If the game is also perfect-information, we only need to look at pure strategies. If the game is also sequential, deterministic, and finite, then we can do a game-tree search (minimax, alpha-beta pruning). In sufficiently complicated games, perfection is unattainable:
- We must approximate: limited search depth, static evaluation function.
In games that are even more complicated, further approximation is needed:
- Monte Carlo roll-outs.
