CSC384: Intro to Artificial Intelligence
Game Tree Search

■ Chapter 6.1, 6.2, 6.3, and 6.6 cover some of the material we cover here.
  Section 6.6 has an interesting overview of state-of-the-art game-playing
  programs.
■ Section 6.5 extends the ideas to games with uncertainty. (We won't cover
  that material, but it makes for interesting reading.)

Generalizing Search Problems

● So far, our search problems have assumed that the agent has complete
  control of the environment
■ the state does not change unless the agent (robot) changes it. All we
  need to compute is a single path to a goal state.
● This assumption is not always reasonable
■ stochastic environment (e.g., the weather, traffic accidents)
■ other agents whose interests conflict with yours
■ Problem: you might not traverse the path you are expecting.


Generalizing Search Problems

● In these cases, we need to generalize our view of search to handle
  state changes that are not in the control of the agent.
● One generalization yields game tree search
■ the agent and some other agents
■ the other agents are acting to maximize their profits
■ this might not have a positive effect on your profits.

Two-Person Zero-Sum Games

● Two-person, zero-sum games
■ chess, checkers, tic-tac-toe, backgammon, go, Doom, "find the last
  parking space"
■ your winning means that your opponent loses, and vice-versa
■ zero-sum means the sum of your payoff and your opponent's payoff is
  zero: anything you gain comes at your opponent's cost (and vice-versa).
● Key insight: how you act depends on how the other agent acts (or how
  you think they will act), and vice versa (if your opponent is a
  rational player).


More General Games

● What makes something a game?
■ there are two (or more) agents influencing state change
■ each agent has its own interests: e.g., the goal states are different,
  or we assign different values to different paths/states
■ each agent tries to alter the state so as to best benefit itself.

More General Games

● What makes games hard?
■ how you should play depends on how you think the other person will
  play; but how they play depends on how they think you will play; so how
  you should play depends on how you think they think you will play; but
  how they play should depend on how they think you think they think you
  will play; …


More General Games

● Zero-sum games are "fully competitive"
■ if one player wins, the other player loses
■ e.g., the amount of money I win (lose) at poker is the amount of money
  you lose (win)
● More general games can be "cooperative"
■ some outcomes are preferred by both of us, or at least our values
  aren't diametrically opposed
● We'll look in detail at zero-sum games
■ but first, some examples of simple zero-sum and cooperative games

Game 1: Rock, Paper, Scissors

● Scissors cut paper, paper covers rock, rock smashes scissors
● Represented as a matrix: Player I chooses a row, Player II chooses a
  column
● Payoff to each player is given in each cell (Player I / Player II):

                      Player II
                  R       P       S
  Player I   R   0/0    -1/1    1/-1
             P   1/-1    0/0   -1/1
             S  -1/1     1/-1   0/0

● 1: win, 0: tie, -1: loss
■ every cell sums to 0, so it's zero-sum
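The matrix view translates directly into facts. A minimal sketch in
Prolog (the language of the implementation later in these slides); the
predicate name rps/4 and the move atoms are assumptions of this sketch,
not from the slides:

  % rps(MoveI, MoveII, PayoffI, PayoffII): one fact per cell of the matrix
  rps(rock,     rock,      0,  0).
  rps(rock,     paper,    -1,  1).
  rps(rock,     scissors,  1, -1).
  rps(paper,    rock,      1, -1).
  rps(paper,    paper,     0,  0).
  rps(paper,    scissors, -1,  1).
  rps(scissors, rock,     -1,  1).
  rps(scissors, paper,     1, -1).
  rps(scissors, scissors,  0,  0).

For example, the query ?- rps(rock, scissors, U1, U2). gives U1 = 1,
U2 = -1: rock smashes scissors.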


Game 2: Prisoner's Dilemma

● Two prisoners in separate cells; the DA doesn't have enough evidence
  to convict them
● If one confesses and the other doesn't:
■ confessor goes free
■ other sentenced to 4 years
● If both confess (both defect):
■ both sentenced to 3 years
● If neither confesses (both cooperate):
■ both sentenced to 1 year on a minor charge
● Payoff: 4 minus sentence

            Coop    Def
  Coop      3/3     0/4
  Def       4/0     1/1

Game 3: Battlebots

● Two robots: Blue (Steven's), Red (Michael's)
■ one cup of coffee, one tea left
■ both S & M prefer coffee (value 10)
■ tea is acceptable (value 8)
● Both robots go for coffee:
■ collide and get no payoff
● Both go for tea: same
● One goes for coffee, the other for tea:
■ coffee robot gets 10
■ tea robot gets 8

            C        T
  C        0/0     10/8
  T        8/10     0/0


Two-Player Zero-Sum Games

● Key point of the previous games: what you should do depends on what
  the other guy does
● The previous games are simple "one shot" games
■ single move each
■ in game theory: strategic or normal form games
● Many games extend over multiple moves
■ e.g., chess, checkers, etc.
■ in game theory: extensive form games
● We'll focus on the extensive form
■ that's where the computational questions emerge

Two-Player, Zero-Sum Game: Definition

● Two players, A (Max) and B (Min)
● a set of positions P (states of the game)
● a starting position s ∊ P (where the game begins)
● terminal positions T ⊆ P (where the game can end)
● a set of directed edges EA between states (A's moves)
● a set of directed edges EB between states (B's moves)
● a utility or payoff function U : T → ℝ (how good each terminal state
  is for player A)
■ why don't we need a utility function for B?
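As a concrete illustration, here is one way the components of this
definition might be laid out as Prolog facts for a tiny invented
two-move game. The game and its values are made up for illustration;
the predicate names are chosen to match the depth-first implementation
that appears later in these slides:

  % a toy game: A (Max) moves first from s0; B (Min) then moves to a terminal
  start(s0).                      % starting position s

  maxMove(s0).                    % edges out of s0 belong to A (EA)
  minMove(s1).  minMove(s2).      % edges out of s1, s2 belong to B (EB)

  children(s0, [s1, s2]).         % the directed edges themselves
  children(s1, [t1, t2]).
  children(s2, [t3, t4]).

  terminal(t1). terminal(t2).     % terminal positions T
  terminal(t3). terminal(t4).

  val(t1, 3). val(t2, -2).        % U : T -> R, from A's perspective;
  val(t3, 0). val(t4, 5).         % B's payoff is just -U, so one function suffices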


Intuitions

● Players alternate moves (starting with Max)
■ the game ends when some terminal p ∊ T is reached
● A game state: a position-player pair
■ tells us what position we're in and whose move it is
● Utility function and terminals replace goals
■ Max wants to maximize the terminal payoff
■ Min wants to minimize the terminal payoff
● Think of it as:
■ Max gets U(t), Min gets -U(t) for terminal node t
■ this is why it's called zero (or constant) sum

Tic-tac-toe: States

[Figure: a chain of tic-tac-toe positions labelled Turn=Max(X),
Turn=Min(O), Turn=Max(X), ..., from the empty start board through
alternating Min(O)/Max(X) moves, down to two terminal boards with
U = +1 (X wins) and U = -1 (O wins).]
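A sketch of how these position-player pairs might be encoded in Prolog
for tic-tac-toe. The state/2 representation and the move/3 generator are
assumptions of this sketch; win/draw detection for terminal/1 and val/2
is omitted to keep it short:

  % state(Board, Turn): a position-player pair; Board is a list of 9
  % cells, each x, o, or e (empty); Turn says whose move it is
  maxMove(state(_, x)).           % Max plays X
  minMove(state(_, o)).           % Min plays O

  children(state(B, P), Cs) :-
      opponent(P, Q),
      findall(state(B2, Q), move(B, P, B2), Cs).

  opponent(x, o).
  opponent(o, x).

  % move(+Board, +Player, -Board2): put Player's mark in one empty cell
  move([e|Rest], P, [P|Rest]).
  move([C|Rest], P, [C|Rest2]) :- move(Rest, P, Rest2).

  % the start state of the figure: an empty board, Max (X) to move
  % ?- children(state([e,e,e,e,e,e,e,e,e], x), Cs).  yields 9 successors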

Tic-tac-toe: Game Tree

[Figure: the tic-tac-toe game tree. Max's moves from the root lead to
states such as a; Min's replies from a lead to states b, c, d; and so
on down to a terminal with U = +1.]

Game Tree

● The game tree looks like a search tree
■ layers reflect the alternating moves
● But Max doesn't decide where to go alone
■ after Max moves to state a, Min decides whether to move to state b,
  c, or d
● Thus Max must have a strategy
■ Max must know what to do next no matter what move Min makes (b, c,
  or d)
■ a sequence of moves will not suffice: Max may want to do something
  different in response to b, c, or d
● What is a reasonable strategy?

Strategy: Intuitions

[Figure: a small game tree. The max node s0 has min-node children s1,
s2, s3; s1 has terminal children t1, t2, t3 (utilities 7, -6, 4); s2
has t4, t5 (utilities 3, 9); s3 has t6, t7 (utilities -10, 2).]

● The terminal nodes have utilities. But we can compute a "utility" for
  the non-terminal states by assuming both players always play their
  best move.

Minimax Strategy: Intuitions

[Figure: the same tree as on the previous slide.]

● If Max goes to s1, Min goes to t2:
  U(s1) = min{U(t1), U(t2), U(t3)} = -6
● If Max goes to s2, Min goes to t4:
  U(s2) = min{U(t4), U(t5)} = 3
● If Max goes to s3, Min goes to t6:
  U(s3) = min{U(t6), U(t7)} = -10
● So Max goes to s2: U(s0) = max{U(s1), U(s2), U(s3)} = 3


Minimax Strategy

● Build the full game tree (all leaves are terminals)
■ root is the start state, edges are possible moves, etc.
■ label terminal nodes with utilities
● Back values up the tree
■ U(t) is defined for all terminals (part of the input)
■ U(n) = min {U(c) : c a child of n} if n is a min node
■ U(n) = max {U(c) : c a child of n} if n is a max node

Minimax Strategy

● The values labeling each state are the values that Max will achieve
  in that state if both he and Min play their best moves:
■ Max plays a move to change the state to the highest-valued min child.
■ Min plays a move to change the state to the lowest-valued max child.
● If Min plays poorly, Max could do better, but never worse.
■ If Max, however, knows that Min will play poorly, there might be a
  better strategy of play for Max than minimax!


Depth-first Implementation of MinMax

  utility(N,U) :- terminal(N), val(N,U).
  utility(N,U) :- maxMove(N), children(N,CList),
                  utilityList(CList,UList),
                  max(UList,U).
  utility(N,U) :- minMove(N), children(N,CList),
                  utilityList(CList,UList),
                  min(UList,U).

● Depth-first evaluation of the game tree
■ terminal(N) holds if the state (node) N is a terminal node. Similarly
  for maxMove(N) (Max player's move) and minMove(N) (Min player's move).
■ the utility of terminals is specified as part of the input (val)

Depth-first Implementation of MinMax

  utilityList([],[]).
  utilityList([N|R],[U|UList]) :-
      utility(N,U), utilityList(R,UList).

■ utilityList simply computes a list of utilities, one for each node on
  the list.
■ The way Prolog executes implies that this will compute utilities
  using a depth-first, post-order traversal of the game tree
■ post-order: visit children before visiting parents.
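To run this program on the small tree from the "Minimax Strategy:
Intuitions" slide, it suffices to supply the facts below. The slides
leave the list predicates max/2 and min/2 undefined; one way to supply
them (an assumption of this sketch) is as adapters over SWI-Prolog's
built-ins max_list/2 and min_list/2:

  maxMove(s0).
  minMove(s1). minMove(s2). minMove(s3).

  children(s0, [s1, s2, s3]).
  children(s1, [t1, t2, t3]).
  children(s2, [t4, t5]).
  children(s3, [t6, t7]).

  terminal(t1). terminal(t2). terminal(t3). terminal(t4).
  terminal(t5). terminal(t6). terminal(t7).

  val(t1, 7).  val(t2, -6).  val(t3, 4).  val(t4, 3).
  val(t5, 9).  val(t6, -10). val(t7, 2).

  max(L, M) :- max_list(L, M).   % adapters for the slides' max/2, min/2
  min(L, M) :- min_list(L, M).

  % ?- utility(s0, U).
  % U = 3, via U(s1) = -6, U(s2) = 3, U(s3) = -10, as computed earlier.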


Depth-first Implementation of MinMax

● Notice that the game tree has to have finite depth for this to work
● Advantage of the DF implementation: it is space efficient

Visualization of DF-MinMax

[Figure: a depth-first traversal of a game tree rooted at s0, with
subtrees under s1, s13, and s16, down to terminals t3 through t26.
Once s17 is evaluated, there is no need to store its subtree: s16 only
needs its value. Once s24's value is computed, we can evaluate s16.]


Pruning

● It is not necessary to examine the entire tree to make a correct
  minimax decision
● Assume depth-first generation of the tree
■ after generating values for only some of n's children, we may find
  that n is never reached in a minimax strategy!
■ no need to generate/evaluate more children of n!
● Two types of pruning (cuts):
■ pruning of max nodes (α-cuts)
■ pruning of min nodes (β-cuts)

Cutting Max Nodes (Alpha Cuts)

● At a Max node n:
■ let β be the lowest value of n's siblings examined so far, i.e.,
  siblings to the left of n that have already been searched (note that
  β is fixed when evaluating n)
■ let α be the highest value of n's children examined so far (note that
  α changes as the children of n are examined)

[Figure: a tree rooted at max node s0 with min-node children s1, s13,
s16. Under s1, only one sibling value is known, s2 = 5, so β = 5. As
s6's children T3, T4, T5 (values 8, 10, 5) are explored, α takes the
sequence of values 8, 10, 10. α is the best value found for MAX so
far; β is the best value found for MIN so far.]

Cutting Max Nodes (Alpha Cuts)

● If α becomes ≥ β, we can stop expanding the children of n
■ Min will never choose to move from n's parent to n, since it would
  choose one of n's lower-valued siblings first.

[Figure: a min node P with β = 8: its already-searched children have
values 14, 12, and 8. Its next child is the max node n, whose children
s1, s2, s3 have values 2, 4, 9, so α takes the values 2, 4, 9. Once
α = 9 ≥ β = 8, the remaining children of n can be pruned.]

Cutting Min Nodes (Beta Cuts)

● At a Min node n:
■ let β be the lowest value of n's children examined so far (changes as
  the children of n are examined)
■ let α be the highest value of n's siblings examined so far (fixed
  when evaluating n)

[Figure: a tree rooted at max node s0 with min-node children, where
α = 10 comes from an already-searched sibling, and β falls from 5 to 3
as a min node's children are examined.]


Cutting Min Nodes (Beta Cuts)

● If β becomes ≤ α, we can stop expanding the children of n
■ Max will never choose to move from n's parent to n, since it would
  choose one of n's higher-valued siblings first.

[Figure: a max node P with α = 7: its already-searched children have
values 6, 2, and 7. Its next child is the min node n, whose children
s1, s2, s3 have values 9, 8, 3, so β takes the values 9, 8, 3. Once
β = 3 ≤ α = 7, the remaining children of n can be pruned.]

Alpha-Beta Algorithm

Pseudo-code that associates a value with each node. The strategy is
extracted by moving to the best-valued child at each step (the
highest-valued one if you are player Max).

  Evaluate(startNode):
      /* assume Max moves first */
      return MaxEval(startNode, -infinity, +infinity)

  MaxEval(node, alpha, beta):
      if terminal(node), return U(node)
      for each c in childlist(node)
          val   ← MinEval(c, alpha, beta)
          alpha ← max(alpha, val)
          if alpha ≥ beta, return alpha
      return alpha

  MinEval(node, alpha, beta):
      if terminal(node), return U(node)
      for each c in childlist(node)
          val  ← MaxEval(c, alpha, beta)
          beta ← min(beta, val)
          if alpha ≥ beta, return beta
      return beta
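A direct Prolog transcription of this pseudo-code, in the style of the
earlier depth-first minimax, is sketched below. It assumes the same
terminal/1, maxMove/1, minMove/1, children/2, and val/2 facts; the
large constants stand in for ±infinity:

  % alphabeta(+Node, +Alpha, +Beta, -Val)
  alphabeta(N, _, _, V) :- terminal(N), !, val(N, V).
  alphabeta(N, A, B, V) :- maxMove(N), !, children(N, Cs), maxEval(Cs, A, B, V).
  alphabeta(N, A, B, V) :- minMove(N), children(N, Cs), minEval(Cs, A, B, V).

  % maxEval(+Children, +Alpha, +Beta, -Val): Alpha is Max's running best
  maxEval([], A, _, A).
  maxEval([C|Cs], A, B, V) :-
      alphabeta(C, A, B, CV),
      A1 is max(A, CV),
      ( A1 >= B -> V = A1              % beta cut: prune remaining children
      ; maxEval(Cs, A1, B, V)
      ).

  % minEval(+Children, +Alpha, +Beta, -Val): Beta is Min's running best
  minEval([], _, B, B).
  minEval([C|Cs], A, B, V) :-
      alphabeta(C, A, B, CV),
      B1 is min(B, CV),
      ( B1 =< A -> V = B1              % alpha cut: prune remaining children
      ; minEval(Cs, A, B1, V)
      ).

  % evaluate(+Start, -Val): assume Max moves first
  evaluate(S, V) :- alphabeta(S, -1000000, 1000000, V).

On the tree from the minimax slides, ?- evaluate(s0, V). still yields
V = 3, but t7 is never evaluated: after t6 = -10, Min's running β at
s3 is below α = 3, so s3's remaining children are cut.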


Rational Opponents

● This all assumes that your opponent is rational
■ e.g., will choose moves that minimize your score
● What if your opponent doesn't play rationally?
■ will it affect the quality of the outcome?

Rational Opponents

● Storing your strategy is a potential issue:
■ you must store "decisions" for each node you can reach by playing
  optimally
■ if your opponent has unique rational choices, this is a single branch
  through the game tree
■ if there are "ties", the opponent could choose any one of the "tied"
  moves: you must store a strategy for each such subtree
● What if your opponent doesn't play rationally? Will your stored
  strategy still work?


Practical Matters

● Depth-first expansion is almost always used for game trees because of
  the sheer size of the trees
● We must limit the depth of the search tree
■ we can't expand all the way to terminal nodes
■ we must make heuristic estimates about the values of the
  (nonterminal) states at the leaves of the tree
■ "evaluation function" is an often-used term for these estimates
■ evaluation functions are often learned
● All "real" games are too large to expand the whole tree
■ e.g., the chess branching factor is roughly 35
■ a depth-10 tree has about 2,700,000,000,000,000 nodes
■ even alpha-beta pruning won't help here!

Practical Matters

● The efficiency of alpha-beta pruning depends on the order in which
  children are evaluated
■ with good move ordering, many of each node's children are pruned; in
  the best case the number of nodes examined drops to O(b^(d/2)), i.e.,
  the effective branching factor falls from b to √b
■ so we can search twice as deep!

[Figure: the game tree rooted at s0, cut off after 3 moves; heuristic
values are used at the cut-off frontier instead of true utilities.]
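Putting the two ideas together, depth-limited minimax swaps in the
evaluation function once the depth bound is reached. A sketch, again
over the same facts as before; eval/2 is the assumed heuristic
evaluation function, and dlUtility/3 is a name invented here:

  % dlUtility(+Node, +Depth, -Val): minimax to depth D, then estimate
  dlUtility(N, _, V) :- terminal(N), !, val(N, V).  % true utility if reached
  dlUtility(N, 0, V) :- !, eval(N, V).              % depth bound: estimate
  dlUtility(N, D, V) :-
      maxMove(N), !,
      D1 is D - 1,
      children(N, Cs), dlList(Cs, D1, Vs), max_list(Vs, V).
  dlUtility(N, D, V) :-
      minMove(N),
      D1 is D - 1,
      children(N, Cs), dlList(Cs, D1, Vs), min_list(Vs, V).

  dlList([], _, []).
  dlList([C|Cs], D, [V|Vs]) :- dlUtility(C, D, V), dlList(Cs, D, Vs).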


Heuristics

● Think of a few games and suggest some heuristics for estimating the
  "goodness" of a position
■ chess? (one classic answer is sketched after these slides)
■ checkers?
■ your favorite video game?
■ "find the last parking spot"?

Some Interesting Games

● Tesauro's TD-Gammon
■ a champion backgammon player, which learned its evaluation function;
  stochastic component (dice)
● Checkers (Samuel, 1950s; Chinook, 1990s, Schaeffer)
● Chess (which you all know about)
● Bridge, Poker, etc.
● Check out Jonathan Schaeffer's Web page:
■ www.cs.ualberta.ca/~games
■ they've studied lots of games (you can play too)
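Returning to the chess question above: the classic starting point is a
material count. A toy sketch of such an evaluation function, assuming a
position is represented as a list of piece(Colour, Type) terms; both
the representation and the weights are illustrative only:

  pieceValue(pawn, 1).   pieceValue(knight, 3). pieceValue(bishop, 3).
  pieceValue(rook, 5).   pieceValue(queen, 9).

  % eval(+Position, -Val): material balance from White's (Max's) side
  eval(Pos, V) :-
      findall(W, (member(piece(white, T), Pos), pieceValue(T, W)), Ws),
      findall(W, (member(piece(black, T), Pos), pieceValue(T, W)), Bs),
      sum_list(Ws, WSum),
      sum_list(Bs, BSum),
      V is WSum - BSum.

  % ?- eval([piece(white, queen), piece(black, rook), piece(black, pawn)], V).
  % V = 3 (9 - 6): White is up a queen for a rook and a pawn.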


An Aside on Large Search Problems

● Issue: the inability to expand the tree to terminal nodes is relevant
  even in standard search
■ often we can't expect A* to reach a goal by expanding the full
  frontier
■ so we often limit our lookahead, and make moves before we actually
  know the true path to the goal
■ sometimes called online or realtime search
● In this case, we use the heuristic function not just to guide our
  search, but also to commit to the moves we actually make
■ in general, guarantees of optimality are lost, but we reduce
  computational/memory expense dramatically

Realtime Search Graphically

1. We run A* (or our favorite search algorithm) until we are forced to
   make a move or run out of memory. Note: no leaves are goals yet.
2. We use the evaluation function f(n) to decide which path looks best
   (let's say it is the red one).
3. We take the first step along the best path (red) by actually making
   that move.
4. We restart the search at the node we reach by making that move. (We
   may actually cache the results of the relevant part of the first
   search tree if it's hanging around, as it would with A*.)
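The four steps can be summarized as a simple commit loop. A very rough
sketch that omits the bounded lookahead and the caching from step 4;
goal/1, successors/2, and f/2 (the f(n) of step 2) are assumed, and
without the lookahead machinery such a greedy loop can revisit states:

  % realtime(+State, -Path): repeatedly commit to the first step of the
  % path that currently looks best under f/2, then search again
  realtime(S, [S]) :- goal(S), !.
  realtime(S, [S|Path]) :-
      successors(S, Cs),
      bestByF(Cs, Next),         % steps 2+3: pick and take the best move
      realtime(Next, Path).      % step 4: restart from the node we reach

  % bestByF(+States, -Best): the state minimizing f (lower = better)
  bestByF([C], C) :- !.
  bestByF([C|Cs], Best) :-
      bestByF(Cs, B1),
      f(C, FC), f(B1, FB),
      ( FC =< FB -> Best = C ; Best = B1 ).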

