CSC384: Intro to Artificial Intelligence
Game Tree Search
■ Chapter 6.1, 6.2, 6.3, 6.6 cover some of the material we cover here. Section 6.6 has an interesting overview of state-of-the-art game-playing programs.
■ Section 6.5 extends the ideas to games with uncertainty. (We won't cover that material, but it makes for interesting reading.)

Generalizing Search Problems
● So far, our search problems have assumed that the agent has complete control of the environment.
■ The state does not change unless the agent (robot) changes it. All we need to compute is a single path to a goal state.
● This assumption is not always reasonable:
■ stochastic environments (e.g., the weather, traffic accidents)
■ other agents whose interests conflict with yours
■ Problem: you might not traverse the path you are expecting.

Generalizing Search Problems
● In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent.
● One generalization yields game tree search:
■ the agent and some other agents
■ The other agents act to maximize their own profits; this might not have a positive effect on your profits.
● Key insight: how you act depends on how the other agent acts (or how you think they will act), and vice versa (if your opponent is a rational player).

Two-Person Zero-Sum Games
● Two-person, zero-sum games:
■ chess, checkers, tic-tac-toe, backgammon, go, Doom, "find the last parking space"
■ Your winning means that your opponent loses, and vice versa.
■ Zero-sum means the sum of your payoff and your opponent's payoff is zero: anything you gain comes at your opponent's cost (and vice versa).

More General Games
● What makes something a game?
■ There are two (or more) agents influencing state change.
■ Each agent has its own interests, e.g., the goal states are different, or we assign different values to different paths/states.
■ Each agent tries to alter the state so as to best benefit itself.

More General Games
● What makes games hard?
■ How you should play depends on how you think the other person will play; but how they play depends on how they think you will play; so how you should play depends on how you think they think you will play; but how they play should depend on how they think you think they think you will play; …
More General Games
● Zero-sum games are "fully competitive":
■ If one player wins, the other player loses.
■ e.g., the amount of money I win (lose) at poker is the amount of money you lose (win).
● More general games can be "cooperative":
■ Some outcomes are preferred by both of us, or at least our values aren't diametrically opposed.
● We'll look in detail at zero-sum games,
■ but first, some examples of simple zero-sum and cooperative games.

Game 1: Rock, Paper, Scissors
● Scissors cut paper, paper covers rock, rock smashes scissors.
● Represented as a matrix: Player I chooses a row, Player II chooses a column.
● Payoff to each player in each cell (P.I / P.II).
● 1: win, 0: tie, -1: loss
■ so it's zero-sum

  Payoff matrix (Player I rows, Player II columns):
         R       P       S
    R   0/0    -1/1    1/-1
    P   1/-1    0/0    -1/1
    S   -1/1    1/-1    0/0

Game 2: Prisoner's Dilemma
● Two prisoners are in separate cells; the DA doesn't have enough evidence to convict them.
● If one confesses and the other doesn't:
■ the confessor goes free
■ the other is sentenced to 4 years
● If both confess (both defect):
■ both are sentenced to 3 years
● If neither confesses (both cooperate):
■ both are sentenced to 1 year on a minor charge
● Payoff: 4 minus the sentence

  Payoff matrix:
           Coop    Def
    Coop   3/3     0/4
    Def    4/0     1/1

Game 3: Battlebots
● Two robots: Blue (Craig's), Red (Fahiem's)
■ one cup of coffee and one cup of tea left
■ both C & F prefer coffee (value 10)
■ tea is acceptable (value 8)
● Both robots go for the coffee:
■ they collide and get no payoff
● Both go for the tea: same
● One goes for the coffee, the other for the tea:
■ the coffee robot gets 10
■ the tea robot gets 8

  Payoff matrix:
          C       T
    C    0/0    10/8
    T    8/10    0/0

Two-Player Zero-Sum Games
● Key point of the previous games: what you should do depends on what the other player does.
● The previous games are simple "one shot" games:
■ a single move each
■ in game theory: strategic or normal form games
● Many games extend over multiple moves:
■ e.g., chess, checkers, etc.
■ in game theory: extensive form games
● We'll focus on the extensive form,
■ that's where the computational questions emerge.

Two-Player, Zero-Sum Game: Definition
● two players, A (Max) and B (Min)
● a set of positions P (the states of the game)
● a starting position s ∈ P (where the game begins)
● terminal positions T ⊆ P (where the game can end)
● a set of directed edges E_A between states (A's moves)
● a set of directed edges E_B between states (B's moves)
● a utility or payoff function U : T → ℝ (how good each terminal state is for player A)
■ Why don't we need a utility function for B?
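As an illustration (not part of the original slides), the components of this definition can be written down as Prolog relations. The predicate names and the tiny two-move game below are purely hypothetical; they just make each element of the definition concrete.

  % A minimal sketch of the definition as Prolog facts (illustrative names).
  position(p0).  position(p1).  position(p2).  position(p3).  position(p4).   % P
  startPosition(p0).                      % s in P: where the game begins
  terminalPosition(p3).                   % T, a subset of P: where the game can end
  terminalPosition(p4).
  aMove(p0, p1).  aMove(p0, p2).          % E_A: A's (Max's) moves
  bMove(p1, p3).  bMove(p2, p4).          % E_B: B's (Min's) moves
  payoff(p3,  1).                         % U : T -> R, payoff to player A
  payoff(p4, -1).                         % B's payoff is just -U, so no separate function is needed

B's payoff at every terminal is simply -U(t), which is why a single utility function suffices for a zero-sum game.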
Intuitions
● Players alternate moves (starting with Max).
■ The game ends when some terminal p ∈ T is reached.
● A game state: a position-player pair
■ tells us what position we're in and whose move it is.
● The utility function and terminals replace goals:
■ Max wants to maximize the terminal payoff.
■ Min wants to minimize the terminal payoff.
● Think of it as:
■ Max gets U(t), Min gets -U(t) for terminal node t.
■ This is why it's called zero (or constant) sum.

Tic-tac-toe: States
[Figure: example tic-tac-toe states, each labeled with whose turn it is (Turn = Max(X) or Turn = Min(O)): a start state, intermediate states, a terminal with U = +1, and another terminal with U = -1.]

Tic-tac-toe: Game Tree
[Figure: a fragment of the tic-tac-toe game tree with alternating Max and Min layers; Max moves to a state a, from which Min can move to state b, c, or d; one branch ends in a terminal with U = +1.]

Game Tree
● The game tree looks like a search tree:
■ Layers reflect the alternating moves.
● But Max doesn't decide where to go alone.
■ After Max moves to state a, Min decides whether to move to state b, c, or d.
● Thus Max must have a strategy:
■ He must know what to do next no matter what move Min makes (b, c, or d).
■ A sequence of moves will not suffice: Max may want to do something different in response to b, c, or d.
● What is a reasonable strategy?

Minimax Strategy: Intuitions
[Figure: a small game tree. The max node s0 is the root; its children s1, s2, s3 are min nodes; the terminals t1, t2, t3 (children of s1), t4, t5 (children of s2), and t6, t7 (children of s3) have utilities 7, -6, 4, 3, 9, -10, 2.]
● The terminal nodes have utilities. But we can compute a "utility" for the non-terminal states by assuming both players always play their best move.

Minimax Strategy: Intuitions
● If Max goes to s1, Min goes to t2: U(s1) = min{U(t1), U(t2), U(t3)} = -6
● If Max goes to s2, Min goes to t4: U(s2) = min{U(t4), U(t5)} = 3
● If Max goes to s3, Min goes to t6: U(s3) = min{U(t6), U(t7)} = -10
● So Max goes to s2: U(s0) = max{U(s1), U(s2), U(s3)} = 3

Minimax Strategy
● Build the full game tree (all leaves are terminals).
■ The root is the start state, edges are possible moves, etc.
■ Label terminal nodes with utilities.
● Back values up the tree:
■ U(t) is defined for all terminals (part of the input)
■ U(n) = min{ U(c) : c a child of n } if n is a Min node
■ U(n) = max{ U(c) : c a child of n } if n is a Max node

Minimax Strategy
● The values labeling each state are the values that Max will achieve in that state if both he and Min play their best moves.
■ Max plays a move to change the state to the highest-valued Min child.
■ Min plays a move to change the state to the lowest-valued Max child.
● If Min plays poorly, Max could do better, but never worse.
■ If Max knows that Min will play poorly, however, there might be a better strategy of play for Max than minimax!
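As a concrete illustration (not part of the original slides), the small tree from the intuitions slides can be written down as Prolog facts and handed to the depth-first implementation on the next slides. The predicate names maxMove/1, minMove/1, children/2 and terminal/1 are the ones that implementation expects; terminalUtility/2, which stores the leaf payoffs, is an assumed helper.

  % The example tree: Max node s0; Min nodes s1, s2, s3; terminals t1-t7.
  maxMove(s0).
  minMove(s1).  minMove(s2).  minMove(s3).

  children(s0, [s1, s2, s3]).
  children(s1, [t1, t2, t3]).
  children(s2, [t4, t5]).
  children(s3, [t6, t7]).

  terminal(t1).  terminal(t2).  terminal(t3).  terminal(t4).
  terminal(t5).  terminal(t6).  terminal(t7).

  % Stored payoffs (to Max) of the terminal positions.
  terminalUtility(t1, 7).   terminalUtility(t2, -6).   terminalUtility(t3, 4).
  terminalUtility(t4, 3).   terminalUtility(t5, 9).    terminalUtility(t6, -10).
  terminalUtility(t7, 2).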
Depth-first Implementation of MinMax

  % Compute the minimax value U of node N.
  % terminalUtility(N,U) is assumed to store the payoff of each terminal
  % node (e.g., facts like those in the sketch above), so that utility/2
  % does not call itself on terminals.
  utility(N,U) :- terminal(N), terminalUtility(N,U).
  utility(N,U) :- maxMove(N), children(N,CList),
                  utilityList(CList,UList),
                  max_list(UList,U).     % maximum of a list of numbers (SWI-Prolog library(lists))
  utility(N,U) :- minMove(N), children(N,CList),
                  utilityList(CList,UList),
                  min_list(UList,U).     % minimum of a list of numbers

● Depth-first evaluation of the game tree:
■ terminal(N) holds if the state (node) N is a terminal node.
■ The way Prolog executes implies that this will compute utilities using a depth-first, post-order traversal of the game tree.

Depth-first Implementation of MinMax

  % utilityList computes a list of utilities, one for each node on the list.
  utilityList([],[]).
  utilityList([N|R],[U|UList]) :- utility(N,U), utilityList(R,UList).

■ utilityList simply computes a list of utilities, one for each node on the list.
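With the example facts above loaded alongside these clauses, the backed-up values match the intuitions slides; a minimal SWI-Prolog sketch of the queries (under the same assumptions):

  ?- utility(s0, U).                           % U = 3: Max's best move is to s2
  ?- utility(s1, U1), utility(s2, U2), utility(s3, U3).
                                               % U1 = -6, U2 = 3, U3 = -10

Because utility/2 first evaluates all of a node's children via utilityList/2 and only then takes the max or min, the values are produced in exactly the depth-first, post-order fashion described above.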