Game Tree Search
CS 7180: Behavioral Modeling and Decision-making in AI
Game Tree Search
Prof. Amy Sliva
September 28, 2012

Outline
• Introduction to games
• State-space representation of game tree
• Adversarial search
• Minimax search
• Alpha-beta pruning

Decisions in multiagent environments
• Automated planning from last time assumes a single agent
• Decisions depend only on the state space, the possible actions, and the desired goal
• Multiagent domains: strategic decisions are necessary
• Consider the actions of other agents when making a decision
• Competitive multiagent environments are often called games
• Goals of the agents are in conflict
• Often some cooperation is involved, but we will get to this later…
• Formulate decision making as a state-space search problem: adversarial search

What do we mean by "games"?

                          Deterministic            Chance
  Perfect information     Chess, Checkers,         Backgammon,
                          Go, Othello              Monopoly
  Imperfect information   Battleship,              Bridge, Poker,
                          Kriegspiel               Scrabble, Nuclear war

• Not typically considered: physical games like tennis, croquet, ice hockey, etc.
• Well, sometimes… Robot Soccer!
• RoboCup http://www.robocup.org/
• The goal: "By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rule of the FIFA, against the winner of the most recent World Cup."
• We'll stick with games like chess and checkers today…

Common subset of games for AI

                          Deterministic            Chance
  Perfect information     Chess, Checkers,         Backgammon,
                          Go, Othello              Monopoly
  Imperfect information   Battleship,              Bridge, Poker,
                          Kriegspiel               Scrabble, Nuclear war

• Deterministic, turn-taking, two-player, zero-sum, perfect-information

Deterministic, turn-taking, zero-sum games
• Deterministic
• No chance (e.g., using dice) involved
• Turn taking
• Two agents whose actions alternate
• Zero-sum
• Utility values for each agent are equal and opposite of the other (i.e., if one agent wins, the other necessarily loses)
• Creates the adversarial situation
• Perfect information
• Fully observable environments
• No information is hidden from either player
• Generalizes to stochastic games, multiple players, non-zero-sum games, etc.

Why study these games in AI?
• Fun!
• Clear criteria for success
• Interesting hard problems
• Very large search spaces (e.g., the chess game tree has on average 10^134 nodes!)
• Problems involving hostile, adversarial, or competing agents
• Insight into other real-world applications with strategic decisions
• Like the real world, games require that a decision be made, even if calculating the optimal decision is impossible
• Different from the games studied in game theory

How to play a game
1. Consider all legal moves: each move leads to a new state
2. Evaluate the possible next states and determine the best
3. Make your move!
4. Wait for your opponent to move and do it all again
• Represent the game as a tree, and this becomes a search problem

Game trees
• The root node is the current state
• We are looking for the best single move to make next
• An evaluation function f(s) rates each possible state
• Edges are the legal moves for a player
• Terminal nodes represent end-game states (i.e., win, lose, or draw)
• How do we search this tree to find the optimal move?

Planning vs. adversarial search
• Planning: no adversary
• Solution is a (heuristic) method for finding the goal (i.e., a plan)
• Heuristic techniques can find an optimal solution
• Evaluation function: an estimate of the cost from start to goal through a given node
• Games: multiagent and adversarial
• Solution is a strategy
• A strategy specifies our move for every possible opponent reply
• Time limits force an approximate solution
• Evaluation function: evaluates the "goodness" or utility of a game position

Formulating games as search
• Two players: MAX and MIN; MAX moves first
• Take turns until the game is over
• Winner gets a reward, loser gets a penalty
• Zero sum: the sum of the reward and the penalty is a constant
• Formal definition
• Initial state: set-up specified by the rules, e.g., the initial board configuration of chess
• Player(s): defines which player has the move in state s
• Actions(s): returns the set of legal moves in state s
• Result(s,a): transition model; defines the result of taking move a in state s
• Terminal-Test(s): is the game finished? True if finished, false otherwise
• Utility(s,p): utility function; gives the numerical value of terminal state s for player p
• E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe
• E.g., win (+1), lose (0), and draw (1/2) in chess
• MAX uses the search tree to determine its next move, assuming MIN plays optimally

Game trees for MAX and MIN
• The current player searches from the perspective of MAX
• If it is our turn to move, then the root is a MAX node
• Each level of the tree has nodes that are all MAX or all MIN
• MAX and MIN levels alternate
• Complete game tree
• All configurations generated by legal moves from the root to the end of the game
• Incomplete game tree
• All configurations generated by legal moves from the root down to a given depth (look ahead some number of steps)

Evaluating a game state
• Evaluation function f(n) evaluates the "goodness" of a game position (i.e., the utility to each player)
• Contrast with heuristic search, where the evaluation function estimates cost/distance to the goal
• The zero-sum assumption allows us to use one evaluation function for both players
• f(n) > 0: position n is good for MAX and bad for MIN
• f(n) < 0: position n is bad for MAX and good for MIN
• f(n) near 0: position n is a neutral position
• f(n) >> 0: win for MAX!
• f(n) << 0: win for MIN!
• Goal of game tree search: determine one move for MAX that maximizes the payoff for a given game tree according to f(n)
• Regardless of the moves MIN takes

Evaluation function for Tic-Tac-Toe
f(n) = +1 if the position is a win for X.
f(n) = -1 if the position is a win for O.
f(n) = 0 if the position is a draw.
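The tic-tac-toe evaluation function above can be sketched in Python. This is a minimal illustration, not from the slides: the board encoding as a 3x3 list of lists holding "X", "O", or None, and the helper names LINES and winner, are assumptions made for the example.

```python
# Sketch of the tic-tac-toe evaluation function f(n) from the slides.
# Assumption: a board is a 3x3 list of lists holding "X", "O", or None.

LINES = (
    [[(r, c) for c in range(3)] for r in range(3)] +              # rows
    [[(r, c) for r in range(3)] for c in range(3)] +              # columns
    [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
)

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for line in LINES:
        values = [board[r][c] for r, c in line]
        if values[0] is not None and values.count(values[0]) == 3:
            return values[0]
    return None

def f(board):
    """f(n) = +1 if a win for X, -1 if a win for O, 0 for a draw."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    return 0
```

For example, a board whose top row is all "X" evaluates to +1, and a completely filled board with no three-in-a-row evaluates to 0.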
Minimax evaluation rule
• Minimax value of a node: Minimax(n) = utility for MAX of being in the state at n, assuming optimal play from MIN
• MAX wants a state with the maximum minimax value, and MIN a state with the minimum
• Assumes the worst-case scenario: MIN is also trying to get its highest utility/payoff (i.e., is trying to minimize the payoff to MAX)

Minimax(s) =
  Utility(s)                                  if Terminal-Test(s)
  max_{a ∈ Actions(s)} Minimax(Result(s,a))   if Player(s) = MAX
  min_{a ∈ Actions(s)} Minimax(Result(s,a))   if Player(s) = MIN

• Minimax(s) at a terminal state is its utility according to the rules of the game
• Minimax(s) at a MAX node is the maximum of the child values
• Minimax(s) at a MIN node is the minimum of the child values

Minimax game tree
[Figure: a two-ply game tree. MAX root A, with minimax value 3, has moves a1, a2, a3 to MIN nodes B, C, D, whose minimax values are 3, 2, 2. B's moves b1, b2, b3 lead to terminal utilities 3, 12, 8; C's moves c1, c2, c3 lead to 2, 4, 6; D's moves d1, d2, d3 lead to 14, 5, 2.]
• Terminal nodes show the utility values for MAX
• Other nodes are labeled with their minimax values
• MAX's best move is a1, which leads to the state with the highest minimax value
• MIN's best reply is b1, which leads to the state with the lowest minimax value

Minimax algorithm for optimal play
Find the optimal strategy for MAX:
1. Generate the whole game tree, down to the leaves
2. Apply the utility (payoff) function to each leaf
3. Recursively back up values from the leaves:
a) a MAX node computes the max of its child values
b) a MIN node computes the min of its child values
4. At the root, choose the move leading to the child with the highest minimax value

Minimax algorithm in action
[Figure: the same game tree, with the backed-up values filled in step by step: B = min(3, 12, 8) = 3; C = min(2, 4, 6) = 2; D = min(14, 5, 2) = 2; finally A = max(3, 2, 2) = 3.]
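The recursive backup in steps 1-4 can be sketched in Python. This is a minimal illustration: the nested-list encoding of the example tree (a tree is either a terminal utility or a list of subtrees) is an assumption made for the example, not a representation from the slides.

```python
# Minimax backup sketch for the example game tree.
# Assumption: a tree is either a number (terminal utility for MAX)
# or a list of subtrees, alternating MAX and MIN levels.

def minimax(node, is_max):
    """Return the minimax value of `node`; `is_max` is True at MAX levels."""
    if isinstance(node, (int, float)):       # terminal: Utility(s)
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# The tree from the slides: MAX root A with MIN children B, C, D.
tree = [[3, 12, 8],    # B: moves b1, b2, b3
        [2, 4, 6],     # C: moves c1, c2, c3
        [14, 5, 2]]    # D: moves d1, d2, d3

print(minimax(tree, True))  # backed-up value at root A: 3
```

At the root, MAX would pick the move leading to B (backed-up value 3), matching the best move a1 in the slides.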