AAAI'08 Tutorial

General Game Playing

Michael Thielscher, Dresden

Some of the material presented in this tutorial originates in work by Michael Genesereth and the Stanford Logic Group. We greatly appreciate their contribution.

The Turk (18th century)
Alan Turing & Claude Shannon (~1950)
Deep-Blue beats the World Champion (1997)

In the early days, game-playing machines were considered a key to Artificial Intelligence (AI).

But chess computers are highly specialized systems. Deep-Blue's intelligence was limited. It couldn't even play a decent game of Tic-Tac-Toe or Rock-Paper-Scissors.

General Game Playing revives many of the original expectations associated with game-playing machines.

Traditional research on game playing focuses on
- constructing specific evaluation functions
- building libraries for specific games
The intelligence lies with the programmer, not with the program!

A General Game Player is a system that
- understands formal descriptions of arbitrary strategy games
- learns to play these games well without human intervention

A General Game Player needs to exhibit much broader intelligence:
- abstract thinking
- strategic planning
- learning

General Game Playing and AI

Rather than being concerned with a specialized solution to a narrow problem, General Game Playing encompasses a variety of AI areas:

Game playing vs. general agents:
- Knowledge Representation: deterministic, complete-information games vs. competitive environments
- Planning and Search: nondeterministic, partially observable games vs. uncertain environments
- Learning: partially unknown rules vs. unknown environment models
- Robotic players vs. real-world environments

General Game Playing is considered a grand AI Challenge

Application (1)

Commercially available chess computers can't be used for a game of Bughouse Chess. An adaptable game computer would allow the user to modify the rules for arbitrary variants of a game.

Application (2): Economics

A General Game Playing system could be used for negotiations, marketing strategies, pricing, etc. It can easily be adapted to changes in the business processes and rules, to new competitors, etc. The rules of an e-marketplace can be formalized as a game, so that agents can automatically learn how to participate.

Example Games

- Single-Player, Deterministic
- Two-Player, Zero-Sum, Deterministic
- Two-Player, Zero-Sum, Nondeterministic
- n-Player, Deterministic
- n-Player, Incomplete Information, Nondeterministic

General Game Playing Initiative

games.stanford.edu
- The Game Description Language (GDL)
- Variety of games / actual matches
- Basic player available for download
- Annual world cup @AAAI (since 2005), prize money US$ 10,000

Roadmap
- Game description language GDL: Knowledge Representation
- How to make legal moves: Automated Reasoning
- How to solve simple games: Planning & Search
- How to play well: Learning
(deterministic games with complete information only)

Every finite game can be modeled as a state transition system
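Such a state transition system can be sketched directly in code. The following is a minimal Python illustration; the field names and the toy one-player game are invented for this sketch and are not part of GDL:

```python
# A minimal sketch of a finite game as a state transition system.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

@dataclass
class Game:
    roles: List[str]                               # players
    initial: str                                   # initial state
    terminal: FrozenSet[str]                       # terminal states
    legal: Callable[[str, str], List[str]]         # role, state -> moves
    update: Callable[[str, Tuple[str, ...]], str]  # state, joint move -> state
    goal: Callable[[str, str], int]                # role, terminal state -> value

# A toy one-player game: walk from state "a" to "c".
toy = Game(
    roles=["you"],
    initial="a",
    terminal=frozenset({"c"}),
    legal=lambda role, s: ["step"] if s != "c" else [],
    update=lambda s, jm: {"a": "b", "b": "c"}[s],
    goal=lambda role, s: 100,
)

# Play the toy game to its terminal state.
state = toy.initial
while state not in toy.terminal:
    move = toy.legal("you", state)[0]
    state = toy.update(state, (move,))
print(state, toy.goal("you", state))  # -> c 100
```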

Game Description Language

But a direct encoding is impossible in practice:
- Tic-Tac-Toe: 19,683 states
- Chess: ~10^43 legal positions

Modular State Representation: Fluents

Fluent Representation for Chess
- cell(X,Y,C), X ∈ {a,...,h}, Y ∈ {1,...,8}, C ∈ {whiteKing,...,blank}
- canCastle(P,S), P ∈ {white,black}, S ∈ {kingsSide,queensSide}
- enPassant(C), C ∈ {a,...,h}
- control(P), P ∈ {white,black}

Actions

Actions for Chess:
- move(U,V,X,Y), U,X ∈ {a,...,h}, V,Y ∈ {1,...,8}
- promote(X,Y,P), X,Y ∈ {a,...,h}, P ∈ {whiteQueen,...}

Game Rules (I)
- Players: roles([white,black])
- Initial position: init(cell(a,1,whiteRook)), ...
- Legal moves: legal(white,promote(X,Y,P)) <= true(cell(X,7,whitePawn)) ∧ ...

Game Rules (II)
- Position updates: next(cell(X,Y,C)) <= does(P,move(U,V,X,Y)) ∧ true(cell(U,V,C))
- End of game: terminal <= checkmate ∨ stalemate
- Result: goal(white,100) <= true(control(black)) ∧ checkmate
  goal(white,50) <= stalemate

Clausal Logic
- Variables: X, Y, Z; Constants: a, b, c; Functions: f, g, h; Predicates: p, q, r, =
- Logical operators: ¬, ∧, ∨, <=
- Terms: X, Y, Z, a, b, c, f(a), g(a,X), h(a,b,f(Y))
- Atoms: p(a,b); Literals: p(a,b), ¬q(X,f(a))
- Clauses: Head <= Body, where the Head is a relational sentence and the Body is a logical sentence built from ∧, ∨, ¬, and literals

Game-Independent Vocabulary

Relations:
- roles(list-of(player))
- init(fluent)
- true(fluent)
- does(player,move)
- next(fluent)
- legal(player,move)
- goal(player,value)
- terminal

Axiomatizing Tic-Tac-Toe: Fluents
- cell(X,Y,M), X,Y ∈ {1,2,3}, M ∈ {x,o,b}
- control(P), P ∈ {xplayer,oplayer}

Axiomatizing Tic-Tac-Toe: Actions
- mark(X,Y), X,Y ∈ {1,2,3}
- noop

Tic-Tac-Toe: Vocabulary
- Constants: xplayer, oplayer (players); x, o, b (marks)
- Functions: cell(number,number,mark) (fluent), control(player) (fluent), mark(number,number) (action)
- Predicates: row(number,mark), column(number,mark), diagonal(mark), line(mark), open

Players and Initial Position

roles([xplayer,oplayer])
init(cell(1,1,b))  init(cell(1,2,b))  init(cell(1,3,b))
init(cell(2,1,b))  init(cell(2,2,b))  init(cell(2,3,b))
init(cell(3,1,b))  init(cell(3,2,b))  init(cell(3,3,b))
init(control(xplayer))

Preconditions

legal(P,mark(X,Y)) <= true(cell(X,Y,b)) ∧ true(control(P))
legal(xplayer,noop) <= true(cell(X,Y,b)) ∧ true(control(oplayer))
legal(oplayer,noop) <= true(cell(X,Y,b)) ∧ true(control(xplayer))

Update

next(cell(M,N,x)) <= does(xplayer,mark(M,N))
next(cell(M,N,o)) <= does(oplayer,mark(M,N))
next(cell(M,N,W)) <= true(cell(M,N,W)) ∧ ¬W=b
next(cell(M,N,b)) <= true(cell(M,N,b)) ∧ does(P,mark(J,K)) ∧ (¬M=J ∨ ¬N=K)
next(control(xplayer)) <= true(control(oplayer))
next(control(oplayer)) <= true(control(xplayer))

Termination

terminal <= line(x) ∨ line(o)
terminal <= ¬open
line(W) <= row(M,W)
line(W) <= column(N,W)
line(W) <= diagonal(W)
open <= true(cell(M,N,b))
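These clauses translate almost mechanically into code. Below is a Python sketch in which a state is the set of its true fluents, as in GDL; the function names are invented, and only the clauses shown above (plus the line/terminal predicates) are implemented:

```python
def init_state():
    cells = {("cell", x, y, "b") for x in (1, 2, 3) for y in (1, 2, 3)}
    return frozenset(cells | {("control", "xplayer")})

def legal(state, role):
    # legal(P,mark(X,Y)) <= true(cell(X,Y,b)) and true(control(P)); else noop
    if ("control", role) in state:
        return [("mark", f[1], f[2]) for f in state
                if f[0] == "cell" and f[3] == "b"]
    return [("noop",)]

def next_state(state, moves):
    # moves: dict role -> move; implements the next(...) clauses above
    mover = "xplayer" if ("control", "xplayer") in state else "oplayer"
    symbol = "x" if mover == "xplayer" else "o"
    _, j, k = moves[mover]
    new = set()
    for f in state:
        if f[0] == "cell":
            m, n = f[1], f[2]
            new.add(("cell", m, n, symbol) if (m, n) == (j, k) else f)
    new.add(("control", "oplayer" if mover == "xplayer" else "xplayer"))
    return frozenset(new)

def line(state, w):
    return (any(all(("cell", m, n, w) in state for n in (1, 2, 3)) for m in (1, 2, 3))
            or any(all(("cell", m, n, w) in state for m in (1, 2, 3)) for n in (1, 2, 3))
            or all(("cell", i, i, w) in state for i in (1, 2, 3))
            or all(("cell", i, 4 - i, w) in state for i in (1, 2, 3)))

def terminal(state):
    open_ = any(f[0] == "cell" and f[3] == "b" for f in state)
    return line(state, "x") or line(state, "o") or not open_

s = init_state()
s = next_state(s, {"xplayer": ("mark", 2, 2), "oplayer": ("noop",)})
assert ("cell", 2, 2, "x") in s and ("control", "oplayer") in s
```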

Supporting Concepts

row(M,W) <= true(cell(M,1,W)) ∧ true(cell(M,2,W)) ∧ true(cell(M,3,W))
column(N,W) <= true(cell(1,N,W)) ∧ true(cell(2,N,W)) ∧ true(cell(3,N,W))
diagonal(W) <= true(cell(1,1,W)) ∧ true(cell(2,2,W)) ∧ true(cell(3,3,W))
diagonal(W) <= true(cell(1,3,W)) ∧ true(cell(2,2,W)) ∧ true(cell(3,1,W))

Goals

goal(xplayer,100) <= line(x)
goal(xplayer,50) <= ¬line(x) ∧ ¬line(o) ∧ ¬open
goal(xplayer,0) <= line(o)
goal(oplayer,100) <= line(o)
goal(oplayer,50) <= ¬line(x) ∧ ¬line(o) ∧ ¬open
goal(oplayer,0) <= line(x)

Finite Games

Games as State Machines

Finite Environment:
- Game "world" with finitely many states
- One initial state and one or more terminal states
- Fixed finite number of players, each with finitely many "percepts" and "actions", and each with one or more goal states

Causal Model:
- Environment changes only in response to moves
- Synchronous actions

Initial State and Terminal States

Simultaneous Actions

(State-transition diagrams over states a-k, with single-action labels and simultaneous joint-action labels such as a/a, a/b, b/a, b/b.)

Game Model

An n-player game is a structure with the following components:
- S – a set of states
- A1, ..., An – n sets of actions, one for each player
- l1, ..., ln – where li ⊆ Ai × S, the legality relations
- u: S × A1 × ... × An → S – the update function
- s1 ∈ S – the initial game state
- t ⊆ S – the terminal states
- g1, ..., gn – where gi ⊆ S × ℕ, the goal relations

GDL for Trading Games: Example (English Auction)

role(bidder_1) ... role(bidder_n)
init(highestBid(0))
init(round(0))
legal(P,bid(X)) <= true(highestBid(Y)) ∧ greaterthan(X,Y)
legal(P,noop)
next(winner(P)) <= does(P,bid(X)) ∧ bestbid(X)
next(highestBid(X)) <= does(P,bid(X)) ∧ bestbid(X)
next(winner(P)) <= true(winner(P)) ∧ ¬bid
next(highestBid(X)) <= true(highestBid(X)) ∧ ¬bid
next(round(N)) <= true(round(M)) ∧ successor(M,N)
bid <= does(P,bid(X))
bestbid(X) <= does(P,bid(X)) ∧ ¬overbid(X)
overbid(X) <= does(P,bid(Y)) ∧ greaterthan(Y,X)
terminal <= true(round(10))

Try it Yourself: Play this Game!

role(you)
init(step(1))
init(cell(1,onecoin))
init(cell(Y,onecoin)) <= succ(X,Y)
succ(1,2)  succ(2,3)  ...  succ(7,8)

legal(you,jump(X,Y)) <= true(cell(X,onecoin)) ∧ true(cell(Y,onecoin)) ∧
                        (twobetween(X,Y) ∨ twobetween(Y,X))

next(step(Y)) <= true(step(X)) ∧ succ(X,Y)
next(cell(X,zerocoins)) <= does(you,jump(X,Y))
next(cell(Y,twocoins)) <= does(you,jump(X,Y))
next(cell(X,C)) <= does(you,jump(Y,Z)) ∧ true(cell(X,C)) ∧
                   distinct(X,Y) ∧ distinct(X,Z)

terminal <= ¬continuable
continuable <= legal(you,M)
goal(you,100) <= true(step(5))
goal(you,0) <= true(cell(X,onecoin))

zerobetween(X,Y) <= succ(X,Y)
zerobetween(X,Y) <= succ(X,Z) ∧ true(cell(Z,zerocoins)) ∧ zerobetween(Z,Y)
onebetween(X,Y) <= succ(X,Z) ∧ true(cell(Z,zerocoins)) ∧ onebetween(Z,Y)
onebetween(X,Y) <= succ(X,Z) ∧ true(cell(Z,onecoin)) ∧ zerobetween(Z,Y)
twobetween(X,Y) <= succ(X,Z) ∧ true(cell(Z,zerocoins)) ∧ twobetween(Z,Y)
twobetween(X,Y) <= succ(X,Z) ∧ true(cell(Z,onecoin)) ∧ onebetween(Z,Y)
twobetween(X,Y) <= succ(X,Z) ∧ true(cell(Z,twocoins)) ∧ zerobetween(Z,Y)

Automated Reasoning

Background: Reasoning about Actions

Game descriptions are a good example of knowledge representation with formal logic. Automated reasoning about actions is necessary to
- determine legal moves
- update positions
- recognize the end of the game

McCarthy's Situation Calculus (1963): the initial situation s0 and its successors do(A1,s0), ..., do(An,s0), ..., do(Aj,do(Ai,s0)), ...

Reasoning about Actions using Situations

The Frame Problem

Effect axioms:
(∀S)(∀M,N) cell(M,N,x,do(xplayer,mark(M,N),S))

The Frame Problem (McCarthy & Hayes, 1969) arises because mere effect axioms do not suffice to infer non-effects. How does cell(2,2,o,s) imply cell(2,2,o,do(xplayer,mark(3,3),s))?

A frame axiom for Tic-Tac-Toe:
(∀S)(∀...) cell(M,N,W,do(P,mark(J,K),S)) <= cell(M,N,W,S) ∧ (¬M=J ∨ ¬N=K)

Compare this to the GDL axioms:
next(cell(M,N,W)) <= true(cell(M,N,W)) ∧ ¬W=b
next(cell(M,N,b)) <= true(cell(M,N,b)) ∧ does(P,mark(J,K)) ∧ (¬M=J ∨ ¬N=K)

In a domain with m actions and n fluents, on the order of n·m frame axioms are needed.

Successor State Axioms

"If AI can be said to have a classic problem, then the Frame Problem is it. Like all good open problems it is subtle, challenging, and it has led to significant new technical and conceptual developments in the field." (Reiter, 1991)

A successor state axiom (Reiter, 1991) for every fluent Φ avoids extra frame axioms:

(∀P,A,S) Φ(do(P,A,S)) <=> Φ+ ∨ (Φ(S) ∧ ¬Φ-)

where Φ+ collects the reasons for Φ to become true and Φ- the reasons for Φ to become false.

Successor State Axioms for Tic-Tac-Toe:

(∀P,A,S)(∀...) cell(M,N,W,do(P,A,S)) <=>
      W=x ∧ P=xplayer ∧ A=mark(M,N)
    ∨ W=o ∧ P=oplayer ∧ A=mark(M,N)
    ∨ cell(M,N,W,S) ∧ ¬A=mark(M,N)

(∀P,A,S)(∀R) control(R,do(P,A,S)) <=>
      R=xplayer ∧ control(oplayer,S)
    ∨ R=oplayer ∧ control(xplayer,S)

The Computational Frame Problem

Fluent Calculus

Even with one successor state axiom per fluent, updating a state with fluents F1, ..., F10 requires evaluating an axiom for every fluent in every situation S0, S1, S2, S3, ...

A state update axiom (Thielscher, 1999) for every action avoids separate update axioms for every fluent:

  Δ1(S) → state(do(P,α,S)) = state(S) - θ1- + θ1+
∧ ...
∧ Δk(S) → state(do(P,α,S)) = state(S) - θk- + θk+

where θ+ are the fluents that become true and θ- the fluents that become false (the subtraction z - θ- and addition z + θ+ are axiomatically defined).

Game Tree Search (General Concept)

Planning and Search

Breadth-First Search vs. Depth-First Search

Example tree: root a with children b, c, d; b has children e, f; c has g, h; d has i, j.
- Breadth-first visit order: a b c d e f g h i j
- Depth-first visit order: a b e f c g h d i j

Breadth-First:
- Advantage: finds the shortest solution
- Disadvantage: consumes a large amount of space

Depth-First:
- Advantage: small intermediate storage
- Disadvantage: susceptible to garden paths
- Disadvantage: susceptible to infinite loops

Time and Space Comparison

Iterative Deepening

Worst case for search depth d, solution at depth k.

Time (binary / branching factor b):
- Depth-First: 2^d − 2^(d−k)   /   (b^d − b^(d−k)) / (b − 1)
- Breadth-First: 2^k − 1   /   (b^k − 1) / (b − 1)

Iterative deepening: run depth-limited search repeatedly,
- starting with a small initial depth d,
- incrementing the depth on each iteration (d := d + 1),
- until success or until running out of alternatives.
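The procedure can be sketched in a few lines of Python over the example tree from this section (a with children b, c, d, etc.); the function names are invented:

```python
TREE = {"a": ["b", "c", "d"], "b": ["e", "f"], "c": ["g", "h"], "d": ["i", "j"]}

def depth_limited(node, goal, limit, visited):
    visited.append(node)                       # record the visit order
    if node == goal:
        return True
    if limit == 0:
        return False
    return any(depth_limited(child, goal, limit - 1, visited)
               for child in TREE.get(node, []))

def iterative_deepening(start, goal, max_depth=10):
    for limit in range(max_depth + 1):         # d := 0, 1, 2, ...
        visited = []
        if depth_limited(start, goal, limit, visited):
            return visited                     # nodes visited in the last iteration
    return None

print(iterative_deepening("a", "h"))  # -> ['a', 'b', 'e', 'f', 'c', 'g', 'h']
```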

Space (binary / branching factor b):
- Depth-First: d   /   (b − 1)(d − 1) + 1
- Breadth-First: 2^(k−1)   /   b^(k−1)

Example Time Comparison

Worst case for branching factor 2:

Depth | Iterative Deepening | Depth-First
1     | 1                   | 1
2     | 4                   | 3
3     | 11                  | 7
4     | 26                  | 15
5     | 57                  | 31
n     | 2^(n+1) − n − 2     | 2^n − 1

Iterative deepening on the example tree:
d = 1: a
d = 2: a b c d
d = 3: a b e f c g h d i j

Iterative deepening combines the advantages:
- small intermediate storage
- finds the shortest solution
- not susceptible to garden paths
- not susceptible to infinite loops

Theorem: The cost of iterative deepening search is b/(b−1) times the cost of depth-first search (where b is the branching factor).

Game Rules

legal(P,mark(X,Y)) <= true(cell(X,Y,b)) ∧ true(control(P))
next(cell(M,N,x)) <= does(xplayer,mark(M,N))
next(cell(M,N,W)) <= true(cell(M,N,W)) ∧ ¬W=b
terminal <= line(x) ∨ line(o)
goal(xplayer,100) <= line(x)

Basic Subroutines for Search

function legals(role, node)
    findall(X, legal(role,X), node.position ∪ gamerules)
function simulate(node, moves)
    findall(true(P), next(P), node.position ∪ moves ∪ gamerules)
function terminal(node)
    prove(terminal, node.position ∪ gamerules)
function goal(role, node)
    findone(X, goal(role,X), node.position ∪ gamerules)

A General Architecture

Game Description → Compiled Theory → Reasoner, which provides the Move List, State Update, and Termination & Goal to the Search.

Node Expansion (Single-Player Games)

function expand(node)
begin
    al := [];
    for a in legals(role,node) do
        data := simulate(node, {does(role,a)});
        new := create_node(data);
        al := {(a,new)} ∪ al
    end-for;
    return al
end

Best Move (Single-Player Games)

State-Space Search with Multiple Players

function bestmove(node)
begin
    max := 0;
    best := head(node.actionlist);
    for a in node.actionlist do
        score := maxscore(a.new.alist);
        if score = 100 then return a;
        if score > max then max := score; best := a end-if
    end-for;
    return best
end

function maxscore(alist)
    % returns the best score among the alist actions

(Diagram: state machine over states s1-s4 and e-k with simultaneous-move labels omitted.)

Single-Player Game Graph

Multiple-Player Game Graph

(Diagrams: a single-player game graph over states s1-s4 with moves a, b; the corresponding multiple-player game graph with joint moves aa, ab, ba, bb; and its bipartite version.)

Bipartite Game Graph

Move Lists

Simple move list:
[(a,s2),(b,s3)]

Multiple-player move list:
[([a,a],s2),([a,b],s1),([b,a],s3),([b,b],s4)]

Bipartite move list:
[(a,[([a,a],s2),([a,b],s1)]), (b,[([b,a],s3),([b,b],s4)])]

Multiple-Player Node Expansion

function expand(node)
begin
    al := [];
    for a in legals(role,node) do
        jl := [];
        for j in joints(role,a,node) do
            data := simulate(node, jointactions(j));
            new := create_node(data);
            jl := {(j,new)} ∪ jl
        end-for;
        al := {(a,jl)} ∪ al
    end-for;
    return al
end

function joints(role,action)
    % returns the combinatorial list of all legal joint actions where role does action
function jointactions(j)
    % returns the set of does atoms for joint action j

Best Move

function bestmove(node)
begin
    max := 0;
    (best,jl) := head(node.alist);
    for (a,jl) in node.alist do
        score := minscore(jl);
        if score = 100 then return a;
        if score > max then max := score; best := a end-if
    end-for;
    return best
end

Note: This makes the paranoid assumption that the other players make the joint move that is most harmful for us.

Minimax for Two-Person Zero-Sum Games

The α-β-Principle: β-Cutoffs

(Minimax example: a max node over three min nodes with leaf values 75 40 50 / 80 40 60 / 35 20 10; the min values are 40, 40, 10, so the max value is 40. With β-cutoffs, the remaining leaves of a min node are pruned as soon as its value can no longer exceed the best max value found so far.)

The α-β-Principle: α- and β-Cutoffs

State Collapse

(α-β search example with the α- and β-values at the inner nodes omitted.)

- The game tree for Tic-Tac-Toe has approximately 700,000 nodes, but there are only approximately 5,000 distinct states.
- Searching the tree requires 140 times more work than searching the graph.
- Recognizing a repeated state takes time that varies with the size of the graph seen thus far.
- Solution: transposition tables
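As an illustration, here is a small Python sketch of minimax with α-β cutoffs and a transposition table, run on the example tree above (leaf values 75 40 50 / 80 40 60 / 35 20 10). Caching is simplified for the sketch: a real player must distinguish exact values from bounds when combining cutoffs with stored values.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, table=None):
    """node is either a leaf value (int) or a tuple of child nodes."""
    if table is None:
        table = {}
    if isinstance(node, int):
        return node
    if (node, maximizing) in table:      # transposition table: collapse repeats
        return table[(node, maximizing)]
    value = -math.inf if maximizing else math.inf
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta, table)
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if beta <= alpha:                # alpha-/beta-cutoff
            break
    table[(node, maximizing)] = value
    return value

tree = ((75, 40, 50), (80, 40, 60), (35, 20, 10))
print(alphabeta(tree, True))  # -> 40
```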

Symmetry

Reflectional Symmetry

Symmetries can be logically derived from the rules of a game.

A symmetry relation over the elements of a domain is an equivalence relation such that two symmetric states are
- either both terminal or both non-terminal
- if they are terminal, they have the same goal value
- if they are non-terminal, the legal moves in each of them are symmetric and yield symmetric states

Connect-3

Rotational Symmetry

Factoring: Example

Hodgepodge = Chess + Othello
- Branching factor of Chess: a; branching factor of Othello: b
- Branching factor of Hodgepodge as given to the players: a · b
- Fringe of the tree at depth n as given: (a · b)^n
- Fringe of the tree at depth n, factored: a^n + b^n

Other factorable examples: Capture Go, Double Tic-Tac-Toe.

Game Factoring and its Use

A set Φ of fluents and moves is a behavioral factor if and only if there are no connections between the fluents and moves in Φ and those outside of Φ.

1. Compute factors (behavioral factoring, goal factoring)
2. Play the factors
3. Reassemble the solution (append plans, interleave plans, or parallelize plans with simultaneous actions)

Double Tic-Tac-Toe:
- Branching factor: 81, 64, 49, 36, 25, 16, 9, 4, 1
- Branching factor (factored): 9, 8, 7, 6, 5, 4, 3, 2, 1 (times 2)

Competition vs. Cooperation

Mathematical Game Theory: Strategies

The "paranoid" assumption says that opponents choose the joint move that is most harmful for us. This is usually too pessimistic for games other than zero-sum games and for games with n > 2 players: a rational opponent chooses the move that is best for him rather than the one that is worst for us. Moreover, from a game-theoretic point of view it is incorrect to model simultaneous moves as a sequence of our move followed by the joint moves of our opponents. (Example: Rock-Paper-Scissors)

Recall the game model: S (states), A1, ..., An (one set of actions per player), l1, ..., ln with li ⊆ Ai × S (legality relations), g1, ..., gn with gi ⊆ S × ℕ (goal relations).

A strategy xi for player i maps every state to a legal move for i:
xi: S → Ai such that (xi(S),S) ∈ li

(Remark: The set of strategies is always finite in a finite game. However, there are more strategies in Chess than atoms in the universe.)

Games in Normal Form

An n-player game in normal form is an (n+1)-tuple
Γ = (X1, ..., Xn, u)
where Xi is the set of strategies for player i and
u = (u1, ..., un): X1 × ... × Xn → ℝ^n
gives the utilities of the players for each n-tuple of strategies.

(Remark: Each n-tuple of strategies directly determines the outcome of a match, even if the match consists of sequences of moves.)

Equilibria

Let Γ = (X1, ..., Xn, u) be an n-player game. (x1*, ..., xn*) is an equilibrium if for all i = 1, ..., n and all xi ∈ Xi:

ui(x1*, ..., xi-1*, xi, xi+1*, ..., xn*) ≤ ui(x1*, ..., xn*)

An equilibrium is a tuple of optimal strategies: no player has a reason to deviate from his or her strategy, given the opponents' strategies.
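The equilibrium condition can be checked mechanically. The Python sketch below (invented helper names; Matching Pennies as the example game, which famously has no pure-strategy equilibrium) tests whether a pure-strategy profile is an equilibrium:

```python
from itertools import product

def is_equilibrium(strategies, profile, u):
    """strategies: list of strategy sets; profile: tuple; u(i, profile) -> payoff."""
    for i in range(len(profile)):
        for xi in strategies[i]:
            deviation = profile[:i] + (xi,) + profile[i + 1:]
            if u(i, deviation) > u(i, profile):
                return False                 # player i would deviate
    return True

# Matching Pennies: player 0 wants a match, player 1 wants a mismatch.
def u(i, prof):
    match = prof[0] == prof[1]
    return (1 if match else 0) if i == 0 else (0 if match else 1)

strategy_sets = [["heads", "tails"], ["heads", "tails"]]
pure = [p for p in product(["heads", "tails"], repeat=2)
        if is_equilibrium(strategy_sets, p, u)]
print(pure)  # -> []
```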

Dominance

Dominance: Example

A strategy x ∈ Xi dominates a strategy y ∈ Xi if

ui(x1, ..., xi-1, x, xi+1, ..., xn) ≥ ui(x1, ..., xi-1, y, xi+1, ..., xn)

for all (x1, ..., xi-1, xi+1, ..., xn) ∈ X1 × ... × Xi-1 × Xi+1 × ... × Xn.

A strategy x ∈ Xi strongly dominates a strategy y ∈ Xi if x dominates y and y does not dominate x.

Assume that opponents are rational: they do not choose a strongly dominated strategy.

Example: both players have strategies {a, b, c, d, e}, with the goal values given by a 5×5 bimatrix (each cell lists both players' payoffs; matrix omitted). Iteratively removing strongly dominated strategies eliminates rows and columns in turn, until only rows a, c and columns b, c remain.
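Iterated elimination of strongly dominated pure strategies can be sketched as follows in Python; the payoff matrices here are the Prisoner's Dilemma, not the 5×5 example from the slides, and the function names are invented:

```python
def dominated(payoff, own, other):
    """Return one of `own` strongly dominated by another (payoff[x][o] lookups)."""
    for y in own:
        for x in own:
            if x != y and all(payoff[x][o] >= payoff[y][o] for o in other) \
                      and any(payoff[x][o] > payoff[y][o] for o in other):
                return y
    return None

def eliminate(u1, u2, rows, cols):
    rows, cols = set(rows), set(cols)
    changed = True
    while changed:
        changed = False
        r = dominated(u1, rows, cols)
        if r is not None:
            rows.discard(r)
            changed = True
        # player 2's strategies are the columns: transpose the lookup
        u2t = {c: {r: u2[r][c] for r in rows} for c in cols}
        c = dominated(u2t, cols, rows)
        if c is not None:
            cols.discard(c)
            changed = True
    return rows, cols

# Prisoner's Dilemma: "d" (defect) strongly dominates "c" (cooperate).
u1 = {"c": {"c": 3, "d": 0}, "d": {"c": 5, "d": 1}}
u2 = {"c": {"c": 3, "d": 5}, "d": {"c": 0, "d": 1}}
print(eliminate(u1, u2, ["c", "d"], ["c", "d"]))  # -> ({'d'}, {'d'})
```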

Dominance: Example (ctd)

(The intermediate elimination steps on the 5×5 bimatrix omitted.)

Game Tree Search with Dominance

(Game tree with simultaneous moves and payoff pairs (75,25), (40,40), (50,30), (80,40), (40,40), (60,50), (35,60), (20,60), (10,50) at the leaves; Player 1 chooses among the subgame values (40,40), (60,50), (20,60), so eliminating dominated moves yields (60,50).)

The α-β-Principle does not Apply

(In the same tree, knowing that the first subtree is worth (40,40) does not bound the values of the remaining subtrees, so no cutoff is possible.)

Mixed Strategies

Let (X1, ..., Xn, u) be an n-player game. Its mixed extension is

Γ = (P1, ..., Pn, (e1, ..., en))

where for each i = 1, ..., n

Pi = {pi : pi is a probability measure over Xi}

and for each (p1, ..., pn) ∈ P1 × ... × Pn

ei(p1, ..., pn) = Σ_{x1 ∈ X1} ... Σ_{xn ∈ Xn} ui(x1, ..., xn) · p1(x1) · ... · pn(xn)

Nash's Theorem: Every mixed extension of an n-player game has at least one equilibrium.

Iterated Row Dominance for Mixed Strategies Iterated Row Dominance for Mixed Strategies (ctd)

Let a zero-sum game be given by a b c a b c a 10 0 8 a 10 0 8 b 6 4 4 b 6 4 4 c 3 8 7 c 3 8 7 ' ( ' ( Now p = 1 , 1 , 0 dominates p ' = (0,0,1). 1 ,0 , 1 2 2 2 2 Then p1 = 2 2 dominates p1' = (0,1,0). ¥ Hence, for all (pa', pb', pc') P1 with pb' > 0 there exists ¥ a dominating strategy (pa, 0, pc) P1. Iterated Row Dominance for Mixed Strategies (ctd)

a b c a 10 0 8 Learning c 3 8 7 ) * ' ( ' ( 1 ,0 , 2 , 1 , 1 ,0 . The unique equilibrium is 3 3 2 2
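The two dominance claims can be verified numerically; a quick Python check, reading the matrix above as payoffs with the row player maximizing and the column player minimizing:

```python
A = {"a": {"a": 10, "b": 0, "c": 8},
     "b": {"a": 6,  "b": 4, "c": 4},
     "c": {"a": 3,  "b": 8, "c": 7}}

cols = ["a", "b", "c"]

# p1 = (1/2, 0, 1/2) dominates the pure row b:
mix_row = {c: 0.5 * A["a"][c] + 0.5 * A["c"][c] for c in cols}
assert all(mix_row[c] >= A["b"][c] for c in cols)   # 6.5, 4.0, 7.5 vs 6, 4, 4

# After removing row b, p2 = (1/2, 1/2, 0) dominates column c
# (the column player minimizes, so dominance means <=):
rows = ["a", "c"]
mix_col = {r: 0.5 * A[r]["a"] + 0.5 * A[r]["b"] for r in rows}
assert all(mix_col[r] <= A[r]["c"] for r in rows)   # 5.0, 5.5 vs 8, 7
print("dominance claims hold")
```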

Roadmap

Complete vs. Incomplete Search

Heuristics:
- Detecting Structures
- Generating Evaluation Functions
- The Viking Method

Simple games like Tic-Tac-Toe and Rock-Paper-Scissors can be searched completely. "Real" games like Peg Jumping, Chinese Checkers, or Chess cannot.

Incomplete Search: Towards Good Play

Besides efficient inference and search algorithms, the ability to automatically generate a good evaluation function distinguishes good from bad General Game Playing programs.

Existing approaches:
- Mobility and Novelty heuristics
- Structure Detection with Fuzzy Goal Evaluation
- The Viking Method: Monte-Carlo Tree Search

Incomplete search requires automatically generated evaluation functions.

Mobility

More moves means a better state.

Advantage: in many games, being cornered or forced into making a move is quite bad.
- In Chess, having fewer moves means having fewer pieces, pieces of lower value, or less control of the board.
- In Chess, when you are in check, you can do relatively few things compared to when you are not in check.
- In Othello, having few moves means you have little control of the board.

Disadvantage: mobility is bad for some games.

(Constructing an Evaluation Function: World Cup 2006, Cluneplayer vs. Fluxplayer)

Inverse Mobility

Having fewer things to do is better. This works in some games, like Nothello and Suicide Chess, where you might in fact want to lose pieces. But how do we decide between the mobility and inverse-mobility heuristics?

Novelty

Changing the game state is better.

Advantage:
- Changing things as much as possible can help avoid getting stuck.
- When it is unclear what to do, maybe the best thing is to throw in some directed randomness.

Disadvantage:
- "Changing the game state" can also mean throwing away your own pieces.
- It is unclear whether novelty per se leads anywhere useful.

Designing Evaluation Functions

Evaluation functions are typically designed by programmers: a great deal of thought and empirical testing goes into choosing one or more good functions, e.g. piece count and piece values in Chess, or holding corners in Othello. But this requires knowledge of the game's structure, semantics, play order, etc.

Identifying Domains

Identifying Structures: Relations

The domains of fluents are identified by a dependency graph, e.g. from

init(step(0))
next(step(X)) <= true(step(Y)) ∧ succ(Y,X)
succ(0,1)  succ(1,2)  succ(2,3)

the domain {0, 1, 2, 3} is derived for step/1 (via the arguments of succ).

A successor relation is a binary relation that is antisymmetric, functional, and injective.
Examples: succ(1,2), succ(2,3), succ(3,4), ...; next(a,b), next(b,c), next(c,d), ...

An order relation is a binary relation that is antisymmetric and transitive.
Example:
lessthan(A,B) <= succ(A,B)
lessthan(A,C) <= succ(A,B) ∧ lessthan(B,C)
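Testing the successor-relation properties on the ground facts of a game description is straightforward; a Python sketch (the function name is invented):

```python
def is_successor_relation(pairs):
    """Check antisymmetry, functionality, and injectivity of a set of pairs."""
    pairs = set(pairs)
    antisymmetric = all((y, x) not in pairs for (x, y) in pairs if x != y)
    functional = len({x for x, _ in pairs}) == len(pairs)  # one successor each
    injective = len({y for _, y in pairs}) == len(pairs)   # one predecessor each
    return antisymmetric and functional and injective

print(is_successor_relation({(0, 1), (1, 2), (2, 3)}))  # -> True
print(is_successor_relation({(0, 1), (0, 2)}))          # -> False: not functional
```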

Boards and Pieces

An (m-dimensional) board is an n-ary fluent (n ≥ m+1) with
- m arguments whose domains are successor relations
- 1 output argument
Example: cell(a,1,whiterook), cell(b,1,whiteknight), ...

A marker is an element of the domain of a board's output argument. A piece is a marker which is in at most one board cell at a time.
Examples: the pebbles in Othello, the White King in Chess.

Fuzzy Goal Evaluation: Example

goal(xplayer,100) <= line(x)
line(P) <= row(P) ∨ col(P) ∨ diag(P)

Value of an intermediate state = the degree to which it satisfies the goal.

Full Goal Specification

goal(xplayer,100) <= line(x)
line(P) <= row(P) ∨ col(P) ∨ diag(P)
row(P) <= true(cell(1,Y,P)) ∧ true(cell(2,Y,P)) ∧ true(cell(3,Y,P))
col(P) <= true(cell(X,1,P)) ∧ true(cell(X,2,P)) ∧ true(cell(X,3,P))
diag(P) <= true(cell(1,1,P)) ∧ true(cell(2,2,P)) ∧ true(cell(3,3,P))
diag(P) <= true(cell(3,1,P)) ∧ true(cell(2,2,P)) ∧ true(cell(1,3,P))

After Unfolding

goal(x,100) <=   true(cell(1,Y,x)) ∧ true(cell(2,Y,x)) ∧ true(cell(3,Y,x))
               ∨ true(cell(X,1,x)) ∧ true(cell(X,2,x)) ∧ true(cell(X,3,x))
               ∨ true(cell(1,1,x)) ∧ true(cell(2,2,x)) ∧ true(cell(3,3,x))
               ∨ true(cell(3,1,x)) ∧ true(cell(2,2,x)) ∧ true(cell(1,3,x))

- 3 literals are true after does(x,mark(1,1))
- 2 literals are true after does(x,mark(1,2))
- 4 literals are true after does(x,mark(2,2))

Evaluating Goal Formulas (cont'd)

Degree to which f(x,a) is true, given that f(x,b) holds:

(1 − p) − (1 − p) · Δ(b,a) / |dom(f(x))|

Our t-norms are instances of the Yager family (with parameter q):

S(a,b) = (a^q + b^q)^(1/q)
T(a,b) = 1 − S(1 − a, 1 − b)

Evaluation function for formulas:

eval(f ∧ g) = T'(eval(f), eval(g))
eval(f ∨ g) = S'(eval(f), eval(g))
eval(¬f) = 1 − eval(f)

Example (a board with cells (e,5), (f,10), (j,5)): with p = 0.9, eval(cell(green,e,5)) is
- 0.082 if true(cell(green,f,10))
- 0.085 if true(cell(green,j,5))

Advanced Fuzzy Goal Evaluation: Example

Identifying Metrics
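A Python sketch of this evaluation scheme, using the Yager-family norms with the usual cap at 1; the formula representation and the atom degrees are invented for illustration:

```python
def S(a, b, q=2.0):                      # t-conorm, used for disjunction
    return min(1.0, (a**q + b**q) ** (1.0 / q))

def T(a, b, q=2.0):                      # t-norm, used for conjunction
    return 1.0 - S(1.0 - a, 1.0 - b, q)

def eval_formula(f, values, q=2.0):
    """f is ('and', f1, f2), ('or', f1, f2), ('not', f1), or an atom key."""
    op = f[0] if isinstance(f, tuple) else None
    if op == "and":
        return T(eval_formula(f[1], values, q), eval_formula(f[2], values, q), q)
    if op == "or":
        return S(eval_formula(f[1], values, q), eval_formula(f[2], values, q), q)
    if op == "not":
        return 1.0 - eval_formula(f[1], values, q)
    return values[f]

# degree of "p and (q or not r)" given fuzzy degrees for the atoms
v = {"p": 0.9, "q": 0.3, "r": 0.8}
score = eval_formula(("and", "p", ("or", "q", ("not", "r"))), v)
assert 0.0 <= score <= 1.0
```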

Order relations are binary, antisymmetric, functional, and injective, e.g.

succ(1,2). succ(2,3). succ(3,4). file(a,b). file(b,c). file(c,d).

Order relations define a metric Δ on functional features, e.g.

Δ(cell(green,j,13), cell(green,e,5)) = 13

Truth degree of a goal literal = (distance to the current value)^(−1)

Example:
init(cell(green,j,13))
goal(green,100) <= true(cell(green,e,5)) ∧ ...

A General Architecture (with evaluation)

Game Description → Compiled Theory → Reasoner, which provides the Move List, State Update, Termination & Goal, and an Evaluation Function to the Search.

Assessment

An Alternative Approach: The Viking Method

Fuzzy goal evaluation works particularly well for games with
- independent sub-goals
- convergence to the goal (15-Puzzle, Chinese Checkers, Othello)
- a quantitative goal
- partial goals (Peg Jumping, Chinese Checkers with more than 2 players)

The Viking Method, Monte-Carlo tree search, is used by Cadiaplayer (Reykjavik University).

Monte Carlo Tree Search

- Play one random game for each move.
- Value of a move = the average score returned by the simulations.

Confidence Bounds

For the next simulation, choose the move

argmax_i ( v_i + C · sqrt(log n / n_i) )

where v_i is the average score of move i so far, n_i the number of simulations of move i, n the total number of simulations, and the square-root term the confidence bound.
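The selection rule can be sketched as follows in Python; the statistics are the example numbers from the tree diagram, and the constant C is a tunable assumption:

```python
import math

def select(stats, C=1.4):
    """stats: list of (v_i, n_i); pick argmax of v_i + C*sqrt(log n / n_i)."""
    n = sum(ni for _, ni in stats)
    def ucb(item):
        v, ni = item[1]
        return v + C * math.sqrt(math.log(n) / ni)
    return max(enumerate(stats), key=ucb)[0]

stats = [(20, 4), (65, 24), (80, 32)]   # (average score, simulations) per move
print(select(stats))  # -> 2
```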

(Tree diagram with example visit counts and average values, e.g. n1 = 4, v1 = 20; n2 = 24, v2 = 65; n3 = 32, v3 = 80, omitted.)

Assessment

Monte Carlo Tree Search works particularly well for games which
- converge to the goal (e.g. Checkers)
- reward greedy behavior
- have a large branching factor
- do not admit a good heuristic

The World Cup

Game Master

The Game Master communicates with the players (Player 1, Player 2, ..., Player n):
1. It sends the game description, each player's role, the time to think (e.g. 1,800 sec), and the time per move (e.g. 45 sec).
2. "Start"
3. "Your move, please"
4. The players reply with their individual moves.
5. The Game Master broadcasts the joint move; steps 3-5 repeat until the end of the game.
6. End of game.

1st World Championship 2005 in Pittsburgh
1. UCLA (Clune)
2. Florida
3. Fluxplayer, UT Austin

2nd World Championship 2006 in Boston: Final Leaderboard

Player           | Points
1. Fluxplayer    | 2690.75
2. UCLA          | 2573.75
3. UT Austin     | 2370.50
4. Florida       | 1948.25
5. TU Dresden II | 1575.00

3rd World Championship 2007 in Vancouver: Final Leaderboard

Player        | Points
1. Reykjavik  | 2724
2. Fluxplayer | 2356
3. Paris      | 2253
4. UCLA       | 2122
5. UT Austin  | 1798

The GGP Challenge

Summary

Much like RoboCup, General Game Playing
- combines a variety of AI areas
- fosters developmental research
- has great public appeal
- has the potential to significantly advance AI

In contrast to RoboCup, GGP has the advantage that it
- focuses on high-level intelligence
- has low entry cost
- makes a great hands-on course for AI students

A Vision for GGP

- Uncertainty: nondeterministic games with incomplete information
- Natural Language Understanding: the rules of a game given in natural language
- Vision: a system that sees the board, pieces, cards, rule book, ...
- A robot playing the actual, physical game

Resources

- Stanford GGP initiative: games.stanford.edu (GDL specification, basic player)
- GGP in Germany: general-game-playing.de (game master)
- Palamedes: palamedes-ide.sourceforge.net (GGP/GDL development tool)

Recommended Papers

- J. Clune: Heuristic evaluation functions for general game playing. AAAI 2007.
- H. Finnsson, Y. Björnsson: Simulation-based approach to general game playing. AAAI 2008.
- M. Genesereth, N. Love, B. Pell: General game playing. AI Magazine 26(2), 2005.
- G. Kuhlmann, K. Dresner, P. Stone: Automatic heuristic construction in a complete general game player. AAAI 2006.
- S. Schiffel, M. Thielscher: Fluxplayer: a successful general game player. AAAI 2007.