Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Execution Monitoring as Meta-Games for General Game-Playing Robots

David Rajaratnam and Michael Thielscher
The University of New South Wales
Sydney, NSW 2052, Australia
{daver,mit}@cse.unsw.edu.au

Abstract

General Game Playing aims to create AI systems that can understand the rules of new games and learn to play them effectively without human intervention. The recent proposal for general game-playing robots extends this to AI systems that play games in the real world. Execution monitoring becomes a necessity when moving from a virtual to a physical environment, because in reality actions may not be executed properly and (human) opponents may make illegal game moves. We develop a formal framework for execution monitoring by which an action theory that provides an axiomatic description of a game is automatically embedded in a meta-game for a robotic player, called the arbiter, whose role is to monitor and correct failed actions. This allows for the seamless encoding of recovery behaviours within a meta-game, enabling a robot to recover from these unexpected events.

1 Introduction

General game playing is the attempt to create a new generation of AI systems that can understand the rules of new games and then learn to play these games without human intervention [Genesereth et al., 2005]. Unlike specialised systems such as the chess program Deep Blue, a general game player cannot rely on algorithms that have been designed in advance for specific games. Rather, it requires a form of general intelligence that enables the player to autonomously adapt to new and possibly radically different problems. General game-playing robots extend this capability to AI systems that play games in the real world [Rajaratnam and Thielscher, 2013].

Execution monitoring [Hähnel et al., 1998; De Giacomo et al., 1998; Fichtner et al., 2003] becomes a necessity when moving from a purely virtual to a physical environment, because in reality actions may not be executed properly and (human) opponents may make moves that are not sanctioned by the game rules. In a typical scenario a robot follows a plan generated by a traditional planning algorithm. As it executes each action specified by the plan, the robot monitors the environment to ensure that the action has been successfully executed. If an action is not successfully executed, some recovery or re-planning behaviour is triggered. While the sophistication of execution monitors may vary [Pettersson, 2005], a common theme is that the execution monitor is independent of any action planning components. This allows for a simplified model where it is unnecessary to incorporate complex monitoring and recovery behaviour into the planner.

In this paper, we develop a framework for execution monitoring for general game-playing robots that follows a similar model. From an existing game axiomatised in the general Game Description Language GDL [Genesereth et al., 2005; Love et al., 2006], a meta-game is generated that adds an execution monitor in the form of an arbiter player. The "game" being played by the arbiter is to monitor the progress of the original game to ensure that the moves played by each player are valid. If the arbiter detects an illegal or failed move, it has the task of restoring the game to a valid state. Importantly, the non-arbiter players, whether human or robotic, can ignore, and reason without regard to, the arbiter player, while the latter becomes active only when an error state is reached.

Our specific contributions are: (1) A fully axiomatic approach to embedding an arbitrary GDL game into a meta-game that implements a basic execution monitoring strategy relative to a given physical game environment. This meta-game is fully axiomatised in GDL so that any GGP player can take on the role of the arbiter and thus be used for execution monitoring. (2) Proofs that the resulting meta-game satisfies important properties, including being a well-formed GDL game. (3) Generalisations of the basic recovery behaviour to consider actions that are not reversible but instead may involve multiple actions, planned by the arbiter, in order to recover the original game state.

The remainder of the paper is as follows. Section 2 briefly introduces the GDL language for axiomatising games. Section 3 presents a method for embedding an arbitrary GDL game into a GDL meta-game for execution monitoring. Section 4 investigates and proves important properties of the resulting game description, and Sections 5 and 6 describe and formalise two extensions of the basic recovery strategy.

2 Background: General Game Playing, GDL

The annual AAAI GGP Competition [Genesereth et al., 2005] defines a general game player as a system that can understand the rules of an n-player game given in the general Game Description Language (GDL) and is able to play those games effectively.

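Playing such a game amounts to participating in a match loop driven by a central controller that collects one legal move per role and advances the state until termination. The following Python sketch illustrates this loop; the `Game` interface and the toy counting game are illustrative assumptions, not part of the GDL specification.

```python
# Minimal sketch of a GGP match loop: a controller repeatedly collects
# one move per role and progresses the game state until a terminal
# state is reached.

class CountToThree:
    """Toy single-role game: the state is a counter; 'inc' increments it."""
    roles = ("player",)

    def initial_state(self):
        return 0

    def legal_moves(self, state, role):
        return ["inc"]

    def next_state(self, state, joint_move):
        return state + 1 if joint_move["player"] == "inc" else state

    def is_terminal(self, state):
        return state >= 3

def run_match(game, policies):
    """Drive the game to termination, asking each role's policy for a move."""
    state = game.initial_state()
    history = []
    while not game.is_terminal(state):
        joint = {r: policies[r](game.legal_moves(state, r)) for r in game.roles}
        history.append(joint)
        state = game.next_state(state, joint)
    return state, history

final, hist = run_match(CountToThree(), {"player": lambda legal: legal[0]})
```

In a real competition setting the same loop is distributed: the controller sends start and play messages over the network and each player answers with its move.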
Operationally a game consists of a central controller that progresses the game state and communicates with players that receive and respond to game messages.

The declarative language GDL supports the description of a finite n-player game (n ≥ 1). As an example, Fig. 1 shows some of the rules for BREAKTHROUGH, a variant of chess where the players start with two rows of pawns on their side of the board and take turns trying to "break through" their opponent's ranks to reach the other side first. The two roles are named in line 1. The initial state is given in lines 3–4. Lines 6–8 partially specify the legal moves: White, when it is his turn, can move a pawn forward to the next rank, provided the cell in front is empty. Line 9 is an example of a move rule: a pawn, when being moved, ends up in the new cell. A termination condition is given in line 11, and lines 12–13 define the goal values for each player under this condition. Additional rules (lines 15–17) define auxiliary predicates, including an axiomatisation of simple arithmetics.

     1  (role white) (role black)
     2
     3  (init (cell a 1 white)) ... (init (cell h 8 black))
     4  (init (control white))
     5
     6  (<= (legal white (move ?u ?x ?u ?y))
     7      (true (control white)) (++ ?x ?y)
     8      (true (cell ?u ?x white)) (cellempty ?u ?y))
     9  (<= (next (cell ?x ?y ?p)) (does ?p (move ?u ?v ?x ?y)))
    10
    11  (<= terminal (true (cell ?x 8 white)))
    12  (<= (goal white 100) (true (cell ?x 8 white)))
    13  (<= (goal black 0) (true (cell ?x 8 white)))
    14
    15  (<= (cellempty ?x ?y) (not (true (cell ?x ?y white)))
    16      (not (true (cell ?x ?y black))))
    17  (++ 1 2) ... (++ 7 8)

Figure 1: An excerpt from a game description with the GDL keywords highlighted.

GDL is based on standard logic programming principles, albeit with a different syntax. Consequently, we adopt logic programming conventions in our formalisation, while maintaining GDL syntax for code extracts. Apart from GDL keyword restrictions [Love et al., 2006], a GDL description must satisfy certain logic programming properties; specifically, it must be stratified [Apt et al., 1987] and safe (or allowed) [Lloyd and Topor, 1986]. As a result a GDL description corresponds to a state transition system (cf. [Schiffel and Thielscher, 2010]). We adopt the following conventions for describing some state transition system properties:

• S^true =def {true(f1), ..., true(fn)} consists of the fluents that are true in state S.
• A^does =def {does(r1, a1), ..., does(rn, an)} consists of the role-action statements making up a joint move A.
• l(S) = {(r, a) | G ∪ S^true ⊨ legal(r, a)}, where each a is an action that is legal for a role r in a state S for the game G.
• g(S) = {(r, n) | G ∪ S^true ⊨ goal(r, n)}, where each n is a value indicating the goal score for a role r in a state S for the game G.

Finally, while the syntactic correctness of a game description specifies a game's state transition system, there are additional requirements for a game to be considered well-formed [Love et al., 2006] and therefore usable for the GGP competitions. In particular, a well-formed game must terminate after a finite number of steps; every role must have at least one legal action in every non-terminal state (playability); every role must have exactly one goal value in every state, and these values must increase monotonically; and for every role there is a sequence of joint moves that leads to a terminal state where the role's goal value is maximal (weakly-winnable).

[Figure 2: Embedding a game ΣG into a meta-game Σmeta.]

3 Meta-Games for Execution Monitoring

3.1 Systems Architecture

The following diagram illustrates the intended overall systems architecture for execution monitoring of general games played with a robot in a real environment.

[Architecture diagram: a human operator supplies the GDL game and the generated meta-GDL to the Game Controller; the Game Controller exchanges START/PLAY and move messages with the GGP Arbiter player, the GGP Robot player, and a human player; robot moves, position updates, and feedback flow between the Game Controller and the Robot Controller.]

It comprises the following components.

Game Controller. Accepts an arbitrary GDL description of a game (from a human operator) and (a) automatically generates a meta-GDL game for the arbiter to monitor its execution; (b) interacts with one or more general game-playing systems that take different roles in the (meta-)game; and (c) interacts with the robot controller.

Robot Controller. Serves as the low-level executor of the robot by (a) processing sensor data in order to recognise actions; and (b) commanding the object manipulator to execute any instance of a move action defined for the environment.

GGP Robot Player. Plays the desired game in the desired role for the robot.

GGP Arbiter Player. Monitors the execution of the game by playing the automatically constructed meta-game.

In the following we describe in detail an axiomatic method of embedding a given game and a specific execution monitoring strategy into a meta-game. Our aim is to fully axiomatise this meta-game in GDL, so that any GGP player can play the role of the arbiter and thus be used for execution monitoring.

3.2 Automatic Construction of Meta-Games

The general approach of embedding a given "source" game into a meta-game is graphically illustrated in Figure 2. Games can be viewed as state transition systems. Execution errors correspond to changes in the environment that are physically possible but do not correspond to legal moves in a game.
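This distinction between moves that are physically possible and moves that are legal in the game can be made concrete with two predicates over a state. The encoding below is an illustrative Python sketch, not part of the formalism; the toy one-dimensional board is an assumption.

```python
# Distinguish game-legal moves from merely physically possible ones.
# Toy 1-D board: a piece may physically move to either adjacent cell,
# but the game only sanctions moves to the right.

def physically_possible(state, move):
    src, dst = move
    return state == src and abs(dst - src) == 1   # piece at src, adjacent dst

def game_legal(state, move):
    src, dst = move
    return physically_possible(state, move) and dst == src + 1

state = 2
assert game_legal(state, (2, 3))            # inside the source game
assert physically_possible(state, (2, 1))   # physically possible, but
assert not game_legal(state, (2, 1))        # not a legal game move
```

An execution monitor lives exactly in the gap between the two predicates: transitions satisfying the first but not the second are the abnormalities it must detect and repair.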

To account for, and recover from, such errors, we embed the state machine for the original game into one that describes all possible ways in which the game environment can evolve. Some of these transitions lead outside of the original game, e.g. when a human opponent makes an illegal move or the execution of a robot move fails. The role of the arbiter player, who monitors the execution, is to detect these abnormalities and, when they occur, to perform actions that bring the system back into a normal state.

[Figure 3: Game environment with robot, and breakthrough chess projected onto this environment. Panels (a) and (b); board files a–d plus the marked column x, ranks 1–4.]

     1  (env_coord a 1) (env_coord b 1) ... (env_coord x 4)
     2  (env_coord a-b 1) (env_coord b-c 1) ...
     3  (<= (env_base (env_can ?x ?y))
     4      (env_coord ?x ?y))
     5  (<= (env_base (env_fallen ?x ?y))
     6      (env_coord ?x ?y))
     7  (<= (env_input ?r noop)
     8      (role ?r))
     9  (<= (env_input ?r (move ?u ?v ?x ?y))
    10      (role ?r) (env_coord ?u ?v) (env_coord ?x ?y))
    11  (<= (env_input ?r (move_and_topple ?u ?v ?x ?y))
    12      (role ?r) (env_coord ?u ?v) (env_coord ?x ?y))
    13
    14  (init (env_can a 1)) ... (init (env_can d 4))
    15  (env_legal noop)
    16  (<= (env_legal (move ?u ?v ?x ?y))
    17      (true (env_can ?u ?v))
    18      (not (true (env_can ?x ?y))) (env_coord ?x ?y))
    19  (<= (env_legal (move_and_topple ?u ?v ?x ?y))
    20      (true (env_can ?u ?v))
    21      (not (true (env_can ?x ?y))) (env_coord ?x ?y))
    22
    23  (<= (env_next (env_can ?x ?y))
    24      (or (does ?p (move ?u ?v ?x ?y))
    25          (does ?p (move_and_topple ?u ?v ?x ?y))
    26          ((true (env_can ?u ?v))
    27           (not (env_can_moved ?u ?v)))))
    28  (<= (env_next (env_fallen ?x ?y))
    29      (or (does ?p (move_and_topple ?u ?v ?x ?y))
    30          ((true (env_fallen ?u ?v))
    31           (not (env_can_moved ?u ?v)))))
    32  (<= (env_can_moved ?u ?v)
    33      (or (does ?p (move ?u ?v ?x ?y))
    34          (does ?p (move_and_topple ?u ?v ?x ?y))))
    35
    36  (env_reverse noop noop)
    37  (<= (env_reverse (move ?u ?v ?x ?y) (move ?x ?y ?u ?v))
    38      (env_coord ?u ?v) (env_coord ?x ?y))
    39  (<= (env_reverse (move ?u ?v ?x ?y)
    40      (move_and_topple ?x ?y ?u ?v))
    41      (env_coord ?u ?v) (env_coord ?x ?y))

Figure 4: An example GDL axiomatisation Σenv of the game environment of Figure 3, including an axiomatisation of failure actions (move_and_topple) and of how to reverse the effects of actions (axioms 36–41).
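An environment model of this kind can be prototyped as a small state machine over can positions. The Python encoding below is an illustrative assumption: action triples mirror the figure's move and move_and_topple actions, and env_reverse follows axioms 36–41, including the model's simplification that moving a fallen can back restores it upright.

```python
# Prototype of the Figure 4 environment model as a state machine.
# A state maps occupied coordinates to "upright" or "fallen".

def env_legal(state, action):
    """A move is physically possible if the source cell holds a can
    and the target cell is free."""
    kind, src, dst = action
    return src in state and dst not in state

def env_next(state, action):
    """Apply move or move_and_topple: the can ends up at dst,
    fallen over in the second case."""
    kind, src, dst = action
    new = dict(state)
    del new[src]
    new[dst] = "fallen" if kind == "move_and_topple" else "upright"
    return new

def env_reverse(action):
    """Reversing either action is a plain move back (axioms 36-41)."""
    kind, src, dst = action
    return ("move", dst, src)

state = {("a", 1): "upright"}
bad = ("move_and_topple", ("a", 1), ("b", 1))
assert env_legal(state, bad)
after = env_next(state, bad)                  # can now fallen at (b,1)
restored = env_next(after, env_reverse(bad))  # arbiter moves it back
```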

Hence, a meta-game combines the original game with a model of the environment and a specific execution monitoring strategy. Its GDL axiomatisation can therefore be constructed as follows:

    Σmeta = τ(ΣG) ∪ Σenv ∪ Σem, where

• ΣG is the original GDL game and τ(ΣG) a rewritten GDL description allowing for it to be embedded into the meta-game;
• Σenv is a GDL axiomatisation describing all possible actions and changes in the physical game environment; and
• Σem is a GDL axiomatisation implementing a specific execution monitoring strategy.

Redefining the source game τ(ΣG)

In order to embed a given game G into a meta-game for execution monitoring, a simple rewriting of some of the GDL keywords occurring in the rules describing G is necessary. Specifically, the meta-game extends the fluents and actions of the original game and redefines preconditions and effects of actions, as detailed below. Hence:

Definition 1 Let ΣG be a GDL axiomatisation of an arbitrary game. The set of rules τ(ΣG) is obtained from ΣG by replacing every occurrence of
• base(f) by source_base(f);
• input(r, a) by source_input(r, a);
• legal(r, a) by source_legal(r, a);
• next(f) by source_next(f).
All other keywords, notably true and does, remain unchanged.

Environment models Σenv

The purpose of the game environment axiomatisation is to capture all actions and changes that may happen, intentionally or unintentionally, in the physical environment and that we expect to be observed and corrected through execution monitoring. Consider, for example, the game environment shown in Fig. 3(a). It features a 4×4 chess-like board with an additional row of 4 marked positions on the right. Tin cans are the only type of objects and can be moved between the marked positions. They can also (accidentally or illegally) be moved to undesired locations, such as the border between adjacent cells or to the left of the board. Moreover, some cans may also have (accidentally) toppled over. A basic action theory for this environment that allows us to detect and correct both illegal as well as failed moves comprises the following.

Actions: noop; moving a can, move(u, v, x, y); and toppling a can while moving it, move_and_topple(u, v, x, y); where u, x ∈ {a, b, c, d, x} ∪ {a-, a-b, ..., x-} and v, y ∈ {1, 2, 3, 4} ∪ {1-, 1-2, ..., 4-}. The additional coordinates a-, a-b, ..., 1-, ... are qualitative representations of "illegal" locations to the left of the board, between files a and b, etc.
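The rewriting τ of Definition 1 is a straightforward renaming pass over the rule set. A Python sketch follows, representing GDL terms as nested tuples; that representation is an assumption of the sketch, not part of the definition.

```python
# Definition 1 as a renaming pass: four GDL keywords are renamed
# wherever they occur as relation heads; true, does, etc. are untouched.

RENAME = {"base": "source_base", "input": "source_input",
          "legal": "source_legal", "next": "source_next"}

def tau(term):
    """Recursively rename keyword heads in a term or rule."""
    if isinstance(term, tuple):
        head, *args = term
        return tuple([RENAME.get(head, head)] + [tau(a) for a in args])
    return term

rule = ("<=", ("next", ("cell", "?x", "?y", "?p")),
        ("does", "?p", ("move", "?u", "?v", "?x", "?y")))
assert tau(rule) == ("<=", ("source_next", ("cell", "?x", "?y", "?p")),
                     ("does", "?p", ("move", "?u", "?v", "?x", "?y")))
```

Applying tau to every rule of a game description yields τ(ΣG) in this representation; the does-condition of the example rule is left intact, exactly as Definition 1 requires.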

Fluents: Any location (x, y) can either be empty or house a can, possibly one that has fallen over. Figure 3(b) illustrates one possible state, in which all cans happen to be positioned upright and at legal locations.

Action noop is always possible and has no effect. Actions move(u, v, x, y) and move_and_topple(u, v, x, y) are possible in any state with a can at (u, v) but not at (x, y). The effect is to transition into a state with a can at (x, y), fallen over in case of the second action, and none at (u, v).

A corresponding set of GDL axioms is given in Figure 4. It uses env_base, env_input, env_legal and env_next to describe the fluents, actions, preconditions, and effects.

Note that this is just one possible model of the physical environment for the purpose of identifying, and correcting, illegal and failed moves by both (human) opponents and a robotic player. An even more comprehensive model may account for other possible execution errors, such as cans completely disappearing, e.g. because they fell off the table.¹ We tacitly assume that any environment model Σenv includes the legal actions of the source game description ΣG as a subset of all possible actions in the environment.²

Execution monitoring strategies Σem

The third and final component of the meta-game serves to axiomatise a specific execution monitoring strategy. It is based on the rules of the source game and the axiomatisation of the environment. The meta-game extends the legal moves of the source game by allowing anything to happen in the meta-game that is physically possible in the environment (cf. Figure 2). As long as gameplay stays within the boundaries of the source game, the arbiter does not interfere. When an abnormality occurs, the arbiter takes control and attempts to correct the problem. For ease of exposition, in the following we describe a basic recovery strategy that assumes each erroneous move n to be correctible by a single reversing move m as provided in the environment model through the predicate env_reverse(m, n). However, a more general execution monitoring strategy will be described in Section 5.

Fluents, roles, and actions. The meta-game includes the additional arbiter role, the actions are those of the environment model,³ and the following meta-game state predicates.

    (role arbiter)
    (<= (input ?r ?m) (env_input ?r ?m))
    (<= (base ?f) (source_base ?f))
    (<= (base ?f) (env_base ?f))                                (1)
    (<= (base (meta_correcting ?m)) (env_input ?r ?m))

The new fluent meta_correcting(m) will be used to indicate an abnormal state caused by the (physically possible but illegal) move m. Any state in which a fluent of this form is true is an abnormal state; otherwise it is a normal state.

Legal moves. In a normal state, when no error needs to be corrected, the meta-game goes beyond the original game in considering any move that is possible in the environment for players. The execution monitoring strategy requires the arbiter not to interfere (i.e., to noop) while in this state.

    (<= (legal ?p ?m)
        (not meta_correcting_mode)
        (role ?p) (distinct ?p arbiter)
        (env_legal ?m))
    (<= (legal arbiter noop)                                    (2)
        (not meta_correcting_mode))
    (<= meta_correcting_mode
        (true (meta_correcting ?m)))

In recovery mode, the arbiter executes recovery moves while normal players can only noop.

    (<= (legal arbiter ?n)
        (true (meta_correcting ?m))
        (env_reverse ?n ?m))
    (<= (legal ?p noop)                                         (3)
        meta_correcting_mode
        (role ?p) (distinct ?p arbiter))

State transitions. The meta-game enters a correcting mode whenever a bad move is made. If multiple players simultaneously make bad moves, the arbiter needs to correct them in successive states, until no bad moves remain.

    (<= (next (meta_correcting ?m))
        (meta_player_bad_move ?p ?m))
    (<= (next (meta_correcting ?m))
        (true (meta_correcting ?m))
        (not (meta_currently_correcting ?m)))
    (<= (meta_currently_correcting ?m)                          (4)
        (does arbiter ?n)
        (env_reverse ?n ?m))
    (<= (meta_player_bad_move ?p ?m)
        (not meta_correcting_mode)
        (does ?p ?m) (distinct ?p arbiter)
        (not (source_legal ?p ?m)))

The environment fluents are always updated properly. Meanwhile, in normal operation the fluents of the embedded game are updated according to the rules of the source game.

    (<= (next ?f) (env_next ?f))
    (<= (next ?f)
        (not meta_bad_move) (source_next ?f)
        (not meta_correcting_mode))                             (5)
    (<= meta_bad_move
        (meta_player_bad_move ?p ?m))

When a bad move has just been made, the fluents of the embedded game are fully preserved from one state to the next throughout the entire recovery process.

    (<= (next ?f) (source_base ?f) (true ?f)
        meta_bad_move)
    (<= (next ?f) (source_base ?f) (true ?f)                    (6)
        meta_correcting_mode)

This completes the formalisation of a basic execution monitoring strategy by meta-game rules. It is game-independent and hence can be combined with any game described in GDL.

¹ We also note that, for the sake of simplicity, the example axiomatisation assumes at most one can at every (qualitatively represented) location, including e.g. the region below and to the left of the board. A more comprehensive axiomatisation would either use a more fine-grained coordinate system or allow for several objects to be placed in one region.
² This corresponds to the requirement that games be playable in a physical environment [Rajaratnam and Thielscher, 2013].
³ Recall the assumption that these include all legal actions in the source game.
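Condensed into procedural form, rules (2)–(6) amount to a two-mode transition function over the set of pending corrections. The sketch below is a deliberately simplified Python rendering with stand-in move encodings; it is not a substitute for the GDL axioms.

```python
# Condensed basic monitoring strategy: track the set of uncorrected bad
# moves; the meta-state is abnormal iff that set is non-empty.

def env_reverse(move):
    src, dst = move
    return (dst, src)                   # reversing a move swaps its ends

def meta_step(pending, joint_move, bad):
    """Return the new set of pending corrections after one joint move.
    - In a normal state, freshly detected bad moves enter pending.
    - In correcting mode, the arbiter's reversing move discharges the
      bad move it corrects (the embedded fluents would be frozen)."""
    if not pending:                     # normal state: detect new errors
        return set(bad)
    arbiter_move = joint_move.get("arbiter")
    return {m for m in pending if env_reverse(arbiter_move) != m}

p = meta_step(set(), {"white": ("a1", "a3")}, bad=[("a1", "a3")])
assert p == {("a1", "a3")}              # meta-game enters correcting mode
p = meta_step(p, {"arbiter": ("a3", "a1"), "white": "noop"}, bad=[])
assert p == set()                       # reversal restores a normal state
```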

4 Properties

The previous section detailed the construction of a meta-game Σmeta from an existing game ΣG and a corresponding physical environment model Σenv. In this section we show that the resulting meta-game does indeed encapsulate the original game and provides for an execution monitor able to recover the game from erroneous actions.

Proposition 1 For any syntactically correct GDL game ΣG and corresponding environment model Σenv, the meta-game Σmeta is also a syntactically correct GDL game.

Proof: ΣG and Σenv are required to satisfy GDL keyword restrictions, be safe, and be internally stratified. τ(ΣG) does not change these properties, and it can be verified that Σem also satisfies them. For example, legal would depend on does only if source_legal depended on does, which is only possible if ΣG violated these restrictions. Now, verifying that Σmeta = τ(ΣG) ∪ Σenv ∪ Σem is stratified: the properties of Σenv (resp. Σem) ensure that Σmeta will be unstratified only if τ(ΣG) ∪ Σem (resp. τ(ΣG) ∪ Σenv) is unstratified. Hence Σmeta could be unstratified only if τ(ΣG) was unstratified. □

The syntactic validity of the meta-game guarantees its correspondence to a state transition system. Now, we consider terminating paths through a game's state transition system.

Definition 2 For a game G with transition function δ, let ⟨s0, m0, s1, m1, ..., sn⟩ be a sequence of alternating states and joint moves, starting at the initial state s0 and assigning si+1 = δ(mi, si), 0 ≤ i < n …

… Inductive step: select m'i such that (m'i)^does = (mi)^does ∪ {does(arbiter, noop)}. Now, ΣG is required to be physically playable in the environment model Σenv, so from Axioms (2) and s'i being a normal state it follows that l(si) ⊆ l(s'i) and {legal(arbiter, noop)} ⊆ l(s'i). Hence m'i consists of legal moves and its execution leads to a game state s'i+1. Furthermore, since the moves in (mi)^does are source_legal, s'i+1 will be a normal state (Axioms (4)), and furthermore (si+1)^true ⊆ (s'i+1)^true. Finally, τ(ΣG) does not modify any goal values or the terminal predicate, so s'i+1 will be a terminal state iff si+1 is a terminal state, and any goal values will be the same. □

Proposition 2 captures the soundness of the meta-game with respect to the original game. A similar completeness result can also be established, where any (terminating) run of the meta-game corresponds to a run of the original when the corrections of the execution monitor are discounted.

Proposition 3 For any run ⟨s0, m0, s1, m1, ..., sn⟩ of the game Σmeta there exists a run ⟨s'0, m'0, s'1, m'1, ..., s'o⟩, where o ≤ n, of the game ΣG such that:
• for each pair (si, mi), 0 ≤ i ≤ n:
  – if si is a normal state, then there is a pair (s'j, m'j), 0 ≤ j ≤ i, s.t. (s'j)^true ⊆ (si)^true and (mi)^does = (m'j)^does ∪ {does(arbiter, noop)};
  – else, si is an abnormal state; then for each non-arbiter role r, does(r, noop) ∈ (mi)^does and does(r, a) ∉ (mi)^does for any a ≠ noop.
• g(s'o) ⊆ g(sn).

Proof: Similar to Proposition 2, but we need to consider the case where a joint action in Σmeta is not legal in ΣG. Note that the axioms for Σem and Σenv do not provide for termination, so the terminal state sn must be a normal state. Base case: note (s'0)^true ⊆ (s0)^true and s0 is a normal state. Inductive step: assume si is a normal state and there is an s'j, j ≤ i, s.t. (s'j)^true ⊆ (si)^true. There are two cases: 1) If all non-arbiter actions in mi are source_legal, then construct m'j from mi (minus the arbiter action). It then follows …
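The correspondence asserted by Proposition 3 can be illustrated by projecting a meta-game run onto an embedded-game run: abnormal steps and the moves that triggered them are discounted, and the arbiter's noop is dropped from the remaining joint moves. The run encoding and the 'ok'/'bad'/'fix' tags below are illustrative assumptions of the sketch.

```python
# Project a meta-game run onto the embedded source-game run by keeping
# only the steps in which a legal source-game move was made.

def project_run(meta_run):
    """meta_run: list of (tag, joint_move) with tag in {'ok','bad','fix'}.
    Keep only the 'ok' steps and drop the arbiter's noop from them."""
    return [{r: m for r, m in joint.items() if r != "arbiter"}
            for tag, joint in meta_run if tag == "ok"]

meta_run = [
    ("ok",  {"white": "m1", "arbiter": "noop"}),
    ("bad", {"white": "illegal", "arbiter": "noop"}),  # triggers correction
    ("fix", {"white": "noop", "arbiter": "undo"}),     # arbiter recovery
    ("ok",  {"white": "m2", "arbiter": "noop"}),
]
assert project_run(meta_run) == [{"white": "m1"}, {"white": "m2"}]
```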

Even when the original game is well-formed, the meta-game is not. Firstly, there is no termination guarantee for the meta-game. The simplest example of this is the case of a player that simply repeats the same incorrect move while the arbiter patiently and endlessly corrects this mistake. Secondly, the meta-game is not weakly-winnable or monotonic, since no goal values have been specified for the arbiter role.

This lack of well-formedness is not in itself problematic, as the intention is for the meta-game to be used in the specialised context of a robot execution monitor rather than to be played in the GGP competition. Nevertheless, satisfying these properties, and in particular the guarantee of termination, can have practical uses. Consequently, extensions to construct a well-formed meta-game are described in Section 6.

5 Planning

The current requirements of the environment model Σenv ensure that the arbiter only has a single action that it can take to correct an invalid move: it simply reverses the move. This allows even the simplest GGP player, for example one that chooses any legal move, to play the arbiter role effectively. However, this simplification is only possible under the assumption that all unexpected changes are indeed reversible and that Σenv explicitly provides the reversing move. This cannot always be satisfied when modelling a physical environment.

Instead, an existing environment model, such as the board environment of Figure 3, can be extended to a richer model that might require multiple moves to recover a normal game state. For example, correcting a toppled piece could conceivably require two separate actions: firstly, to place the piece upright, and subsequently to move it to the desired position.

Secondly, some environments simply cannot be modelled with only reversible moves. For example, reversing an invalid move in CONNECTFOUR, a game where coloured discs are slotted into a vertical grid, would involve a complex sequence of moves, starting with clearing a column, or even the entire grid, and then recreating the desired disc configuration.

Our approach can be easily extended to allow for these more complex environment models. Most importantly, in an abnormal state the arbiter needs to be allowed to execute all physically possible moves. Formally, we remove the env_reverse predicate and replace the first axiom of (3) by

    (<= (legal arbiter ?m)
        (env_legal ?m) meta_correcting_mode)                    (7)

We also substitute the second and third axioms of (4) by

    (<= (next (meta_prev ?f))
        (true ?f) (env_base ?f)
        (meta_player_bad_move ?p ?m))

This replaces meta_currently_correcting(m) by the new fluent meta_prev(f), whose purpose is to preserve the environment state just prior to a bad move. The goal is to recover this state. Hence, the game remains in correcting mode as long as there is a discrepancy between this and the current environment state:⁴

    (<= (next (meta_prev ?f))
        (true (meta_prev ?f))
        meta_correcting_mode)
    (<= (next (meta_correcting ?m))
        (true (meta_correcting ?m))
        (true (meta_prev ?f)) (not (env_next ?f)))
    (<= (next (meta_correcting ?m))
        (true (meta_correcting ?m))
        (env_next ?f)
        (not (true (meta_prev ?f))))

⁴ Since bad moves are no longer taken back individually, if they happen simultaneously then for the sake of simplicity all instances of meta_correcting(m) remain true until the preceding valid state has been recovered.

Consider an environment model Σenv for CONNECTFOUR that includes the actions of clearing an entire column and of dropping a single disk in a column. The arbiter can always perform a sequence of actions that brings the physical game back into a valid state after a wrong disk has been slotted into the grid. This flexibility requires the execution monitor to plan and reason about the consequences of actions in order to return to the appropriate normal state. As part of the meta-game, this behaviour can be achieved by defining a goal value for the arbiter that is inversely proportional to the number of occurrences of bad states in a game. A skilled general game-playing system taking the role of the arbiter would then plan for the shortest sequence of moves to a normal state.

Worthy of note is that the generalised axiom (7) also accounts for the case of failed actions of the execution monitor itself. Whenever this happens, the game remains in a bad state and the arbiter would plan for correcting these too.

6 Termination and Well-Formed Meta-Games

As highlighted in Section 4, the meta-games constructed so far are not well-formed. This can be undesirable for a number of reasons. For example, a lack of guaranteed game termination can be problematic for simulation-based players that require terminating simulation runs (e.g., Monte Carlo tree search [Björnsson and Finnsson, 2009]).

A simple mechanism to ensure game termination is to extend Σem with a limit on the number of invalid moves that can be made by the non-arbiter roles. Firstly, a strikeout counter is maintained for each non-arbiter player.

    (<= (init (strikeout_count ?r 0))
        (role ?r) (distinct ?r arbiter))
    (<= (next (strikeout_count ?r ?n))
        (not (meta_bad_move ?r))
        (true (strikeout_count ?r ?n)))
    (<= (next (strikeout_count ?r ?n))
        (true (strikeout_count ?r ?m))
        (meta_bad_move ?r) (++ ?m ?n))
    (++ 0 1) ... (++ 2 3)

Next, a player strikes out if it exceeds some number (here three) of invalid moves, triggering early game termination.

    (<= (strikeout ?r) (true (strikeout_count ?r 3)))
    (<= strikeout (strikeout ?r))
    (<= terminal strikeout)

Beyond this termination guarantee, we can also ensure that games are weakly winnable and goal scores are monotonic, by defining goal values for the arbiter in all reachable states.

    (<= (goal arbiter 100) terminal)
    (<= (goal arbiter 0) (not terminal))

However, early game termination can be problematic if we treat the meta-game as the final arbiter of who has won and lost the embedded game. A player with a higher score could intentionally strike out in order to prevent some opponent from improving their own score. To prevent this, we firstly extend Definition 1 to replace every occurrence of goal(r, v) by source_goal(r, v). Next, axioms are introduced to re-map goal scores for the original players, ensuring that a strikeout player receives the lowest score.

    (<= (goal ?r 0) (role ?r) (not terminal))
    (<= (goal ?r ?n)
        (source_goal ?r ?n) terminal (not strikeout))
    (<= (goal ?r 0) (strikeout ?r))
    (<= (goal ?r 100) (role ?r) (distinct arbiter ?r)
        (not (strikeout ?r)) strikeout)

Proposition 4 For any syntactically correct and well-formed game ΣG and corresponding environment model Σenv, the extended meta-game Σmeta is also syntactically correct and well-formed.

Proof: It can be verified that the extra rules do not change the syntactic properties of Σmeta. Similarly, it is straightforward to observe that Σem now guarantees termination, is weakly-winnable, and has monotonic goal scores. □

Note that the modification to allow for early termination does have consequences for Proposition 3. Namely, it will only hold if the run of a meta-game terminates in a normal state. Trivially, if a meta-game terminates early then the embedded game will never complete and there will be no terminal state in the corresponding run of the original game.

7 Conclusion and Future Work

We have presented an approach for embedding axiomatisations of games for general game-playing robots into meta-games for execution monitoring. This allows for the seamless encoding of recovery behaviours within a meta-game, enabling a robot to recover from unexpected events such as failed action executions or (human) players making unsanctioned game moves. The approach is general enough to encompass a full range of behaviours: from simple, prescribed strategies for correcting illegal moves, through complex behaviours that require an arbiter player to find intricate plans to recover from unexpected events, to specifying early-termination conditions, for example when a game piece falls off the table or a frustrated human opponent throws it away. Our method can moreover be applied to games axiomatised in GDL-II [Thielscher, 2010], which allows for modelling uncertain and partially observable domains for an arbiter and hence to account for, and recover from, execution errors that a monitor does not observe until well after they have occurred. An interesting avenue for future work is to consider the use of an Answer Set Programming (ASP) solver (see for example [Gebser et al., 2012]) to help in this failure detection and recovery. Based on an observation of a failure state, the ASP solver could generate candidate move sequences as explanations of how the game entered this state from the last observed valid game state. Finally, the general concept of automatically embedding a game into a meta-game has applications beyond execution monitoring, for example as an automatic game controller that runs competitions and implements rules specific to them, such as the three-strikes-out rule commonly employed at the annual AAAI contest for general game-playing programs [Genesereth and Björnsson, 2013].

Acknowledgements

This research was supported under the Australian Research Council's (ARC) Discovery Projects funding scheme (project number DP 120102144). The second author is also affiliated with the University of Western Sydney.

References

[Apt et al., 1987] K. R. Apt, H. A. Blair, and A. Walker. Towards a theory of declarative knowledge. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 89–148. Morgan Kaufmann, 1987.

[Björnsson and Finnsson, 2009] Y. Björnsson and H. Finnsson. CADIAPLAYER: A simulation-based general game player. IEEE Transactions on Computational Intelligence and AI in Games, 1(1):4–15, 2009.

[De Giacomo et al., 1998] G. De Giacomo, R. Reiter, and M. Soutchanski. Execution monitoring of high-level robot programs. In Proceedings of KR, pages 453–465. Morgan Kaufmann, 1998.

[Fichtner et al., 2003] M. Fichtner, A. Großmann, and M. Thielscher. Intelligent execution monitoring in dynamic environments. Fundamenta Informaticae, 57(2-4):371–392, 2003.

[Gebser et al., 2012] M. Gebser, R. Kaminski, B. Kaufmann, and T. Schaub. Answer Set Solving in Practice. Synthesis Lectures on AI and Machine Learning. Morgan & Claypool Publishers, 2012.

[Genesereth and Björnsson, 2013] M. Genesereth and Y. Björnsson. The international general game playing competition. AI Magazine, 34(2):107–111, 2013.

[Genesereth et al., 2005] M. Genesereth, N. Love, and B. Pell. General game playing: Overview of the AAAI competition. AI Magazine, 26(2):62–72, 2005.

[Hähnel et al., 1998] D. Hähnel, W. Burgard, and G. Lakemeyer. GOLEX — bridging the gap between logic (GOLOG) and a real robot. In Proceedings of KI, volume 1504 of LNCS, pages 165–176. Springer, 1998.

[Lloyd and Topor, 1986] J. W. Lloyd and R. W. Topor. A basis for deductive database systems II. The Journal of Logic Programming, 3(1):55–67, 1986.

[Love et al., 2006] N. Love, T. Hinrichs, D. Haley, E. Schkufza, and M. Genesereth. General Game Playing: Game Description Language Specification. Stanford University, 2006.

[Pettersson, 2005] O. Pettersson. Execution monitoring in robotics: A survey. Robotics and Autonomous Systems, 53(2):73–88, 2005.

[Rajaratnam and Thielscher, 2013] D. Rajaratnam and M. Thielscher. Towards general game-playing robots: Models, architecture and game controller. In Proceedings of AI, volume 8272 of LNCS, pages 271–276. Springer, 2013.

[Schiffel and Thielscher, 2010] S. Schiffel and M. Thielscher. A multiagent semantics for the Game Description Language. In Agents and AI, volume 67 of CCIS, pages 44–55. Springer, 2010.

[Thielscher, 2009] M. Thielscher. Answer set programming for single-player games in general game playing. In Proceedings of ICLP, volume 5649 of LNCS, pages 327–341. Springer, 2009.

[Thielscher, 2010] M. Thielscher. A general game description language for incomplete information games. In Proceedings of AAAI, pages 994–999, 2010.

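As an informal illustration (not part of the paper's formalism), the goal-remapping axioms for the extended meta-game can be read as a function from the embedded game's outcome to meta-game scores: in a normal terminal state the original (source) goal values are passed through, while after a strikeout the offending players receive the minimum score and all other non-arbiter players the maximum. The following Python sketch makes this reading concrete; the function name and data representation are invented for illustration, and it assumes it is only evaluated on terminal meta-game states (recall that a strikeout itself implies termination).

```python
def meta_goals(roles, source_goal, struck_out):
    """Illustrative reading of the meta-game goal-remapping axioms.

    roles       -- set of embedded-game role names (excluding 'arbiter')
    source_goal -- dict mapping each role to its source_goal value,
                   meaningful only when the embedded game terminated normally
    struck_out  -- set of roles that struck out (empty on normal termination)
    """
    goals = {}
    if struck_out:
        # Abnormal termination: a strikeout player gets the lowest score,
        # every other embedded-game player gets the highest, so striking
        # out can never improve a player's outcome.
        for r in roles:
            goals[r] = 0 if r in struck_out else 100
    else:
        # Normal termination: pass through the embedded game's goal values.
        for r in roles:
            goals[r] = source_goal[r]
    # The arbiter's goal axioms award it 100 in every terminal state.
    goals["arbiter"] = 100
    return goals
```

For example, on a normally terminated tic-tac-toe game the source scores survive unchanged, whereas a strikeout by one player forfeits the game regardless of the board position.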