Heuristic Evaluation Functions for General Game Playing
Total Page:16
File Type:pdf, Size:1020Kb
Heuristic Evaluation Functions for General Game Playing James Clune Department of Computer Science The University of California, Los Angeles [email protected] Abstract a criticism of this work has been that the success of high- performance game-playing programs has come at the ex- A general game playing program plays games that it pense of generality. has not previously encountered. A game manager pro- gram sends the game playing programs a description of The idea of general game playing as a test-bed for AI is a game’s rules and objectives in a well-defined game presented in Barney Pell’s thesis (Pell 1993). Pell submits description language. A central challenge in creating general game playing as a way to preserve the elements that effective general game playing programs is that of con- have made game-playing research attractive to the AI com- structing heuristic evaluation functions from game de- munity, while maintaining generality. He defines a class scriptions. This paper describes a method for construct- of symmetric chess-like games and describes a system that ing evaluation functions that represent exact values of plays games within this class. The system creates heuris- simplified games. The simplified games are abstract tics through the application of metagame-level analysis of models that incorporate the most essential aspects of the the class of games, applying concepts such as mobility, cen- original game, namely payoff, control, and termination. trality, and promotion, automatically specializing these con- Results of applying this method to a sampling of games suggest that heuristic evaluation functions based on our cepts for the particular games. method are both comprehensible and effective. The version that is the focus of the present work is based on the AAAI General Game Playing Competition, organized by Michael Genesereth (Genesereth, Love, & Pell 2005) Introduction and the Game Description Language, or GDL (Genesereth & Love 2005). One of the challenges in automatically con- The idea of general game playing is to create a program that structing heuristic evaluation functions for games from GDL effectively plays games that it has not previously encoun- descriptions is the generality of the language. For exam- tered. A game manager program sends the game playing ple, although the game of chess is describable in GDL (with programs a description of a game in a well-defined game some minor caveats), GDL has no notion of a grid-like board description language. The description specifies the goal of or pieces that move, capture, and promote. Instead, each the game, the legal moves, the initial game state, and the of these concepts are constructed in a game-specific way in termination conditions. The game manager also sends in- first-order logic. This approach forces program authors to formation about what role the program will play (black or implement extremely general game-playing agents. white, naughts or crosses, etc), a start time (time allowed for pre-game analysis), and a move time (time allowed per move The first published work on generating heuristics for GDL once game play begins). The game playing programs com- is (Kuhlmann, Dresner, & Stone 2006). They describe a pete by sending messages over a network indicating their method of creating features from syntactic structures by rec- moves until the game is completed. The class of games cov- ognizing and exploiting relational patterns such as successor ered is intentionally broad, including games of one or more relations and board-like grids. They then perform distributed players with alternating or simultaneous moves, with arbi- search, where each machine uses a single heuristic based on trary numeric payoffs. Our research focus is the automatic either the maximization or the minimization of one of the construction of heuristic evaluation functions that enable a detected features. general game-playing program to win games. Another approach to constructing heuristic evaluation Game playing programs have been of interest to the AI functions for GDL is described in (Schiffel & Thielscher community since the field’s inception because they provide a 2007). Their approach also recognizes structures such as well-defined test-bed for theories of how computational pro- successor relations and grids in game descriptions. Beyond cesses can produce intelligent behavior. Although impres- structure recognition, they utilize fuzzy logic and techniques sive achievements have been made in a number of games, for reasoning about actions to quantify the degree to which a position satisfies the logical description of winning. They Copyright c 2007, Association for the Advancement of Artificial also assess how close the game is to termination, seeking ter- Intelligence (www.aaai.org). All rights reserved. minal states when goals are reached and avoiding terminal 1134 states when goals are not yet attained. composed of expressions consisting of fluents, logical con- The work presented here is part of an ongoing research nectives, a distinguished set of GDL relations, and game- project which has produced general game playing programs specific relations. Here we briefly describe the GDL repre- that have placed first and second in the First and Second sentation of cylinder checkers, omitting the syntax. Annual AAAI General Game Playing Competitions, respec- The role relation lists the roles of the players in the game tively. Algorithms and results presented here reflect the cur- (red and black). The init relation enumerates the set of flu- rent state of the ongoing project. ents true in the initial state of the game. These indicate the The core ideas governing our present approach are the initial board configuration and that red moves first. following. One source of a heuristic evaluation function is The legal relation indicates the legal moves for each role that of an exact solution to simplified version of the original from a given state. The next and does relations are used problem (Pearl 1984). Applying this notion to general game to describe resulting fluents in terms of existing fluents and playing, we abstract the game to its core aspects and com- player’s actions. For example, one rule is that moving a pute the exact value of the simplified game, where the value piece from cell (x, y) results in cell (x, y) being blank. of a game is interpreted as the minimax value as described in The terminal relation is used to indicate termination con- (von Neumann & Morgenstern 1953). Because this simpli- ditions for the game, in this case after 50 moves or when a fied game preserves key characteristics of the original game, player cannot move. The goal relation is used to indicate we use the exact value of the simplified game as an approxi- payoffs for terminal states. A rule says that if black has no mation of the value of the original game. The core aspects of pieces, then the payoffs are 100 for red and 0 for black. the game that we model are the expected payoff, the control (or relative mobility), and the expected game termination (or Identifying Stable Features game longevity). These core game aspects are modeled as When a human first learns the game of checkers, there functions of game-state features. The features must be au- are some obvious metrics that appear relevant to assessing tomatically selected, and the key idea guiding this selection states: the relative cardinalities of regular pieces of each is stability, an intuitive concept for which we will provide color and kings of each color. The more sophisticated player a more technical meaning. Our approach strives to evaluate may consider other factors as well, such as the number of states based on only the most stable features. openings in the back row, degree of center control, or dis- Our approach shares some aspects with previous ap- tance to promotion. We will use the term feature to refer to proaches. Like Kuhlmann, Dresner, & Stone, binary rela- a function from states to numbers that has some relevance tions are extracted from the game description. Like Schiffel to assessing game states. A feature is not necessarily a full & Thielscher, the approach incorporates a measure of degree evaluation function, but features are building-blocks from of goal attainment. Distinguishing aspects of the present which evaluation functions can be constructed. work include the simplified model of the game in terms of Consider the cues that human checkers players exploit payoff, control, and termination, and a particular notion of when learning to evaluate positions. Pieces are represented stability as a key characteristic of relevant features. as physical objects, which have properties of persistence and The remainder of this paper is as follows. We explore mobility that humans have extensive experience with. The an example game, discussing both its representation in GDL coloring of the pieces, the spatial representation of the board, and the game’s analysis. We then present preliminary results and the symmetry of the initial state each leverage strengths of automatically constructed heuristic evaluation functions of the human visual system. The GDL game description, al- for a sampling of board games, and end with discussion. though isomorphic to the physical game, has none of these properties, so feature identification is not so simple. Example: Cylinder Checkers The approach taken here is to extract expressions that ap- We illustrate GDL and a technique for constructing heuristic pear in the game description and impose interpretations on evaluation functions with a checkers variant called cylinder these expressions to construct candidate features. There are checkers. The game was introduced in the Second Annual two aspects to this: identifying a set of potentially interest- AAAI General Game Playing Competition. The primary ing candidate expressions and imposing interpretations on difference from traditional checkers is that the playing board these expressions. To identify the set of candidate expres- has a topology in which the sides wrap around to form a sions, the analyzer starts by simply scanning the game de- cylinder.