
Evolving Opponent Models for Texas Hold ’Em

Alan J. Lockett and Risto Miikkulainen

Alan Lockett is with the Department of Computer Sciences, University of Texas, Austin, TX 78712 USA (e-mail: [email protected]). Risto Miikkulainen is with the Department of Computer Sciences, University of Texas, Austin, TX 78712 USA (e-mail: [email protected]).

Abstract—Opponent models allow software agents to assess a multi-agent environment more accurately and therefore improve the agent's performance. This paper makes use of coarse approximations to game-theoretic player representations to improve the performance of software players in Limit Texas Hold 'Em poker. A 10-parameter model, intended to model a combination, or mixture, of various strategies, is developed to represent the opponent. A 'mixture identifier' is then evolved using the NEAT neuroevolution method to estimate values of these parameters for arbitrary opponents. To evaluate this approach, two poker players, represented as neural networks, were evolved under the same conditions, one with the mixture identifier and one without. The player trained with access to the identifier achieved consistently higher and more stable fitness during evolution compared with the player without the identifier. Further, the player with the identifier outplays the other in a heads-up match after training, winning on average 60% of the money at the table. These results demonstrate that opponent modeling is effective even with low-dimensional models and conveys an advantage to players trained to use these models.

I. INTRODUCTION

An increasing number of applications of artificial intelligence require the ability to build and maintain computational models of autonomous agents. Autonomous vehicles must be able to construct accurate models of agents in their environment quickly so that they can respond adaptively. Software agents managing financial transactions must be able to identify fraudulent behavior in a timely manner. As computer programs are increasingly placed in the role of a decision maker, computational methods are increasingly needed to analyze the motives and intent of the agents with which they interact.

Within the field of artificial intelligence, games have traditionally provided a ready test bed for new ideas and approaches to difficult decision-making problems, since games allow for testing in complex and ever more realistic environments with relatively low start-up costs and little risk to human safety. Within AI, poker is perhaps the most appropriate target for opponent modeling, since identifying optimal strategies for poker has proven elusive. In poker, the opponent's behavior provides the primary window into the opponent's state. Further, the natural incentives for deception inherent in the game make this window a rather opaque one, and thus any method that can effectively decipher a player's actions in order to obtain therefrom a reasonable and useful model of the opponent should generalize well to other environments.

There are two fundamentally different approaches to opponent modeling in poker and other similar games. On the one hand, one might seek a direct predictive model that would construct some probability distribution over the future actions or current state of the agent being modeled. A model of this sort might also be used to estimate hidden state in situations where another agent in the environment has access to information only available to the observer through that agent's actions. In the context of poker, this approach might involve estimating the cards likely present in the opponent's hand or perhaps attempting to guess whether or not the opponent is bluffing or slowplaying on a particular hand. Modeling opponent actions or state in this fashion is transparent and immediately useful; that is, the output has a known interpretation that impinges directly on the decision-making problem at hand. However, previous work indicates that predictive models may be unstable, and it is not clear a priori how to build such a model in practice [1, 2, 3].

Another approach, and the one pursued in this paper, would attempt to classify the opponent by type, either as belonging to a specific class out of some discrete set of categories, or as a point (or region) within a continuous description space. Whereas a predictive model tries to identify what the opponent will do next, a classification model will attempt to identify what the opponent is like by analogy with previously observed opponents. Intuitively, this concept resembles how people approach game strategy, by identifying opponents in terms of past experience, and reasoning forward from these analogies to anticipate opponent action, in essence using a classification model as a means to obtain a predictive one. In poker, a common categorization (although not used in this paper) might be to identify the strategy of the opponent as tight or loose, and passive or aggressive. One drawback of this approach is that classification models are not necessarily transparent or direct; i.e., there may be no obvious interpretation that can be given from the classification output to game decisions. For statistical methods (including neuroevolution), however, transparency is irrelevant, since these methods cannot take into account the intensional semantics of the model per se. Classification models are simpler to construct and learn than predictive models in practice [3]. While it may not seem clear from the outset how to select useful opponent categories or how to assign actual opponents to these categories on-line, it is in fact quite feasible, as will be shown later in this paper.

This paper demonstrates the successful application of a continuous classification approach to opponent modeling in Texas Hold 'Em poker. The models use a coarse approximation to game-theoretic opponent representations to provide a parameterized description of a large subclass of poker players. It is important to note that in this research, classification is performed in a continuous space rather than over some discrete set of opponents. Evolutionary algorithms are shown to be able to train a neural network to reliably associate a model with an adversary. In addition, poker players are trained using neuroevolution both with and without access to the estimated models. The players with the models conclusively outperform the players without models in three distinct aspects: (1) they attain higher maximum and average fitness under the same fitness function, (2) their fitness is more stable, i.e., it varies less across generations, and (3) they routinely outplay the non-opponent-modeling players in heads-up matches after equivalent training.
II. RELATED WORK

A significant body of opponent modeling work has been done in poker, much of it using an explicit approach [4, 5]. For instance, Billings et al. [1] used statistical methods to estimate the strength of the opponent's hand given his history of calling, raising, or folding. They also developed predictive models to assess what decision a specific opponent would make when holding a given hand. Although the original predictor was only 51% accurate, Davidson et al. [2], [6] used a neural network trained with backpropagation to increase its accuracy to 81%. These data are especially interesting since their poker player, Loki, gathered statistics on actual human players by playing online poker games. However, the approach required a significant history of data for training and therefore could not be used online. In contrast, the mixture-based approach does not require additional training in order to generalize to new players, since previously unseen opponents can be interpolated from the training experience using the mixture models.

More recently, Bard and Bowling [7] formulated opponent modeling as a dual state estimation problem in Kuhn poker, a simplified, three-card version of two-player poker. There are only five non-dominated strategies: three for the first player and two for the second. These opponent models represent mixed strategies, a feature held in common with this current work.

In substance, the mixture method of this paper shares broad similarities with what Bard and Bowling term 'static' opponent models. In fact, it provides a first approach to a reasonably-sized approximation of complete opponent models for poker, a development which they suggest as a next step. It differs significantly, however, in how the mixture models are used once obtained. Rather than trying to solve explicitly for a mixed strategy to exploit the opponent, this work uses neuroevolution in order to search for effective game players. This is a fundamentally distinct methodology, based on the point of view that the opponent models will invariably have some stochastic bias or error that is best handled by using stochastic methods for interpreting them. Exact computations based on the outcome of the mixture identification problem could lead a computer player astray, turning an attempt to exploit the opponent's weaknesses into a trap. An algorithm that makes use of stochastic measurements should also be able to assess and mitigate risk. An exact computational method affords no such flexibility.
The mixture approach to opponent modeling is based on that applied by Lockett et al. [3] in a simpler card game called Guess It. In their approach, a set of four cardinal opponents was used to define an opponent space, with all possible opponents being represented as probability distributions over these four basic opponents. These distributions were termed mixture opponents, and at each turn, the distribution was sampled to decide which of the four cardinal opponents would make the decision for that turn. A mixture identifier was trained to estimate the sampling distribution of a mixture opponent from the current game state. Using neuroevolution, Lockett et al. were able to train the mixture identifier to an accuracy of about 85 percent. Two separate neural networks were then trained, one of which took in the game state plus the output of the mixture identifier, and another that took in the game state only. While the network with only the game state achieved greater fitness against the mixture opponents, the authors found that the networks with both the game state and the mixture identifier consistently won against a bank of previously unseen players, including the network with just the game state. The players that were trained to use the mixture identifier were able to generalize to unseen opponents because they developed an exhaustive and continuous representation of opponents encoded in the mixture identifier.

In this paper, the mixture approach provides a continuous classification system for opponents that should generalize well because unseen opponents can be viewed as interpolations of previously seen opponents. This work extends the mixture approach from Guess It [3] to the domain of poker by clarifying the nature of approximate opponent representations and providing a means to generate such representations for new domains with minimal effort. These improvements make the approach theoretically clear and scale it up to two-player Limit Texas Hold 'Em poker, a more complex and difficult domain.

The neural networks are trained to play poker using NeuroEvolution of Augmenting Topologies (NEAT), developed by Stanley and Miikkulainen [8]. In this approach, only inputs and outputs are specified for the neural network. The appropriate internal topology is discovered through a search using a genetic algorithm. Connections and hidden nodes are added and changed with a given probability, and are retained in the population if they improve the performance of the network against a fitness function. In theory, the capability of the algorithm to iteratively add structure (or complexify) allows it to adjust to new situations without losing old capabilities. The details of NEAT will not be discussed here (see [8] instead), partly because the opponent modeling architecture advocated in this paper is independent of the particular algorithm used to implement it. However, since NEAT has been used effectively in various game-playing approaches in the past, it was a natural choice for the opponent modeling approach as well.

As a final note, similar opponent representations for poker have been previously employed by Barone and While [9]. Their emphasis, however, was on using evolution to find good poker players from among these representations, whereas the players evolved in this work are not limited to the mixture representations, which are only used to model opponents encountered by the automated player.
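As a rough illustration of the evolutionary cycle used throughout this paper, the following Python sketch shows a minimal generational loop. It is a simplified stand-in only: it evolves fixed-topology candidates and omits NEAT's topology mutations and speciation, and the helper names (evaluate_fitness, mutate, population settings) are illustrative assumptions rather than the configuration used in this work.

    import random

    def evolve(initial_population, evaluate_fitness, mutate,
               generations=100, elite_frac=0.2):
        """Minimal generational loop in the spirit of neuroevolution.

        NEAT additionally mutates network topology (adding nodes and
        connections) and protects innovation through speciation; those
        mechanisms are omitted from this sketch.
        """
        population = list(initial_population)
        for _ in range(generations):
            # Score every candidate against the fitness function.
            scored = sorted(population, key=evaluate_fitness, reverse=True)
            elites = scored[:max(1, int(elite_frac * len(scored)))]
            # Refill the population with mutated copies of the elites.
            population = list(elites)
            while len(population) < len(scored):
                population.append(mutate(random.choice(elites)))
        return max(population, key=evaluate_fitness)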
III. TEXAS HOLD 'EM POKER

Texas Hold 'Em poker is currently the most popular version of poker in casinos and tournaments. In this, and in most poker research, the actual game studied is Limit Texas Hold 'Em, where the bets are of predetermined fixed size. The game begins when each player buys in to the table by presenting a fixed amount of money for play. In Texas Hold 'Em, one player is always designated as the dealer, and the dealer rotates with each hand. In a hand, each player is initially dealt two cards face down, called the hole cards. Before seeing the cards, the player to the left of the dealer must add a forced bet to the pot called the small blind, and the player to that player's left must place a bet usually twice this size called the big blind.

Once the hole cards have been dealt, a betting round ensues, starting with the player placing the small blind. Each player in turn has the choice to fold, conceding the game and losing all prior bets in the hand, to call, matching the largest bet in the pot at the time (initially the size of the big blind), or to raise the cost of playing for the pot by the size of the big blind. If any player raises, then all prior players have the opportunity to take another turn.

Once all the players have placed their bets, then the round advances to the flop, where three community cards are dealt face up. Another betting round follows, this time starting with the player to the dealer's left. Two additional options become available: this player can check, passing the opportunity to bet but leaving open the option to meet future raises, or bet, adding money to the pot and placing a cost on remaining in the game.

After the flop, the cost of a bet doubles, and there are two more betting rounds, the turn and the river, with one community card added during each. If at any point only one player remains in the hand, then that player wins the pot, which is then added to his bankroll. If more than one player remains in the hand after the river, then each player must show his or her two private cards, and the player with the best five-card poker hand formed from all seven cards in play wins. This step is called the showdown.

In this research, all games consist of 250 hands of two-player poker with a buy-in of $200 and $2 blinds. Automated players are judged according to how much of the $400 at the table they own at the end of 250 hands. For practical purposes, check and call are identical, as well as bet and raise, so that the players here have three strategies available at each turn: fold, call, or raise.

If poker were a game of pure chance with completely stochastic outcomes, it would be possible over the long term to maintain positive winnings simply by playing according to the statistics. The expected winnings for an individual turn of poker can be computed as

E(W) = pr − (1 − p)c,

where p is the probability of having the best cards at the table, termed here the win probability, r is the amount of money in the pot, c is the cost of betting, and W is a random variable for the amount of money won on this decision, equal to r in case of a win, and −c in case of a loss. Often, the equation above is transformed into a ratio by setting it equal to zero and shifting terms. This leads to the familiar concept of pot odds, which are basically an estimate of the statistical breakpoint between earning and losing money on a bet.
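For concreteness, the expected-winnings computation and the corresponding break-even threshold can be written in a few lines of Python. This is a direct transcription of the formula above; the function and variable names are chosen here for illustration only.

    def expected_winnings(p, pot, cost):
        """E(W) = p*r - (1 - p)*c for a single betting decision."""
        return p * pot - (1.0 - p) * cost

    def break_even_probability(pot, cost):
        """Win probability at which E(W) = 0, i.e. the pot-odds breakpoint."""
        return cost / (pot + cost)

    # Example: a $12 pot and a $2 cost to continue.
    print(expected_winnings(0.25, 12, 2))     # 1.5
    print(break_even_probability(12, 2))      # 0.142857..., so play when p exceeds ~0.14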
In order to estimate the pot odds, it is necessary to have available an estimate of the win probability, p. For human players, this estimate is usually obtained by considering the number of cards needed in order to complete specific hands (termed outs), keeping in mind the opponent's likely best hand as well. It is possible to compute p exactly under the assumption of a fair deck with uniform probability over the cards. However, an exact computation is too involved to carry out directly, so the win probability must be estimated. In this paper, an approximation to a roll-out of the current state is used, estimating the win probability assuming that no players will fold prior to the showdown. This estimate is obtained by combining the probability of each player's best hand belonging to one of 315 exact hand types with the probability of winning the match given these hand types.

The neural networks trained to play poker all possess the same basic input structure consisting of:

(1) The estimated win probability, p,
(2) The ratio of expected winnings if CALL is selected,
(3) The ratio of expected winnings if RAISE is selected,
(4) The current round,
(5) The size of the pot,
(6) The size of their own bankroll,
(7) The number of raises made by the opponent this round,
(8) The required cost of a bet.

These eight inputs are collectively considered to be the game state for poker from the point of view of the network. While it might be desirable to provide the network with greater visibility into the cards held than just that provided by the win probability, it is also necessary to keep the number of parameters as small as possible in order for training to be feasible, and there is no obvious compact parameterization of the cards that would be sufficiently small and useful enough to justify including them as inputs to the network. Notable among these inputs is the number of raises this round by the opponent (7), which provides the network with its strongest clue as to the value of the cards held by the opponent. This representation contains sufficient detail to evaluate the effectiveness of opponent modeling in Texas Hold 'Em.
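A minimal sketch of how the eight inputs listed above might be packed into a single feature vector is given below. The win probability is assumed to come from a roll-out style estimator as described in this section; the exact normalization of the expected-winnings ratios and the raise computation are assumptions made for illustration, not the precise encoding used in the experiments.

    def game_state_vector(win_prob, pot, cost, round_index, bankroll, opponent_raises):
        """Assemble the eight game-state inputs into one feature vector.

        The CALL/RAISE ratios reuse the E(W) formula from Section III;
        normalizing by the pot size is one plausible choice and is an
        assumption of this sketch.
        """
        ew_call = win_prob * pot - (1.0 - win_prob) * cost
        ew_raise = win_prob * (pot + cost) - (1.0 - win_prob) * (2.0 * cost)
        return [
            win_prob,                    # (1) estimated win probability
            ew_call / max(pot, 1.0),     # (2) expected-winnings ratio for CALL
            ew_raise / max(pot, 1.0),    # (3) expected-winnings ratio for RAISE
            round_index,                 # (4) current round (0 = pre-flop ... 3 = river)
            pot,                         # (5) size of the pot
            bankroll,                    # (6) player's own bankroll
            opponent_raises,             # (7) opponent raises this round
            cost,                        # (8) required cost of a bet
        ]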
hand belonging to one of 315 exact hand types with the This partition was chosen to classify players based on probability of wining the match given these hand types. how they utilize two pieces of information: the win The neural networks trained to play poker all possess the probability and the expected winnings. These two pieces of same basic input structure consisting of: information were chosen because they represent criteria that human poker players often use to judge the value of a hand; (1) The estimated win probability, p, other metrics could also be used if desired. In each state (2) The ratio of expected winnings if CALL is selected, region, the player has the option to FOLD , CALL , or RAISE . (3) The ratio of expected winnings if RAISE is selected, These choices are represented by two parameters, the (4) The current round, (5) The size of the pot, probability of raising and the probability of calling; the fold (6) The size of their own bankroll, probability is fully determined by these two. In addition, the (7) The number of raises made by the opponent this round, model is simplified by assuming the player will (8) The required cost of a bet. deterministically fold when holding a very bad hand, leaving five regions of the state space where the opponent These eight inputs are collectively considered to be the may take a probabilistic action. This reasoning results in a game state for poker from the point of view of the network. 10-parameter model generally expressing the opponent’s While it might be desirable to provide the network with willingness to bet based on the strength of his hand. There greater visibility into the cards held than just that provided are numerous other aspects of poker players that one could by the win probability, it is also necessary to keep the wish to model, e.g. aggressiveness vs. passivity. Such number of parameters as small as possible in order for aspects are beyond the scope of the current research; the training to be feasible, and there is no obvious compact current goal is to validate the mixture-based approach for parameterization of the cards that would be sufficiently poker in general terms. Using this approach, potential small and useful enough to justify including them as inputs opponents can be sampled in a straightforward manner. to the network. Notable among these inputs is the number These opponents can be chosen to provide a diverse of raises this round by the opponent (7), which provides the training set from the outset, which represents an network with its strongest clue as to the value of the cards improvement over training by self-play or against a set of held by the opponent. This representation contains manually constructed opponents. With genetic algorithms, sufficient detail to evaluate the effectiveness of opponent these generated opponents can be used exclusively, as is modeling in Texas Hold ’Em. done in this paper, or in conjunction with competitive coevolution to provide greater diversity and robustness to IV. A MIXTURE -BASED APPROACH the fitness function from the start. In this paper, low-dimensional approximations to a full Mixture opponents were initially generated uniformly at model will be developed that can then be used both to random within these 10 parameters, subject to the constraint generate training opponents and to incorporate knowledge that the pair of parameters corresponding to each of the of the opponent into an automated player. 
Mixture opponents were initially generated uniformly at random within these 10 parameters, subject to the constraint that the pair of parameters corresponding to each of the nine state regions must sum to a number between zero and one. However, many of these mixtures did not produce viable opponents. To surmount this difficulty, a sample of 1000 uniformly generated mixtures was played against randomly generated neural networks taking the game state as input. Out of these, a sub-sample of 84 mixtures was kept, all of which had managed to retain at least $10 out of an initial $200 after 250 hands. These 84 mixtures were used to construct a 10-dimensional multivariate Gaussian using the sample mean of the 84 mixtures along with the sample covariance. This Gaussian effectively restricted the generated opponents to more viable mixtures, mainly by reducing the likelihood of generating mixtures that fold strong hands or bet poor hands. Most of the 10 components varied significantly in value. Their variances were between 0.03 and 0.04, or about 20% on either side of the mean, with most components having virtually zero correlation with other components. This Gaussian virtually eliminated mixtures with a propensity for folding an extremely strong hand or betting an extremely weak hand. Outside of these two extreme states, both parameter means stayed close to their raw expectation of 0.33. Using this Gaussian to sample mixture opponents, the first-generation average winnings for random networks fell to about 60 percent of the money in play, so that although several random networks were still stronger than random mixtures, there was still plenty of room for training.
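A sketch of this opponent-generation step is shown below, assuming the surviving parameter vectors are available as rows of a matrix. Rejecting draws whose (raise, call) pairs fall outside the valid range is an assumption about how out-of-range samples would be handled; the paper specifies only that the Gaussian is fit to the sample mean and covariance.

    import numpy as np

    def fit_opponent_gaussian(viable_mixtures):
        """viable_mixtures: array of shape (n, 10), e.g. the 84 surviving vectors."""
        mean = viable_mixtures.mean(axis=0)
        cov = np.cov(viable_mixtures, rowvar=False)
        return mean, cov

    def sample_opponents(mean, cov, n, rng=None):
        """Draw opponent parameter vectors from the fitted Gaussian, keeping only
        draws whose five (raise, call) pairs are non-negative and sum to at most one."""
        rng = rng or np.random.default_rng()
        samples = []
        while len(samples) < n:
            x = rng.multivariate_normal(mean, cov)
            pairs = x.reshape(5, 2)
            if np.all(pairs >= 0) and np.all(pairs.sum(axis=1) <= 1):
                samples.append(x)
        return np.array(samples)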

V. TRAINING THE MIXTURE IDENTIFIER

The purpose of developing the opponent models is to provide computer players with a view into the nature of their opponent. In order to make this goal possible, there must be a module that can map the observable portion of the game state and the opponent's actions into an estimated opponent model. This module is the mixture identifier. Opponent models cannot be estimated directly because the crucial part of the game state, the win probability, is hidden from the player. The goal, then, is to find the best approximation to the opponent model given the information that is available to the player. The longer the match, the more information becomes available, including the win probability in the case where a hand reaches the showdown, since players must then reveal cards.

One possibility would be to use a particle filter in order to estimate the parameters. However, since both the win probability and the opponent model are hidden, this approach is not practically feasible until multiple hands reach the showdown, at which point one could construct a reasonable observation model using the definitions of the model parameters. Also, a large number of particles would be required to obtain an accurate estimate in a 10-dimensional space, which would make the player sluggish, especially if used online during play. A neural network was therefore trained to estimate the mapping using NEAT. While this approach requires significant time for training, during online play an estimate of the opponent model can be obtained simply by activating the network.

The task of the mixture identifier is to estimate an opponent model describing the opponent. Random mixture models generated according to the scheme above can provide a supervised data set against which candidate mixture identifiers can be evaluated. A deterministic poker player with a conservative strategy of betting based on expected winnings was created to serve as a harness. Each time the generated mixture opponents made a decision, the mixture identifier was queried for an estimate of the correct model. Whenever the showdown was reached, the mixture identifier would be allowed to revise its estimates for that round using the estimated win probability based on the opponent's cards. The average of the models obtained in this way over the course of play was considered to be the candidate mixture identifier's best guess of the correct mixture model.

While this style of evaluation suggests that a supervised training strategy might be more effective, there are good reasons to use a semi-supervised method such as NEAT instead. First, a correct value needs to be found by settling over a sequence of network activations, which requires recurrency. Common supervised training methods for neural networks do not work well on recurrent structures, whereas NEAT naturally develops recurrent networks when these networks are better than competing non-recurrent networks. The inputs given to the mixture identifier are the same 8 inputs given to the poker players. The starting topology for the mixture identification problem in this case was a fully connected network with 8 inputs and 10 outputs, for a total of 8 × 10 = 80 parameters. NEAT efficiently tunes these parameters with relatively few evaluations. The best networks found by evolution included additional inhibitory links between competing output nodes, and excitatory links between mutually reinforcing nodes. Thus solutions to the mixture identification problem were helped by recurrency.
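The evaluation harness described above maintains a running average of the identifier's outputs over a match. A minimal sketch of that bookkeeping follows, with `identifier` standing in for the evolved recurrent network (any callable returning a 10-element vector); the showdown revision step would simply call observe again with the corrected win probability, a simplification assumed here.

    class MixtureEstimate:
        """Average the mixture identifier's 10-dimensional outputs over a match."""

        def __init__(self, identifier):
            self.identifier = identifier   # callable: game_state -> list of 10 floats
            self.total = [0.0] * 10
            self.count = 0

        def observe(self, game_state):
            """Query the identifier on one decision and fold it into the average."""
            output = self.identifier(game_state)
            self.total = [t + o for t, o in zip(self.total, output)]
            self.count += 1

        def current_guess(self):
            if self.count == 0:
                return [0.0] * 10
            return [t / self.count for t in self.total]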
In the mixture identifier experiment, then, a mixture identifier network was trained to guess the mixture governing a generated opponent on average. In each generation, each candidate mixture identifier was evaluated against 50 sampled mixtures, playing 250 hands against each mixture. The fitness of the candidate was determined by calculating the Euclidean distance d of the candidate's average estimate from the actual mixture parameters controlling the generated opponents. To give better mixture identifiers higher fitness, this distance was subtracted from a distance of 3.5, chosen to be greater than the Euclidean distance from the origin to the farthest corner of the unit hypercube in 10 dimensions. The 50 values of (3.5 − d) were added up and scaled to a percentage value. In 11 trials of 50 generations each, the average maximum percentage achieved was 84.6%, or d = 0.56, an average component-wise error of 0.16. All trials except one achieved maximum fitness > 82%. In other words, the probability estimates for opponent behavior were off by approximately 16 percent on average for each of the 10 parameters, a strong performance given the difficulty of the task. A graph of a typical run of the experiment is given in Figure 2. In this run, the maximum fitness grew from 77.8% to 84.6%, with percentages calculated based on the average Euclidean distance of mixture estimates from actual mixture values for generated opponents.

Fig 2. Average and maximum fitness graphs for a typical run of mixture identifier training. Fitness is based on the average Euclidean distance of the mixture identifier's estimate of the opponent's parameters from the actual parameters. These values are then scaled to percentage values, with 100% indicating zero distance.
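The fitness assignment for candidate identifiers reduces to the distance computation just described. The sketch below writes out the 3.5 offset and one plausible scaling to a percentage; the exact normalization used to produce the reported percentages is not spelled out in the text, so the scaling shown is an assumption.

    import math

    def identifier_fitness(estimates, targets, offset=3.5):
        """estimates, targets: lists of 10-dimensional parameter vectors,
        one pair per sampled mixture opponent (50 per evaluation here)."""
        total = 0.0
        for est, tgt in zip(estimates, targets):
            d = math.dist(est, tgt)      # Euclidean distance in 10 dimensions
            total += offset - d          # larger is better; the offset keeps it positive
        # Scale so that a perfect identifier (d = 0 everywhere) scores 100%.
        return 100.0 * total / (offset * len(targets))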
Overall, these results demonstrate that for a reasonably selected set of mixture parameters, it is possible to train a mixture identifier that provides a usable approximation of the mixed strategies employed by an opponent. The next step is to train a player to use this approximation.

VI. TRAINING THE POKER PLAYERS

Once the mixture identifier has been trained successfully, the main experiment is to validate whether this module can provide an advantage to a computer player learning to play poker. In order to test this hypothesis, two separate networks were trained: one, termed the mixture-based player, took both the game state and the mixture identifier's output as an input, and another, termed the control player, took only the game state as an input. Both networks were trained using NEAT. In each generation, each network was required to play 250 hands against each of 50 randomly generated mixture opponents. During each of the matches, blinds were fixed at $2 and each player started with $200. The fitness of each candidate network was assigned as a percentage of the $400 won on average against all opponents.

The two networks were trained for 100 generations on 11 separate trials. At the end of training, the control player achieved an average maximum fitness of 93.0% (variance 0.005%) while the mixture-based player achieved an average maximum fitness of 95.4% (variance 0.011%), with percentages indicating the percent of money won. A paired, two-tailed t-test shows an 80% chance that the mixture-based player achieves greater maximum fitness in general. Typical graphs of fitness growth for both types of players are shown in Figure 3. The fitness values are stochastic, depending both on the opponents selected and the hands drawn. Despite such stochasticity, the mixture-based player's fitness is remarkably stable, especially compared with the rather choppy oscillation observed for the control players.

This result suggests that the mixture-based player's access to the mixture identifier helps smooth out the stochasticity of the fitness function. By contrast, control players had a spike in fitness (as observed in generation 66 in Figure 3) in all eleven trials, and as such control players with fitness over 90% were somewhat anomalous and failed to take over the population. Overall, during a typical training run, the maximum fitness of the mixture-based player rarely dropped below 90% after the first few generations, whereas the maximum fitness of the control player rarely exceeded 90%.

Fig 3. Fitness graphs for the mixture-based and control players on a sample trial run (panels: "Avg & Max Mixture-Based Fitness" and "Avg & Max Control Player Fitness"). Fitness is the average percentage of money won by the player. Average and maximum fitness for the mixture-based player is higher and more stable in general.

After the mixture-based and control players were trained, the two networks were played against each other. This is the true test of the mixture-based player, since the original goal for this research was to develop poker players that could model their opponents' behavior in order to generalize to unseen opponents. The competition consisted of 150 matches of 250 hands each, again starting with $200 each and $2 blinds. Each match was played twice with the two players holding opposite hands to eliminate any advantage due to luck. Over the 11 trials, the mixture-based player won an average 61.4% of the money, with the control player taking the remaining 38.6% (this result is significant with p < 0.02). The actual numbers are shown in Table 1. While the margins vary considerably, the mixture-based player did manage to win more than half the money in all but one trial. These results strongly indicate that the mixture identifier does indeed allow the player to adjust to unseen opponents, significantly improving its performance.

TABLE 1. PERCENTAGE OF MONEY WON BY THE MIXTURE-BASED PLAYER IN 11 TRIALS

    Trial:    1     2     3     4     5     6     7     8     9     10    11
    % won:    51.6  49.2  77.5  55.1  53.2  80.2  53.2  59.6  51.2  59.4  85.1
    Average:  61.4
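The head-to-head evaluation plays every deal sequence twice with the players' hands swapped so that neither side profits from luck. The sketch below outlines that protocol, assuming a hypothetical play_match(player_a, player_b, deck_seed) helper that deals the same cards for the same seed and returns the two final bankrolls in argument order.

    def duplicate_match(player_a, player_b, play_match, n_matches=150):
        """Estimate player_a's share of the money at the table using duplicate
        matches: each deal sequence is played twice with the hands swapped."""
        won_a = 0.0
        total = 0.0
        for seed in range(n_matches):
            a1, b1 = play_match(player_a, player_b, deck_seed=seed)  # original seating
            b2, a2 = play_match(player_b, player_a, deck_seed=seed)  # same cards, swapped seats
            won_a += a1 + a2
            total += a1 + b1 + a2 + b2
        return won_a / total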
VII. DISCUSSION

These results demonstrate a mixture-based approach that creates low-dimensional opponent models by partitioning the state space. The 10-parameter opponent model significantly strengthens play in Texas Hold 'Em poker. In particular, using a mixture identifier improves fitness against generated opponents conforming to the model. Beyond just improving fitness, however, the mixture identifier, even a somewhat noisy one, smoothes out stochastic effects of evaluation in a random environment.

This work compares favorably with the results previously obtained in Guess It by Lockett et al. [3]. The mixture identifiers developed for poker have higher accuracy than those developed for Guess It, despite a three-fold increase in the number of mixture parameters. This improvement results from a rigorous scheme for generating the parameter space, as opposed to the somewhat ad hoc introduction of a fixed bank of cardinal opponents. The cardinal opponents in [3] are not independent, in that two of their four parameters overlap, whereas the 10-parameter model allows only one set of action probabilities to be active for each distinct state. Thus distinct 10-parameter models play from distinct probability distributions and can therefore be identified uniquely. Interestingly, Lockett et al. found that the mixture-based players for Guess It attained lower fitness than the control players, ostensibly because there are more parameters to tune. By contrast, in poker, the mixture-based players have higher fitness during training than the control players. One explanation is that poker inherently rewards opponent modeling more strongly when playing against non-optimal opponents, further strengthening the claim that opponent modeling is an important part of playing poker.

A possible criticism of opponent modeling in general is that it is only intended to maximize winnings against weak or average opponents, and not intended to produce optimal players. Opponent modeling is thus used to obtain maximal strategies in contradistinction to optimal strategies that are intended to play well against all opponents generally. However, this distinction does not hold in general, or specifically for the kind of opponent modeling in this paper. In essence, such modeling extends the game state space by appending parameters that classify the opponent. The training of the opponent-modeling player then optimizes play against representative opponents. While such methods may find local rather than global optima, the goal is still to develop optimal players. In the context of game theory, Nash equilibria on the standard state space may or may not defeat Nash equilibria on the state space extended with the opponent model, depending on the game and the quality of the opponent models. For instance, Lockett et al. [3] present a situation in Guess It where the optimal strategy on the non-extended state space is much worse than even sub-optimal strategies on the extended state space. Thus, it is not clear that an optimal strategy in poker will defeat an opponent modeling strategy in general.

VIII. FUTURE WORK

The mixture-based approach to opponent modeling can be further validated by taking more parameters into account. The 10-parameter opponent model only represents the immediate statistical aspects of the game. If the goal is to train poker players that can model opponents in tournament play with humans, much more refined models will be needed. These models would vary their play depending on the round, the size of their bankroll, the actions of the opponent and more.

Looking further forward, there are several ways to build on this promising approach. One obvious area is to develop these models automatically from records of human play. Transcripts of poker matches are widely available online, and the premier event of the game, the World Series of Poker, has been televised for several years. It may be possible to develop an algorithm that partitions the state space according to some objective criterion drawn from data sets of human play, such as maximizing the Kullback-Leibler divergence between distinct regions on a given data set. This method would allow more realistic opponents to be generated, and the techniques employed by a human player to be measured more accurately, which should lead to stronger automatic poker players.
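As an illustration of the proposed criterion, the symmetrized Kullback-Leibler divergence between the empirical fold/call/raise distributions observed in two candidate regions could be computed as follows. The region-splitting search around this measure is left out, and the smoothing constant is an arbitrary choice; the example counts are invented for illustration.

    import math

    def kl(p, q, eps=1e-9):
        """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
        return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

    def region_divergence(counts_a, counts_b):
        """Symmetrized KL between empirical action distributions (fold, call, raise)
        gathered from human play in two candidate regions of the state space."""
        p = [c / sum(counts_a) for c in counts_a]
        q = [c / sum(counts_b) for c in counts_b]
        return kl(p, q) + kl(q, p)

    # Example: a region that mostly folds versus a region that mostly raises.
    print(region_divergence([80, 15, 5], [10, 30, 60]))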
Another point of departure involves using higher dimensional parameter spaces with a small set of opponent clusters within this space. The higher dimensional space can be used to generate mixture opponents that can play more refined strategies. The mixture identifier would map opponents according to their likelihood of belonging to each of the clusters. These clusters would represent certain normative playing patterns, balancing the need to generate more detailed opponents with the need for low-dimensional opponent representations. In this way, more complex models could be naturally handled using the methods of this paper.

A fourth category of extensions examines probability distributions over opponents. For instance, a data set including play from average poker players would likely have very different characteristics from a set of tournament transcripts. When training a computer to play poker, it is important that generated opponents represent the set of players the computer is likely to encounter, possibly with greater weight placed on more difficult opponents. One might also develop a training program that increases in difficulty over time, with simpler opponents appearing more often in lower generations, and with successive generations moving through a Pareto dominance hierarchy.

Extensions such as these may eventually lead to computer players that play at human levels and even exceed them. Similar techniques of opponent modeling could also bear fruit in other games and in multi-agent systems generally, leading to intelligent agents that interact more effectively with other agents in their environment to improve their decision making ability.

IX. CONCLUSION

This study shows that opponent modeling using the mixture approach is practical and beneficial, resulting in increased fitness of players trained to play Texas Hold 'Em poker. The mixture approach consists of identifying and defeating previously unknown opponents by representing them as a mixture over a low-dimensional parameter space that approximates objective aspects of the opponent's play. Mixture-based opponent models are effective because they give computer players insight into aspects of the game that would otherwise be hidden from them, such as deceptive or misleading play. The same process leveraged to obtain these results should apply not only to poker, but also to many environments where there is a need to understand the intent or purpose of other agents, making opponent modeling an increasingly important component of AI research in general.

ACKNOWLEDGMENTS

The authors wish to thank Charles L. Chen for his assistance and contributions to this research. This research was supported in part by NSF grant IIS-0757479 and THECB grant 003658-0036-2007.

REFERENCES

[1] Billings, D., Papp, D., Schaeffer, J., and Szafron, D. Opponent Modeling in Poker. Proceedings of the 15th National Conference of the American Association on Artificial Intelligence. AAAI Press, Madison, WI, 1998, 493-498.
[2] Davidson, A., Billings, D., Schaeffer, J., and Szafron, D. Improved Opponent Modeling in Poker. Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI'2000). 2000, 1467-1473.
[3] Lockett, A., Chen, C., and Miikkulainen, R. Evolving Explicit Opponent Models in Game Playing. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2007). Kaufmann, San Francisco, 2007, 2106-2113.
[4] Korb, K., Nicholson, A., and Jitnah, N. Bayesian Poker. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI-99). 1999, 343-350.
[5] Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., and Billings, D. Bayes' Bluff: Opponent Modeling in Poker. Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI-05). 2005, 550-558.
[6] Davidson, A. Using Artificial Neural Networks to Model Opponents in Texas Hold 'Em. Unpublished manuscript; http://spaz.ca/aaron/poker/nnpoker.pdf. 1999.
[7] Bard, N. and Bowling, M. Particle Filtering for Dynamic Agent Modeling in Simplified Poker. Proceedings of the 22nd Conference on Artificial Intelligence. AAAI Press, Madison, WI, 2007, 515-521.
[8] Stanley, K. and Miikkulainen, R. Continual Coevolution Through Complexification. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002). Kaufmann, San Francisco, 2002, 113-120.
[9] Barone, L. and While, L. Adaptive Learning for Poker. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000). Kaufmann, San Francisco, 2000, 566-573.
[10] DiPietro, A., Barone, L., and While, L. Learning in RoboCup Keepaway Using Evolutionary Algorithms. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002). Kaufmann, San Francisco, 2002, 1065-1072.
[11] Hoehn, B., Southey, F., Holte, R. C., and Bulitko, V. Effective Short-Term Opponent Exploitation in Simplified Poker. Proceedings of the 20th National Conference on Artificial Intelligence. AAAI Press, 2005, 783-788.
[12] Riley, P., and Veloso, M. Planning for Distributed Execution Through Use of Probabilistic Opponent Models. IJCAI-2001 Workshop PRO-2: Planning under Uncertainty and Incomplete Information. 2001.