The MP-MIX Algorithm: Dynamic Search Strategy Selection in Multiplayer Adversarial Search Inon Zuckerman and Ariel Felner
Total Page:16
File Type:pdf, Size:1020Kb
316 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 3, NO. 4, DECEMBER 2011 The MP-MIX Algorithm: Dynamic Search Strategy Selection in Multiplayer Adversarial Search Inon Zuckerman and Ariel Felner Abstract—When constructing a search tree for multiplayer algorithms include the minimax search algorithm coupled with games, there are two basic approaches to propagating the oppo- the alpha–beta pruning technique [6] which is still thebasic nents’ moves. The first approach, which stems from the MaxN building block for many successful computer player implemen- algorithm, assumes each opponent will follow his highest valued heuristic move. In the second approach, the paranoid algorithm, tations. In addition, over the years many other variations of the the player prepares for the worst case by assuming the oppo- original algorithm have been suggested [12]. nents will select the worst move with respect to him. There is no When constructing a search tree for multiplayer games, there definite answer as to which approach is better, and their main are two basic approaches one can take when expanding the op- shortcoming is that their strategy is fixed. We therefore suggest ponents’ moves. The first approach, presented in the MaxN al- the MaxN-paranoid mixture (MP-Mix) algorithm: a multiplayer adversarial search that switches search strategies according to gorithm [9], assumes that each opponent will follow his highest the game situation. The MP-mix algorithm examines the current valued move. In the second approach, presented in the paranoid situation and decides whether the root player should follow the algorithm [16], the player prepares for the worst case by as- MaxN principle, the paranoid principle, or the newly presented suming the opponents will work as a coalition and will select directed offensive principle. To evaluate our new algorithm, we the worst move with respect to him. performed extensive experimental evaluation on three multi- player domains: Hearts, Risk,andQuoridor. In addition, we also A comprehensive comparison between the two algorithms introduce the opponent impact (OI) measure, which measures the was performed by Sturtevant [14]. This could not conclude a players’ ability to impede their opponents’ efforts, and show its definite answer as to which approach is better, and further claims relation to the relative performance of the MP-mix strategy. The that the answer strongly depends on the properties of the game results show that our MP-mix strategy significantly outperforms and on the evaluation function. The main weakness of these al- MaxN and paranoid in various settings in all three games. gorithms is that their underlying assumptions on opponents’ be- Index Terms—Artificial intelligence (AI), decision trees, game- havior is fixed throughout the game. However, when examining tree search, multiplayer games. the course of many games one can realize that neither of their underlying assumptions are reasonable for the entire duration of I. INTRODUCTION the game. There are situations where it is more appropriate to follow the MaxN assumption, while in other situation the para- noid assumption seems to be the appropriate approach. ROM the early days of artificial intelligence (AI) re- Our focus in this work is on multiplayer games with a single F search, game playing has been one of the prominent winner and no reward is given to the losers, i.e., they are all research directions, since outplaying a human player has been equal losers regardless of their losing position. We call these viewed as a prime example of an intelligent behavior which games single-winner games. In such multiplayer games, there surpasses human intelligence. naturally exist other possible approaches to propagate heuristic The main building block of game playing engines is the values, that is besides MaxN and paranoid. In this paper, we in- adversarial search algorithm, which defines a search strategy troduce one such new approach, denoted the offensive strategy. for the action selection operation among the possible actions In single-winner, multiplayer games, there are situations where a player can take. In general, two-player adversarial search one player becomes stronger than the others and advances to- algorithms have been an important building block in the wards a winning state. Such situations, together with an un- construction of strong players, sometimes optimal or world derstanding that there is no difference whether a loser finishes champions [4], [13]. Classical two-player adversarial search second or last (as only the winner gets rewarded), should trigger the losing players to take explicit actions in order to prevent the leader from winning, even if the actions temporarily worsen Manuscript received September 24, 2010; revised February 25, 2011 and July their own situation. Moreover, in some situations, the only way 01, 2011; accepted July 28, 2011. Date of publication September 26, 2011; date of current version December 14, 2011.The work of A. Felner was supported by for individual players to prevent the leader from winning is by the Israeli Science Foundation under Grant 305/09. forming a coalition of players. This form of reasoning should I. Zuckerman is with the Department of Industrial Engineering and Man- lead to a dynamic change in the search strategy to an offensive agement, Ariel University Center of Samaria, Ariel 44837, Israel (e-mail: in- [email protected]). strategy, in which the player selects the actions that worsen the A. Felner is with the Information Systems Engineering Department, Ben-Gu- situation of the leading player. At the same time, the leading rion University, Be’er-Sheva 85104, Israel (e-mail: [email protected]). player can also understand the situation and switch to a more Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. defensive strategy and use the paranoid approach, as its under- Digital Object Identifier 10.1109/TCIAIG.2011.2166266 lying assumption does reflect the real game situation. 1943-068X/$26.00 © 2011 IEEE ZUCKERMAN AND FELNER: THE MP-MIX ALGORITHM: DYNAMIC SEARCH STRATEGY SELECTION 317 All these approaches (MaxN, paranoid, and offensive) are fixed. We introduce the MaxN-paranoid mixture (MP-mix) algorithm, a multiplayer adversarial search algorithm which switches search strategies according to the game situation. MP-mix is a meta-decision algorithm that outputs, according to the players’ relative strengths, whether the player should con- duct a game-tree search according to the MaxN principle, the paranoid principle, or the newly presented directed offensive principle. Thus, a player using the MP-mix algorithm will be able to change his search strategy dynamically as the game de- velops. To evaluate the algorithm we implemented the MP-mix algorithm on three single-winner and multiplayer domains: 1) Hearts—an imperfect-information, deterministic game; 2) Risk—a perfect-information, nondeterministic game; Fig. 1. Three-player MaxN game tree. 3) Quoridor—a perfect-information, deterministic game. Our experimental results show that in all domains, the MP-mix approach significantly outperforms the other ap- them measures the merit of one of the players. The root player proaches in various settings, and its winning rate is higher. chooses a move towards the leaf whose value was propagated all However, while the performance of MP-mix was significantly the way up to the root (usually denoted the principal leaf). When better in Risk and Quoridor, the results for Hearts were less propagating values, the common assumption is that the oppo- impressive. nents will use the same evaluation function as the root player In order to explain the different behavior of MP-mix we in- (unless using some form of specific opponent modeling-based troduce the opponent impact (OI). The OI is a game specific algorithm such as the ones found in [1] and [17]). property that describes the impact of move decisions by a single In sequential two-player zero-sum games (where players al- player on the performance of other players. In some games, the ternate turns), one evaluation value is enough, assuming one possibilities to impede the opponents are limited. Extreme ex- player aims to maximize this value while the other player aims amples are the multiplayer games of Bingo and Yahtzee.Inother to minimize it. The evaluation value is usually the difference games, such as Go Fish, the possibilities to impede the opponent between the merits of the max player and the merit of the min almost always exist. We show how OI can be used to predict player. Values from the leaves are propagated according to the whether dynamically switching the search strategies and using well-known minimax principle [18]. That is, assuming the root the MP-mix algorithm are beneficial. Our results suggest a pos- player is a maximizer, in even (odd) levels, the maximum (min- itive correlation between the improvement of the MP-mix ap- imum) evaluation value among the children is propagated. proach over previous approaches and games with high OI. Sequential, turn-based, multiplayer games with The structure of the paper is as follows. Section II provides players are more complicated. The assumption is that for each the required background on the relevant search techniques. In node the evaluation function returns a vector of evaluation Section III, we present the newly suggested directed offensive values where estimates the merit of player .Twobasic search strategy and the MP-mix algorithm. Section IV presents approaches were suggested to generalize the minimax principle our experimental results in three domains. The OI is introduced to this case: MaxN [9] and paranoid [16]. and discussed in Section V. We conclude in Section VI and present some ideas for future research in Section VII. This A. MaxN paper extends a preliminary version that appeared in [19] by presenting experimental results in the Quoridor domain, The straightforward and classic generalization of the two- new experimental insights on the behavior of MP-mix, new player minimax algorithm to the multiplayer case is the MaxN theoretical properties, and by extending the discussions in all algorithm [9].