Monte Carlo Tree Search Techniques in the Game of Kriegspiel Paolo Ciancarini Gian Piero Favini Dipartimento di Scienze dell’Informazione Dipartimento di Scienze dell’Informazione University of Bologna University of Bologna [email protected] [email protected] Abstract do not yield good results due to the size of the state space and the difficulty of crafting an adequate evaluation function. Monte Carlo tree search has brought significant The game of Go is the primary example, albeit not the only improvements to the level of computer players in one, of a tough environment for minimax where Monte Carlo games such as Go, but so far it has not been used tree search was able to improve the level of computer players very extensively in games of strongly imperfect in- considerably. Since Kriegspiel shares the two traits of being formation with a dynamic board and an emphasis a large game and a difficult one to express with an evaluation on risk management and decision making under un- function (unlike its perfect information counterpart), it is only certainty. In this paper we explore its application to natural to test a similar approach. This would also allow to re- the game of Kriegspiel (invisible chess), providing duce the amount of game-specific knowledge used by current three Monte Carlo methods of increasing strength programs by a large amount. for playing the game with little specific knowl- The paper is organized as follows. In Section 2, we intro- edge. We compare these Monte Carlo agents to the duce the game of Kriegspiel, its rules, and the most signifi- strongest known minimax-based Kriegspiel player, cant research results obtained in the field. In Section 3, we obtaining significantly better results with a con- provide a few highlights in the history of Monte Carlo tree siderably simpler logic and less domain-specific search methods applied to games, with an emphasis on Phan- knowledge. tom Go, which is to Go what Kriegspiel is to chess. We then describe our Monte Carlo approaches in Section 4, showing 1 Introduction how we built three Monte Carlo Kriegspiel players of increas- Imperfect information games provide a good model and ing strength. Section 5 contains experimental tests comparing testbed for many real-world problems and situations involv- strength and performance of the various programs. Finally, ing decision making under uncertainty. They typically in- we give our conclusions and furure research directions in Sec- volve a combination of complex tasks such as heuristic tion 6. search, belief state reconstruction and opponent modeling, and they can be very difficult for a computer agent to play 2 Kriegspiel well. Some games are particularly challenging because at Kriegspiel, named after the ‘war game’ used by the Prussian any time, the number of possible, indistinguishable states far army to train its officers, is a chess variant invented at the end exceeds the storage and computational abilities of present- of the XIX century by Michael Henry Temple. It is played day computers. In this paper, the focus is on one such game, on three chessboards in different rooms, one for either player Kriegspiel or invisible chess. It has several features that make and one for the umpire. From the umpire’s point of view, a it interesting: firstly, its rules are identical to those of a very game of Kriegspiel is a game of chess. The players, however, well-known game and only the players’ perception of the can only see their own pieces and communicate their move board is different, only being able to see their own pieces; attempts to the umpire, so that there is no direct communica- secondly, it is a game with a huge number of states and lim- tion between them. If a move is illegal, the umpire will ask ited means of acquiring information; and finally, the nature the player to choose a different one. If it is legal, the umpire of uncertainty is entirely dynamic. This differs from other will instead inform both players as to the consequences of games such as Phantom Go or Stratego, wherein a newly dis- that move, if any. This information depends on the Kriegspiel covered piece of information remains valid for the rest of the variant being played; on the Internet Chess Club, which has game. Information in Kriegspiel is scarce, precious and ages the largest player base for this game, it consists of the follow- fast. ing. In this paper we present the first full application of Monte Carlo tree search to the game of Kriegspiel. Monte Carlo tree • When something is captured: in this case the umpire will search has been imposing itself over the past years as a ma- say whether the captured chessman is a pawn or another jor tool for games in which traditional minimax techniques piece and where it was captured, but in the latter case he 474 will not say what kind of piece. several Monte Carlo approaches to Phantom Go, with or with- [ • When the king of the player to move is in check from out tree search, has recently been given in Borsboom et al., ] one or more directions among the following: rank, file, 2007 . Given the success of these methods, it is reasonable short diagonal, long diagonal, knight. to wonder how similar methods would perform in a game with such high uncertainty as Kriegspiel. On the other hand, • When the player to move has one or more capturing there are several differences worth mentioning between the moves using his pawns (“pawn tries”). two games. • When the game is over. • The nature of Kriegspiel uncertainty is completely dy- Because the information shared with the players is rela- namic: while Go stones are, if not immutable, at least tively little, the information set for Kriegspiel is huge. If largely static and once discovered permanently decrease one considers the number of distinct belief states in a game uncertainty by a large factor, information in Kriegspiel of Kriegspiel, the KRK (king and rook versus king) ending ages and quickly becomes old. One needs to consider alone is not very far from the whole game of checkers. How- whether uncertainty means the same thing in the two ever, many of these states are in practice equivalent since games, and whether Kriegspiel is a harsher battlefield there is no strategy that allows to distinguish among them in this respect. in a normal game. This complexity is the primary reason • There are several dozen combinations of messages that why, for a long time, research only focused on algorithms the Kriegspiel umpire can return, compared to just two for specific endings, such as KRK, KQK or KPK. Ferguson in Phantom Go. This makes their full representation in showed algorithms for solving KBNK and KBBK under cer- the game tree very difficult. tain circumstances; see, for example, [Ferguson, 1992].It was only recently that the computer actually started to play • In Phantom Go there always exists a sequence of ille- Kriegspiel. State reconstruction routines are the main object gal moves that will reveal the full state of the game and of [Nance et al., 2006]; [Russell and Wolfe, 2005] focuses on remove uncertainty altogether; no such thing exists in efficient belief state management in order to recognize and Kriegspiel, where no sequence of moves can ever reveal find Kriegspiel checkmates; [Parker et al., 2005] is the first the umpire’s chessboard except near the end of the game. Monte Carlo approach to Kriegspiel, using and maintaining a • Uncertainty grows faster in Phantom Go, but also de- state pool that is sampled and evaluated with a chess function. creases automatically in the endgame. By contrast, Finally, [Ciancarini and Favini, 2007] is based on ideas first Kriegspiel uncertainty only decreases permanently when applied to Shogi in [Sakuta, 2001] with the addition of eval- a piece is captured, which is rarely guaranteed to happen. uation functions. A similar program to the one described in • [Ciancarini and Favini, 2007], with some improvements, will In Phantom Go, the player’s ability to reduce uncertainty be used as a benchmark for the players in the present work. increases as the game progresses since there are more enemy stones, but the utility of this additional informa- tion often decreases because less and less can be done 3 Related work about it. It is exactly the opposite in Kriegspiel: much Approaches to imperfect information games can be summar- like in Battleship, since there are fewer enemies on the ily divided into two broad categories: those which consider board and fewer allies to hit them with, the player has individual possible game states, and those which reason on a harder time making progress, but any information can an abstract model that is unable to see each single possible give him a major advantage. [ ] state. As far as Kriegspiel is concerned, Parker et al., 2005 • Finally, there are differences carried over from their per- is an example of the former type, being a Monte Carlo ap- fect information counterparts, most notably the victory proach (albeit a very different one from Monte Carlo tree conditions. Kriegspiel is about causing an event that can search) based on running possible states through a chess en- happen suddenly and at almost any time, whereas Go [ ] gine. On the other hand, Ciancarini and Favini, 2007 is an games are concerned with the accumulation of score. example of the latter approach, reducing the game to an ab- stract model and essentially treating it as a perfect informa- Because of these differences, the resemblance with Phan- tion game. Ever since the introduction of the UCT algorithm tom Go might be more superficial than it can appear at a first in [Kocsis and Szepesvari, 2006], Monte Carlo programs of glance and adds further motivation to see if analogous meth- the first type have been more and more successful over the ods can work on both games despite their differences.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages6 Page
-
File Size-