Efficient Belief-State AND–OR Search, with Application to Kriegspiel
Total Page:16
File Type:pdf, Size:1020Kb
Efficient belief-state AND–OR search, with application to Kriegspiel Stuart Russell and Jason Wolfe Computer Science Division University of California, Berkeley, CA 94720 [email protected], [email protected] Content areas: Game-playing, uncertainty Abstract This follows from the fact that an opponent, by simply choosing moves at random, has a nonzero probability of du- The paper reports on new algorithms for solving plicating any finite behavior of an optimal opponent with full partially observable games. Whereas existing al- observation. Thus, the logical possibility that the solver can gorithms apply AND-OR search to a tree of black- guarantee a win within some bounded depth depends only box belief states, our “incremental” versions treat upon the solver’s own information about the true state. This uncertainty as a new search dimension, examining implies that neither the opponent’s information state nor ran- the physical states within a belief state to construct domized strategies need be considered. We will see shortly solution trees incrementally. On a newly created that the theorem does not apply to certain other strategies, database of checkmate problems for Kriegspiel (a which win with probability 1 but are not guaranteed. partially observable form of chess), incrementaliza- Despite these restrictions, the problem is quite general. tion yields speedups of two or more orders of mag- Indeed, it is isomorphic to that of finding guaranteed plans nitude on hard instances. in nondeterministic, partially observable environments. Our particular focus in this paper is on Kriegspiel, a variant of 1 Introduction chess in which the opponent’s pieces are completely invisi- “Classical” games, such as chess and backgammon, are fully ble. As the game progresses, partial information is supplied observable. Partially observable games, despite their greater by a referee who has access to both players’ positions. The similarity to the real world, have received less attention in players, White1 and Black, both hear every referee announce- AI. The overall computational task in such games can be di- ment. There is no universally accepted set of rules; we adopt vided conceptually into two parts. First, state estimation is the following: the process of generating and updating the belief state—a • White may propose to the referee any move that would representation of the possible states of the world, given the be a legal chess move on a board containing just White’s observations to date. Second, move selection is the process of pieces; White may also propose any pawn capture. choosing a move given the belief state. • If the proposed move is illegal on the board containing Partially observable games are intrinsically more com- both White and Black pieces, the referee announces “Il- plicated than their fully observable counterparts. At any legal.”2 White may then try another move. given point, the set of logically possible states may be • very large, rendering both state estimation and move selec- If the move is legal, it is made. Announcements are: tion intractable. Furthermore, move selection must consider – If a piece is captured on square X: “Capture on X.” one’s own information state (gathering information is help- – If Black is now in check: “Check by D,” where D ful) and the opponent’s information state (revealing informa- is one or two of the following directions (from the tion is harmful). Finally, game-theoretically optimal strate- perspective of Black’s king): Knight, Rank, File, gies are randomized because restricting oneself to determinis- Long Diagonal, and Short Diagonal. tic strategies provides additional information to the opponent. – If Black has no legal moves: “Checkmate” if Black The paper by Koller and Pfeffer [1997] provides an excellent is in check and “Stalemate” otherwise. introduction to these issues. • Finally: “Black to move.” In this paper, we focus on a particular subproblem: de- (Examples of play are given in Section 4. In addition, ciding whether a guaranteed win exists—i.e., a strategy that more details are available online at our Kriegspiel website, guarantees the optimal payoff within some finite depth, re- www.cs.berkeley.edu/˜jawolfe/kriegspiel/.) gardless of the true state, for any play by the opponent. This One important aspect of Kriegspiel and many other par- is simpler than the general problem, for two reasons. First, tially observable games and planning domains is the “try- the state estimate need represent only the logically possible until-feasible” property—a player may attempt any number states, without considering their probabilities. Second, we have the following: 1W.l.o.g., we assume it is White’s turn to move. Theorem 1 A guaranteed win exists from some initial belief 2If the move is not legal for White’s pieces alone, or if it has state iff it exists against an opponent with full observation. already been rejected on this turn, the referee says “Nonsense.” of actions until a legal one is found. Because the order of at- selection, and explains how self-play was used to generate the tempts matters, a move choice is actually a plan to try a given first database of Kriegspiel checkmate problems with obser- sequence of potentially legal moves. Hence, the branching vation histories. factor per actual move is the number of such sequences, Section 4 develops the basic AND–OR tree structure for which can be superexponential in the number of potentially solving Kriegspiel-like games, allowing for possibly-illegal legal moves. We will see that try-until-feasible domains ad- moves. Section 5 defines two baseline algorithms—versions mit certain simplifications that mitigate this branching factor. of depth-first search (DFS) and proof-number search (PNS)— Kriegspiel is very challenging for humans. Even for ex- for solving such trees, and then presents some basic improve- perts, “announced” checkmates are extremely rare except ments for these algorithms and analyzes their performance on for so-called material wins—such as KQR-vs.-K—when one our checkmate database. side is reduced to just a king and the other has sufficient (safe) Section 6 develops a new family of incremental search al- material to force a win. Checkmate is usually “accidental” in gorithms that treat uncertainty as a new search dimension in the sense that one player mates the other without knowing it addition to depth and breadth, incrementally proving belief in advance. states by adding a single constituent “physical state” to a so- Several research groups have studied the Kriegspiel check- lution at each step. This leads to a further speedup of one mate problem. Ferguson [1992] exhibited a randomized or more orders of magnitude. Finally, Section 7 shows how strategy for the KBN-vs.-K endgame that wins with prob- state estimation and incremental checkmate search may be ability 1; the lone Black king can escape checkmate only interleaved. by guessing White’s moves correctly infinitely often. Sub- sequently [1995], he derived a strategy for the KBB-vs.-K 2 State estimation endgame that wins with probability 1−ǫ for any ǫ > 0. In our terminology, these mates are not guaranteed, and they do not If we are to find a guaranteed win for White, state estimation work if the opponent can see the board. Additional material- must identify the (logical) belief state—the set of all physical win strategies, all deterministic, have been developed [Cian- states (configurations for White and Black pieces) that are carini et al., 1997; Bolognesi and Ciancarini, 2003; 2004]. consistent with White’s history of moves and observations. A As we explain in Section 4, algorithms for finding a check- naive algorithm for exact state estimation looks like this: mate in Kriegspiel involve searching an AND–OR tree whose • The initial belief state is a singleton, because Black’s nodes correspond to belief states. This idea is common pieces start the game in their normal positions. enough in the AI literature on partially observable plan- • For each White move attempt, apply the move to every ning search (see, for example, Chapters 3 and 12 of [Rus- physical state and remove those states that are inconsis- sell and Norvig, 2003]). It was proposed for Kriegspiel, tent with the subsequent percept. and applied to the analogous partially observable variant of • For each Black turn, Shogi (Japanese chess), by Sakuta and Iida; results for sev- eral search algorithms are summarized in Sakuta’s PhD the- – For each sequence of k “Illegal” percepts for sis [2001]. Bolognesi and Ciancarini [2004] add heuristic the unobserved Black move attempts, remove any measures of “progress” to guide the tree search, but consider physical state for which there are fewer than k dis- only positions in which the opponent has a lone king. In all of tinct illegal moves. these papers, the initial belief state is determined externally. – Replace each remaining physical state by the set The search algorithm developed by Ginsberg [1999] for of updated states corresponding to all Black moves (partially observable) Bridge play works by sampling the ini- that are legal and yield the given percept when tial complete deal and then solving each deal as a fully ob- made in that state. servable game. This approach gives a substantial speedup – Remove duplicate states using a transposition table. 10 over solving the true game tree, but never acts to gather In Kriegspiel, the belief state can grow very large—10 or hide information (which is essential for domains such as states or more—so this algorithm is not always practical. We Kriegspiel). The only previous Kriegspiel-playing agent we have found, however, that with certain aggressive styles of know of, developed by Parker et al. [2005], uses this ap- play, the belief state remains no larger than a few thousand proach. It keeps track of a sample of its true belief state, states throughout the game.