
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning

Xiaoxiao Guo* (IBM Research), Mo Yu* (IBM Research), Yupeng Gao (IBM Research), Chuang Gan (MIT-IBM Watson AI Lab), Murray Campbell (IBM Research), Shiyu Chang (MIT-IBM Watson AI Lab)

* Primary authors.

Abstract

Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques. In contrast to previous text games with mostly synthetic texts, IF games pose language understanding challenges on the human-written textual descriptions of diverse and sophisticated game worlds, and language generation challenges on action command generation from a far less restricted combinatorial space. We take a novel perspective on IF game solving and re-formulate it as Multi-Passage Reading Comprehension (MPRC) tasks. Our approaches utilize the context-query attention mechanisms and the structured prediction in MPRC to efficiently generate and evaluate action outputs, and apply an object-centric historical observation retrieval strategy to mitigate the partial observability of the textual observations. Extensive experiments on the recent IF benchmark (Jericho) demonstrate clear advantages of our approaches, which achieve high winning rates with low data requirements compared to all previous approaches.¹

¹ Source code is available at: https://github.com/XiaoxiaoGuo/rcdqn

[Figure 1: Sample gameplay for the classic dungeon game Zork1. The objective is to solve various puzzles and collect the 19 treasures to install in the trophy case. The player receives textual observations describing the current game state and additional reward scalars encoding the game designers' objective of game progress. The player sends textual action commands to control the protagonist.]

1 Introduction

Interactive systems capable of understanding natural language and responding in the form of natural language text have high potential in various applications. In pursuit of building and evaluating such systems, we study learning agents for Interactive Fiction (IF) games. IF games are world-simulating software in which players use text commands to control the protagonist and influence the world, as illustrated in Figure 1. IF gameplay agents need to simultaneously understand the game's information from a text display (observation) and generate a natural language command (action) via a text input interface. Without being given an explicit game strategy, the agents need to identify behaviors that maximize objective-encoded cumulative rewards.

IF games composed of human-written texts (distinct from previous text games with synthetic texts) create superb new opportunities for studying and evaluating natural language understanding (NLU) techniques due to their unique characteristics. (1) Game designers elaborately craft the literariness of the narrative texts to attract players when creating IF games. The resulting texts in IF games are more linguistically diverse and sophisticated than the template-generated ones in synthetic text games. (2) The language contexts of IF games
are more versatile because various designers contribute to enormous domains and genres, such as adventure, fantasy, horror, and sci-fi. (3) The text commands to control characters are less restricted, with action spaces over six orders of magnitude larger than those of previous text games. The recently introduced Jericho benchmark provides a collection of such IF games (Hausknecht et al., 2019a).

The complexity of IF games demands more sophisticated NLU techniques than those used in synthetic text games. Moreover, the task of designing IF game-play agents, intersecting NLU and reinforcement learning (RL), poses several unique challenges for NLU techniques. The first challenge is the difficulty of exploration in the huge natural language action space. To make RL agents learn efficiently without prohibitively exhaustive trials, the action estimation must generalize learned knowledge from tried actions to untried ones. To this end, previous approaches, starting with a single embedding vector of the observation, either predict the elements of actions independently (Narasimhan et al., 2015; Hausknecht et al., 2019a) or embed each valid action as another vector and predict action values based on vector-space similarities (He et al., 2016). These methods consider neither the compositionality and role differences of the action elements, nor the interactions among them and the observation. Therefore, their modeling of the action values is less accurate and less data-efficient.

The second challenge is partial observability. At each game-playing step, the agent receives a textual observation describing the locations, objects, and characters of the game world. But the latest observation is often not a sufficient summary of the interaction history and may not provide enough information to determine the long-term effects of actions. Previous approaches address this problem by building a representation over past observations, e.g., a graph of objects, positions, and spatial relations (Ammanabrolu and Riedl, 2019; Ammanabrolu and Hausknecht, 2020). These methods treat all historical observations equally and summarize the information into a single vector, without focusing on the contexts relevant to action prediction for the current observation. Their use of history therefore also introduces noise, and the improvement is not always significant.

[Figure 2: Overview of our approach to solving the IF games as Multi-Paragraph Reading Comprehension (MPRC) tasks. (a) Multi-Passage Retrieval for Partial Observability: objects extracted from the current observation retrieve related past observation-action pairs. (b) Multi-Passage RC for Action Value Learning: an RC-model-based value approximator takes observations as context and action templates (e.g., <pick up OBJ>, <break OBJ with OBJ>) as queries, scoring instantiated actions such as <break window with stone>: 0.4.]

We propose a novel formulation of IF game playing as Multi-Passage Reading Comprehension (MPRC) and harness MPRC techniques to address the huge action space and partial observability challenges. A graphical illustration is shown in Figure 2. First, action value prediction (i.e., predicting the long-term rewards of selecting an action) is essentially generating and scoring a compositional action structure by finding supporting evidence in the observation. We build on the fact that each action is an instantiation of a template, i.e., a verb phrase with a few placeholders for the object arguments it takes (Figure 2b). The action generation process can then be viewed as extracting objects for a template's placeholders from the textual observation, based on the interaction between the template verb phrase and the relevant context of the objects in the observation.
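To make the template-action structure concrete, the following is a minimal sketch of enumerating candidate actions by filling a template's OBJ placeholders with objects mentioned in the observation. The template strings mirror Figure 2b; the helper names and the object list are illustrative assumptions, not the paper's implementation.

    from itertools import permutations

    # Action templates: verb phrases with OBJ placeholders (cf. Figure 2b).
    TEMPLATES = ["east", "pick up OBJ", "break OBJ with OBJ"]

    def instantiate(template: str, objects: list[str]) -> list[str]:
        """Fill the template's OBJ slots with ordered choices of objects."""
        n_slots = template.count("OBJ")
        if n_slots == 0:
            return [template]
        actions = []
        for combo in permutations(objects, n_slots):
            action = template
            for obj in combo:
                action = action.replace("OBJ", obj, 1)  # fill slots left to right
            actions.append(action)
        return actions

    objects = ["eggs", "branches", "window", "stone"]
    candidates = [a for t in TEMPLATES for a in instantiate(t, objects)]
    # e.g., "pick up eggs", "break window with stone", ...

The combinatorial growth of this enumeration is precisely why scoring templates and objects jointly, rather than predicting each slot independently, matters for data efficiency.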
Our approach addresses the structured prediction and interaction problems with the idea of the context-query attention mechanism in RC models. Specifically, we treat the observation as a passage and each template verb phrase as a question. Filling the object placeholders in the template thus becomes an extractive QA problem that selects objects from the observation given the template. Simultaneously, each action (i.e., a template with all placeholders filled) gets its evaluation value predicted by the RC model. Our formulation and approach better capture the fine-grained interactions between observation texts and structured actions, in contrast to previous approaches that represent the observation as a single vector and ignore the fine-grained dependency among action elements.
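The sketch below illustrates this formulation numerically, with toy random embeddings standing in for the paper's trained encoder: observation tokens act as the passage, a template's tokens as the question, and a context-query attention yields both per-placeholder object scores and a scalar value estimate. The dimensions, pooling, and tanh value head are illustrative assumptions, not the published architecture.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 64

    def embed(tokens):
        """Toy stand-in for a trained text encoder."""
        return rng.standard_normal((len(tokens), DIM))

    def context_query_attention(context, query):
        """RC-style attention: each query (template) token attends over all
        context (observation) tokens and gathers an aligned summary."""
        sim = query @ context.T                        # (Q, C) similarities
        sim -= sim.max(-1, keepdims=True)              # numerical stability
        attn = np.exp(sim) / np.exp(sim).sum(-1, keepdims=True)
        return attn @ context                          # (Q, DIM) aligned context

    observation = "a small window is ajar and a stone lies nearby".split()
    template = "break OBJ with OBJ".split()

    C, Q = embed(observation), embed(template)
    aligned = context_query_attention(C, Q)

    # Each OBJ placeholder scores observation tokens as candidate fillers
    # (the extractive-QA view); pooling gives the action's estimated value.
    slot_rows = [i for i, tok in enumerate(template) if tok == "OBJ"]
    object_scores = aligned[slot_rows] @ C.T           # (num_slots, num_tokens)
    action_value = float(np.tanh(aligned.mean()))      # stand-in value head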
Second, alleviating partial observability is essentially enhancing the current observation with potentially relevant history and predicting actions over the enhanced observation. Our approach retrieves potentially relevant historical observations with an object-centric approach (Figure 2a), so that the retrieved ones are more likely to be connected to the current observation, as they describe at least one shared interactable object. Our attention …
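A minimal sketch of such object-centric retrieval, assuming object mentions can be detected by simple string matching against a known object vocabulary (the paper's object extraction and retrieval are richer than this):

    import string

    def extract_objects(observation: str, vocab: set[str]) -> set[str]:
        """Naive object detection: vocabulary words mentioned in the text."""
        words = (w.strip(string.punctuation) for w in observation.lower().split())
        return {w for w in words if w in vocab}

    def retrieve_history(current_obs: str, history: list[str],
                         vocab: set[str], k: int = 2) -> list[str]:
        """Return past observations sharing at least one interactable object
        with the current one, ranked by the number of shared objects."""
        now = extract_objects(current_obs, vocab)
        scored = [(len(now & extract_objects(past, vocab)), past)
                  for past in history]
        shared = [(n, past) for n, past in scored if n > 0]
        return [past for _, past in sorted(shared, key=lambda x: -x[0])[:k]]

    vocab = {"window", "stone", "eggs", "case"}
    history = ["You see a trophy case.", "A stone sits by the window."]
    print(retrieve_history("The window is slightly ajar.", history, vocab))
    # -> ['A stone sits by the window.']

Requiring at least one shared object keeps retrieval targeted: a past observation enters the RC context only when it plausibly describes the same part of the game world.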
2 Related Work

LSTM-DQN (Narasimhan et al., 2015) was proposed to generate verb-object actions with pre-defined sets of possible verbs and objects, but it treats the selection and learning of verbs and objects independently. Template-DQN (Hausknecht et al., 2019a) extended LSTM-DQN to template-based action generation, introducing one additional but still independent prediction output for the second object in the template. The Deep Reinforcement Relevance Network (DRRN) (He et al., 2016) was introduced for choice-based games: given a set of valid actions at every game state, DRRN projects each action into a hidden space that is matched against the current state representation vector for action selection. The Action-Elimination Deep Q-Network (AE-DQN) (Zahavy et al., 2018) learns to predict invalid actions in the adventure game Zork, eliminating invalid actions for efficient policy learning by utilizing expert demonstration data.
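For contrast with the template-based methods above, DRRN's choice-based scoring reduces to encoding the state and action texts separately and taking an inner product. This is a schematic reading of He et al. (2016), with toy hashing encoders standing in for their learned networks:

    import numpy as np

    rng = np.random.default_rng(1)
    W_s = rng.standard_normal((64, 32))   # toy state projection
    W_a = rng.standard_normal((64, 32))   # toy action projection

    def encode(text: str) -> np.ndarray:
        """Toy bag-of-hashed-words encoder standing in for DRRN's text nets."""
        v = np.zeros(64)
        for w in text.lower().split():
            v[hash(w) % 64] += 1.0
        return v

    def q_value(state: str, action: str) -> float:
        """DRRN-style scoring: inner product of projected state and action."""
        return float((encode(state) @ W_s) @ (encode(action) @ W_a))

    state = "You are in the kitchen. A window is slightly ajar."
    actions = ["open window", "go east"]
    best = max(actions, key=lambda a: q_value(state, a))

Because each candidate action is embedded independently of the observation's token-level content, this kind of matching cannot express the fine-grained template-object interactions that the MPRC formulation targets.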