Arxiv:2001.01004V4 [Cs.RO] 13 Jan 2021

Teach Me What You Want to Play: Learning Variants of Connect Four through Human-Robot Interaction Ali Ayub and Alan R. Wagner The Pennsylvania State University, State College, PA, 16802, USA faja5755,[email protected] Abstract. This paper investigates the use of game theoretic representations to represent and learn how to play interactive games such as Connect Four. We combine aspects of learning by demonstration, active learning, and game theory allowing a robot to leverage its developing representation of the game to conduct question/answer sessions with a person, thus filling in gaps in its knowledge. The paper demonstrates a method for teaching a robot the win conditions of the game Connect Four and its variants using a single demonstration and a few trial examples with a question and answer session led by the robot. Our results show that the robot can learn arbitrary win conditions for the game with little prior knowledge of the win conditions and then play the game with a human utilizing the learned win conditions. Our experiments also show that some questions are more important for learning the game's win conditions. We believe that this method could be broadly applied to a variety of interactive learning scenarios.1 1 Introduction The objective of our larger research program is to develop the computational underpinnings and algorithms that will allow a robot to learn how to play an interactive game such as Uno, Monopoly, or Connect Four by interacting with a child. We are motivated by potential applications in hospitals and long-term care facilities for children. Moreover, playing interactive games such as these has been shown to contribute to social development [8]. Our intent is to create the underlying theory and algorithms that will allow a child to teach a robot to play the games that the child wishes to play. These games may contain nuanced and arXiv:2001.01004v4 [cs.RO] 13 Jan 2021 individualized rules that change and vary each time the game is played or with each child, yet maintain the same underlying basic structure. We borrow computational representations from game theory to address this problem. Game theory has been used to formally represent and reason about a number of interactive games such as Snakes and Ladders, Tic-Tac-Toe, and ver- sions of Chess [7]. Game theory offers a collection of mathematical tools and representations that typically examine questions of strategy during an interaction 1 The final authenticated publication is available online at https://doi.org/10.1007/978-3-030-62056-1 42 2 Ayub, A. and Wagner, A. R. or series of interactions. The term game is used to describe the computational representation of an interaction or series of interactions. Game theory provides a variety of different representations, but the two most common representations are the normal-form game and the extended-form game (described in greater detail below). We use the term "interactive game" to indicate a series of interactions that happen through a board, cards, or play style which has predefined rules, actions, winners and losers. Given this terminology, game theory provides computational representations (games) that can be used to represent interactive games. Using representations from game theory has advantages and disadvantages. On the positive side, game theoretic representations have been designed to cap- ture the information needed to formally represent an interaction. Moreover, rep- resenting interactions as game-theoretic games allows one to apply the tools and results from game theory as needed [11]. For example, calculating Nash equilib- rium to influence one's play. On the other hand, game-theoretic representations are not easily learned solely from data [9]. This paper focuses on developing the computational underpinnings necessary for a robot to play the game Connect Four and its variants. In our prior work [2], we made some initial progress towards this goal by showing a robot that can learn the four win conditions of Connect Four. This paper focuses on developing formal underpinnings necessary for the robot to not only learn the four win conditions of Connect Four but also its variants. We further analyze our approach in this paper and quantitatively evaluate the importance of different question types for learning the variants of Connect Four. We believe that the methods developed in this paper will also work for other games and hope to show the general applicability of these techniques in future work. We seek to develop a system that learns how to play the game by asking people questions about the game. We assume that the robot knows what the game pieces are and how to use them. The focus of this paper is thus on the robot learning the win conditions for the game (i.e. how to win). Our approach leverages the robot's developing representation of the game to guide active learning. Specifically, an evolving game tree indicates to the robot the questions that it must ask in order to gain enough knowledge about the structure of the game to be able play it. Often when one person teaches another person how to play a game they begin by explaining how one wins. Our goal is to develop the computational underpinnings that will allow the robot to learn the win conditions well enough to begin playing, even if the full structure of the game has not been learned2. The main contributions of this paper are: 1. A novel approach that utilizes the evolving game-tree representation to ask questions from a user to learn the game's win conditions. 2. An approach that can be used to learn different win conditions patterns on the Connect Four board in addition to the four win conditions of Connect Four (column, row, diagonal, anti-diagonal). 2 A preliminary version of this paper was accepted at [5]. Learning Variants of Connect Four through HRI 3 3. An experimental analysis that quantifies the importance of different questions for learning various win conditions on the Connect Four board. 2 Related Work The field of artificial intelligence has a long history of developing systems that can play and learn games [12]. Recently, significant progress has been made developing systems capable of mastering games such as Chess, Poker and Go using deep reinforcement learning techniques [14]. While deep reinforcement learning clearly provides a method for learning how to strategically play a game, this approach requires large amounts of training data and is fundamentally non-interactive [15]. Interpersonal game learning, on the other hand, is an interactive process involving limited data and examples, and play must begin before the structure of the game is fully known in order to maintain the other person's attention and interest. Moreover, with children in particular, rules change dynamically in order to make play more favorable and exciting for the child. Data-driven retraining may not be possible or desirable in this situation. Deep learning-based meta-learning has been proposed as a means for man- aging the problem of large training time and massive data sets [15]. Although these approaches can learn how to do a task by just watching a single or few demonstrations, the new task has to be very similar to the task that the robot was originally trained on i.e. a robot trained on picking objects will not be able to learn how to place an object. Moreover, the initial meta-learning phase to train the robot on the same task still requires a large amount of data and time. Hence, the problem of using guided interaction with a human to teach the robot a new concept remains unsolved. Although, researchers have investigated using meta-learning on goal-oriented tasks such as visual navigation in novel scenes [13], to the best of our knowledge, no meta-learning approach exists for learning interactive games by watching just a single demonstration. Active learning describes the general approach of allowing a machine learner to actively seek information from a human about a particular piece of data in order to improve performance with less training [10,3,6,4]. Typically active learning is framed around a supervised learning task involving labeled and unlabeled data. There are a number of different active learning strategies, the membership query strategy being most related to our work [1]. For this active learning strategy the learner generates queries for a human focused on specific instances of data. One contribution of this paper (further discussed below) is that we leverage the robot's developing game-theoretic representation to assist with the generation of queries directed at the human. In other words, we use the game theory representation to inform the generation of our queries and to contextualize the resulting answers. 4 Ayub, A. and Wagner, A. R. Fig. 1. A demonstration of column win condition in column 5 for Connect Four seen from the robot's perspective is shown on the left. The corresponding extensive-form representation is shown on the right. The numbers along with the arrows show the action numbers chosen by the players (5 by the human and ? by the robot since robot's actions are not shown by the human in the demonstration). Best viewed in color. 3 Using Game Theory to Represent Interactive Games An interactive game in which players take alternative turns (like Connect Four) can be represented using extensive-form game format [2]. In Connect Four, players are required to place round game chips into a 7x6 vertical board. This a perfect information game because at each stage both players have complete information about the state of the game, actions taken by the other player and the actions available to the other player in the next stage.

Arxiv:2001.01004V4 [Cs.RO] 13 Jan 2021

TOYS Balls Barbie Clothes Board Books-English and Spanish Books

Communication, Fine Motor, and Gross Motor Skills for All Ages

Math Moving to Grade 5

“'In a Checkers Game.'” Point Games 2. Ka-Boom!

Math-GAMES IO1 EN.Pdf

Fourth Grade Summer Sizzlers

Let's-Remember-Together.Pptx.Pdf

Literacy Activities & Games for Parents and Children

Reinforcement Learning for Connect Four

Visualizing Game Dynamics 1 Running Head

Chapter 3.Key

FG100 Base.Qxp