
Université de Montréal
Faculté des études supérieures et postdoctorales

This thesis, entitled "No Press Diplomacy" and presented by Philip Paquette, was evaluated by a jury composed of the following members: Liam Paull (committee chair), Aaron Courville (research supervisor), and Pierre Poulin (jury member). Thesis accepted on: August 31, 2019.

Sommaire

This thesis presents an article on an agent that can play the "No-Press" (message-free) version of the board game Diplomacy. Diplomacy is a 7-player negotiation game in which each player tries to conquer the majority of the supply centers of Europe at the beginning of the 20th century. The article first presents a dataset containing more than 150,000 games played by humans. This dataset was compiled following the signing of a partnership with an external site. The games played on this platform were all converted into a new standardized format and then replayed to verify their quality. The article also presents a game engine, with a web interface, that allows humans to play against the models that were developed. The article further presents a supervised learning model in which the agent learns to reproduce the behavior of all players in the dataset by maximum likelihood. An agent that learns to play by reinforcement (by playing against itself) was also trained. The article concludes with an analysis of these models and a comparison of the agents' performance against agents using complex rules.

Keywords: Diplomacy, negotiation, game, supervised learning, reinforcement learning, deep learning

Summary

This thesis presents an article on an agent which can play the "No-Press" version (without messages) of the Diplomacy board game. Diplomacy is a 7-player negotiation game where each player tries to conquer the majority of the supply centers in Europe at the beginning of the 20th century. The article first presents a novel dataset of more than 150,000 human games. This dataset was compiled following the signing of a partnership with an external site. The games, which were played on this platform, were all converted into a new standardized format and then replayed to ensure their quality. The article also presents a game engine, with a web interface, allowing humans to play against the models that have been trained. Moreover, the article presents a supervised learning model where an agent learns to reproduce the behavior of all players in the dataset by maximum likelihood. An agent that learns by reinforcement (by playing games against itself) has also been trained. The article concludes with an analysis of these models and a comparison of their performance against complex rule-based agents.

Keywords: Diplomacy, negotiation, game, supervised learning, reinforcement learning, deep learning

Table of Contents

Sommaire
Summary
List of tables
List of figures
Chapter 1. Introduction
Chapter 2. Machine Learning Basics
  2.1. Supervised Learning
    2.1.1. Overfitting and Underfitting
    2.1.2. Maximum Likelihood Estimation
    2.1.3. Deep Feedforward Networks
    2.1.4. Stochastic Gradient Descent
    2.1.5. Convolutional Neural Networks
    2.1.6. Recurrent Neural Networks
    2.1.7. Attention Mechanisms
  2.2. Reinforcement Learning
    2.2.1. Markov Decision Process
    2.2.2. Value Function and Q-Values
    2.2.3. Solving the RL Problem - Policy Iteration
    2.2.4. Bootstrapping
    2.2.5. Solving the RL Problem - Policy Gradient
Chapter 3. The Diplomacy Game
  3.1. Game Overview
  3.2. Communication in a No Press Game
Main article. No Press Diplomacy: Modeling Multi-Agent Gameplay
  1. Introduction
  2. No Press Diplomacy: Game Overview
  3. Previous Work
  4. DipNet: A Generative Model of Unit Orders
    4.1. Input Representation
    4.2. Graph Convolution Network with FiLM
    4.3. Decoder
  5. Datasets and Game Engine
  6. Experiments
    6.1. Supervised Learning
    6.2. Reinforcement Learning and Self-play
    6.3. Coalition Analysis
  7. Conclusion
Chapter 4. Conclusion
Bibliography

List of tables

4.1 Dataset statistics
4.2 Evaluation of supervised models: Predicting human orders
4.3 Comparison of the models' ability to predict support orders with greedy decoding
4.4 Diplomacy agents comparison when played against each other, with one agent controlling one power and the other six powers controlled by copies of the other agent
4.5 Coalition formation: Diplomacy agents comparison

List of figures

2.1 Underfitting vs Overfitting
2.2 Deep Feedforward Network / Neural Network
2.3 Convolutional Neural Network
2.4 Recurrent Neural Network
2.5 Attention Mechanism
2.6 Agent-Environment Interaction
4.1 The Standard Diplomacy Map
4.2 Encoding of the Board State and Previous Orders
4.3 DipNet Architecture

Chapter 1
Introduction

Diplomacy is a negotiation board game that was developed in 1959 by Allan B. Calhamer. Seven powers, namely Austria, England, France, Germany, Italy, Russia, and Turkey, try to conquer a majority of the supply centers of Europe at the beginning of the 20th century. Diplomacy is an interesting domain for artificial intelligence for several reasons:

• Large action space. There can be up to 34 units on the board (one for each supply center), with each unit having an average of 26 possible actions, giving an average branching factor of 1.29 × 10^48 (see the worked estimate after this list). Compared to Go (branching factor of 200 [5]) or chess (branching factor of 35 [5]), Diplomacy is several orders of magnitude more complex, making techniques such as search and rollout much more difficult to implement.

• No randomness. As opposed to games like Risk [21] (which uses dice) or Hanabi [3] (which uses shuffled cards), Diplomacy has no element of randomness and is purely a game of skill.

• Negotiation. Players want to act in their own best interests, yet to win the game they must also negotiate and cooperate with other players. Negotiation can be done either explicitly through messages or implicitly through the orders that players submit simultaneously. Alliances between players are usually short-lived, and betrayal and deception are common.

• Language emergence and grounding. Diplomacy can be an interesting testbed for language emergence. For instance, are agents able to create a new language among themselves to play better than agents that are unable to communicate? Moreover, the conversations need to be grounded in the game context; otherwise the players will not be able to understand each other.

• Imperfect and incomplete information. Orders are submitted by all players simultaneously, so players do not know what the other players will do when deciding which orders to play. Moreover, messages exchanged between players are private and can only be seen by their sender and recipient. Finally, the players' strategies are not fully known and may
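As a rough check on the branching-factor figure in the first bullet (a back-of-the-envelope calculation from the numbers quoted above, not an exact count of legal order sets), assume each of the 34 units independently picks one of roughly 26 orders; since all orders are submitted jointly, the number of possible joint actions is

\[
    26^{34} \;=\; 10^{\,34 \log_{10} 26} \;\approx\; 10^{48.11} \;\approx\; 1.29 \times 10^{48},
\]

which matches the figure quoted above. The estimate is so large because, unlike in Go or chess, a single "move" here is the simultaneous combination of orders for every unit on the board.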