On Building Generalizable Learning Agents

Yi Wu

Electrical Engineering and Computer Sciences University of California at Berkeley

Technical Report No. UCB/EECS-2020-9 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-9.html

January 10, 2020

Copyright © 2020, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

On Building Generalizable Learning Agents

by

Yi Wu

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Stuart Russell, Chair
Professor Pieter Abbeel
Associate Professor Javad Lavaei
Assistant Professor Aviv Tamar

Fall 2019

On Building Generalizable Learning Agents

Copyright 2019 by Yi Wu

Abstract

On Building Generalizable Learning Agents

by

Yi Wu

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Stuart Russell, Chair

It has been a long-standing goal in Artificial Intelligence (AI) to build machines that can solve tasks that humans can. Thanks to the recent rapid progress in data-driven methods, which train agents to solve tasks by learning from massive training data, there have been many successes in applying such learning approaches to handle and even solve a number of extremely challenging tasks, including image classification, language generation, robotic control, and several complex multi-player games. The key factor behind all these data-driven successes is that the trained agents can generalize to test scenarios that are unseen during training. This generalization capability is the foundation for building any practical AI system. This thesis studies generalization, the fundamental challenge in AI, and proposes solutions to improve the generalization performance of learning agents in a variety of domains. We start by providing a formal formulation of the generalization problem in the context of reinforcement learning and proposing 4 principles within this formulation to guide the design of training techniques for improved generalization. We validate the effectiveness of our proposed principles by considering 4 different domains, from simple to complex, and developing domain-specific techniques following these principles. In particular, we begin with the simplest domain, i.e., path-finding on graphs (Part I), then consider visual navigation in a 3D world (Part II) and competition in complex multi-agent games (Part III), and lastly tackle several natural language processing tasks (Part IV). Empirical evidence demonstrates that the proposed principles generally lead to much improved generalization performance in a wide range of problems.

To my beloved, Shihui Li.

Contents

Contents ii

List of Figures iv

List of Tables vi

1 Introduction 1
  1.1 Generalization in Artificial Intelligence 1
  1.2 Problem Formulation 2
  1.3 Contributions 5

I Decision Making on Graphs 8

2 Value Iteration Network 9
  2.1 Motivation 9
  2.2 Background 11
  2.3 The Value Iteration Network Model 12
  2.4 Experiments 15
  2.5 Additional Details 19
  2.6 Summary 23

II Visual Semantic Generalization for Embodied Agents 24

3 House3D: an Environment for Building Generalizable Agents 25
  3.1 Motivation 25
  3.2 House3D: An Extensible Environment of 45K 3D Houses 29
  3.3 RoomNav: A Benchmark Task for Concept-Driven Navigation 30
  3.4 Gated-Attention Networks for Multi-Target Learning 32
  3.5 Experiments 34
  3.6 Additional Details 38
  3.7 Summary 41

4 Bayesian Relational Memory for Semantic Visual Navigation 43
  4.1 Motivation 43
  4.2 Methods 47
  4.3 Experiments 51
  4.4 Additional Details 57
  4.5 Summary 61

III Generalization in Complex Multi-Agent Games 62

5 Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Games 63
  5.1 Motivation 63
  5.2 Methods 68
  5.3 Experiments 70
  5.4 Additional Details 76
  5.5 Summary 81

6 Robust Multi-Agent Reinforcement Learning via Minimax Optimization 82
  6.1 Motivation 82
  6.2 Preliminary 85
  6.3 Methods 87
  6.4 Experiments 92
  6.5 Summary 95

IV Applications in Natural Language Processing 96

7 Robust Neural Relation Extraction 97
  7.1 Motivation 97
  7.2 Methodology 99
  7.3 Experiments 101
  7.4 Summary 104

8 Meta-Learning MCMC Proposals for Named Entity Recognition 105
  8.1 Motivation 105
  8.2 Meta-Learning MCMC Proposals 108
  8.3 Experiments 112
  8.4 Additional Details 117
  8.5 Summary 121

9 Conclusion 122

Bibliography 125

List of Figures

2.1 The grid-world navigation task 10
2.2 Planning-based NN models 14
2.3 Two instances of gridworld with VIN-predicted trajectories and the ground-truth shortest paths 16
2.4 Visualization of learned reward and value function by VIN 19

3.1 An overview of House3D environment and RoomNav task 27
3.2 Overview of our proposed models for RoomNav 33
3.3 Overall performances of various models on RoomNav 36
3.4 Effect of pixel-level augmentation in RoomNav 37
3.5 Effect of task-level augmentation in RoomNav 37

4.1 A demonstration of the navigation task and proposed BRM method 44
4.2 The architecture overview of BRM 46
4.3 Overview of the planning and update operator in BRM 48
4.4 Overview of prior learning in BRM 49
4.5 The learned prior of the relations 50
4.6 Qualitative comparison between BRM and baselines 52
4.7 Example of a successful trajectory 53
4.8 Performances with confidence interval of BRM and baselines 58

5.1 Overview of the multi-agent decentralized actor, centralized critic approach 68
5.2 Illustration of experimental environment and tasks 71
5.3 Performances of MADDPG and baselines 72
5.4 Learning curves of MADDPG and baselines on cooperative communication 73
5.5 Visualization of learned policies by MADDPG and naive DDPG 74
5.6 Effectiveness of learning by approximating policies of other agents 75
5.7 Learning curves in covert communication 79

6.1 Illustration of testing environments for M3DDPG 92
6.2 Comparison between M3DDPG and baselines 94
6.3 Performances of M3DDPG policies against best-response policies 94

7.1 Proposed neural relation extraction architecture 99
7.2 PR curves on the NYT dataset 102
7.3 PR curves on the UW dataset 103

8.1 Overview of our proposed meta-learning MCMC paradigm 107
8.2 Two example instantiations of structural motif in a chain model 110
8.3 Motif for general grid models 114
8.4 Performances of different methods on the grid models with true conditionals 114
8.5 KL divergences between the true conditionals and the outputs of our learned proposal 114
8.6 Performances of different methods on 180 grid models from the UAI 2008 inference competition 114
8.7 Learning curves of different methods on various GMMs 115
8.8 Performances of different methods on the NER task 116
8.9 Additional results on the grid models 118
8.10 Small version of the triangle grid model 120
8.11 Motif for the model in Fig. 8.10 120
8.12 Error w.r.t. epochs on the triangle model in Fig. 8.10 120

List of Tables

2.1 Performance on grid-world domain 17
2.2 Evaluation of the effect of weight sharing 20
2.3 Performance of VIN and baseline on the WebNav task 23

3.1 A summary of popular reinforcement learning environments 27
3.2 Statistics of the selected environment sets for RoomNav 39
3.3 Detailed test success rates for proposed models 39
3.4 Averaged steps in success trials for all models 41

4.1 The generalization performances of BRM and all the baselines 55
4.2 The effect of learned prior 55
4.3 Analysis of error sources in BRM 56
4.4 Effect of planning frequency 56
4.5 Performances with a termination action 56
4.6 Effect of the learned CNN semantic detector in BRM 59
4.7 Average successful episode length for different approaches 60

5.1 Success rates and average final distance of different methods in cooperative communication 77
5.2 Average number of collisions and average distances in cooperative navigation 77
5.3 Average number of prey touches in predator-prey 78
5.4 Detailed statistics in physical deception 78
5.5 Average success rate in covert communication 78
5.6 Evaluation details of the effect of policy ensemble 78

7.1 Statistics of relation extraction datasets 101
7.2 Performances of different methods on the NYT dataset 102
7.3 Performances of different methods on the UW dataset 103

Acknowledgments

I feel tremendously grateful to my advisor, Stuart Russell, for his support and guidance on both research and life throughout my PhD career, for letting me truly understand what I should pursue as a PhD student, and for making me a better researcher. I also want to express my sincere gratitude and appreciation to my committee members, Pieter Abbeel and Aviv Tamar, who brought me into the area of deep reinforcement learning and have acted as my co-advisors ever since. Their advice and support made this thesis possible. I would also like to thank Javad Lavaei for serving on my dissertation committee, as well as Sergey Levine and Trevor Darrell for serving on my qualifying exam committee. I sincerely thank some of my mentors and friends, Lei Li, Ras Bodik, Fei Fang, and Danqi Chen, who offered me generous support during my PhD career. I feel fortunate to have met Lei Li, for his mentorship, for our long-lasting friendship, and especially for providing me with the opportunity to start my first research project with Stuart. I thank Fei Fang not only for being a fantastic collaborator and close friend but also for her selfless advice on my career and life. Ras taught me how to present a research project with great patience and raised my confidence in the early stage of my PhD career. His support made my first published work at Berkeley possible. Danqi has continuously provided valuable advice to me, from my college study to my faculty application. She is also the person who brought me to NLP research. I deeply appreciate the mentorship from my fabulous internship/project mentors, Igor Mordatch, David Bamman, Yuandong Tian, and Georgia Gkioxari. I feel fortunate to have had the chance to work with my awesome collaborators, Ryan Lowe, Yuxin Wu, Dave Moore, as well as Tongzhou Wang, the best undergraduate student I have ever mentored. I also appreciate the efforts of my other co-authors, Garrett Thomas, Sergey Levine, Jean Harb, Xinyue Cui, and Honghua Dong. In addition, I would like to express my appreciation to all my lovely friends for cheering me up in my daily life. Finally, I would like to thank my parents, Ru Zhang and Pengfei Wu, for their support, and, most importantly, Shihui Li, for being my co-author, my source of happiness, and my wife.

Chapter 1

Introduction

1.1 Generalization in Artificial Intelligence

Since the term "Artificial Intelligence" was coined at the Dartmouth conference in 1956, tremendous efforts have been devoted to the mission of building artificial agents capable of solving tasks that human beings can. Although the original proposal stated a 2-month estimate for success [McCarthy et al., 2006], achieving Artificial General Intelligence (AGI) remains an open challenge for the whole research community to this day. Nevertheless, in the past decade, there is convincing evidence that we are indeed approaching our ultimate goal of AGI, largely thanks to the remarkable applications of data-driven methods, where agents directly learn to solve tasks from massive data. Particularly, with the advances in deep learning techniques [LeCun et al., 2015], we have successfully trained agents to tackle (and even solve) many challenging real-world tasks from a variety of domains: for example, we can build practical visual systems for image classification [Krizhevsky et al., 2012, He et al., 2016b], segmentation [Long et al., 2015, He et al., 2017] and editing [Goodfellow et al., 2014a, Zhu et al., 2017a, Brock et al., 2019]; we can train AI to generate extremely fluent natural language [Devlin et al., 2019, Radford et al., 2019] and build high-quality machine translation systems [Wu et al., 2016b]; we can even combine deep learning and reinforcement learning (a.k.a. deep reinforcement learning) to achieve super-human performance in video games [Mnih et al., 2015, Schulman et al., 2015, OpenAI et al., 2019b], Go [Silver et al., 2016], and Poker [Brown and Sandholm, 2019], and to build real-world robotic systems for navigation [Gupta et al., 2017b, Mirowski et al., 2018], object manipulation [Levine et al., 2016], or solving a Rubik's cube [OpenAI et al., 2019a]. Hence, today's frontier of AI with all these achievements has greatly consolidated our optimism about future success in AGI. The key ingredient leading to the successes of those recent deep learning applications is that agents that learn strategies from training data can generalize to unseen scenarios surprisingly well: an image classifier trained on the ImageNet dataset [Deng et al., 2009] can correctly classify unseen test images [He et al., 2016b]; Google's machine translation product can

handle massive unseen online queries [Wu et al., 2016b]; AIs trained only by self-competition can beat human world champions in Go [Silver et al., 2018] and Dota 2 [OpenAI et al., 2019b]. Such generalization ability to bridge the gap between training and testing is a fundamental requirement for building any practical AI system. Although we do have some positive examples as mentioned, generalization is still one of the biggest challenges in AI, with countless failure examples even for cutting-edge AI systems: neural networks can produce completely wrong outputs when just a single pixel is modified in the input image [Goodfellow et al., 2014b] or a single word is replaced in the input natural language sentence [Zhang et al., 2019]; in multi-agent physical games, the performance of trained agents can degenerate significantly when competing against an opponent whose behavior is completely different from that of its training partners [Al-Shedivat et al., 2018, Gleave et al., 2019]; many robotic systems that are trained in simulators can be brittle when deployed to the real world [Tobin et al., 2017, Peng et al., 2018a, OpenAI et al., 2019a]; DeepMind's StarCraft II AI system, AlphaStar, is still much weaker than the best human professional players [Vinyals et al., 2019]; although Uber's autonomous driving system had passed rigorous tests, it still caused a fatal accident in March 2018¹; in Chapter 2, we will show that even in an extremely simple grid-world task, applying deep learning methods in a straightforward way does not necessarily imply generalization.

Generalization is a fundamental yet extremely challenging problem in AI. This thesis focuses on this challenge in several concrete domains, including path-finding on graphs, embodied agents in a 3D world, complex multi-agent games, and natural language processing. We will provide systematic solutions to improve the generalization abilities of learned agents for solving a variety of tasks in these domains. I hope this thesis can serve as a solid step towards practical AGI in the future.

¹https://en.wikipedia.org/wiki/List_of_self-driving_car_fatalities

1.2 Problem Formulation

In this section, we describe the generalization problem we focus on more precisely. At a high level, we assume there is a distribution of task scenarios, and we want the learning agent to successfully solve the task in all the different scenarios. For example, in video games, the scenarios can be different game states and opponent strategies; in robotic manipulation, the scenarios can be different object configurations; in self-driving systems, the scenarios can be different urban and road conditions. Suppose the scenario distribution is denoted by C, the learning agent, parameterized by θ, is denoted by µ(θ), and there is a performance measure M(µ(θ); c) of how well the agent solves the task in a particular scenario c ∈ C. Then the mathematical objective for building a generalizable learning agent is to find an optimal parameter θ* such that

$$\theta^{\star} = \arg\max_{\theta} \; \mathbb{E}_{c \in C}\left[ M(\mu(\theta); c) \right]. \qquad (1.1)$$

In practice, the standard data-driven paradigm is to generate two separate scenario sets, i.e., a training set Ctrain and a testing set Ctest, then search for the optimal parameter θ̂* only

on Ctrain, and aim for the best performance on Ctest. In this case, our true objective becomes

$$\max \; \mathbb{E}_{c \in C_{\text{test}}}\left[ M(\mu(\hat{\theta}^{\star}); c) \right], \quad \text{where} \quad \hat{\theta}^{\star} = \arg\max_{\theta} \; \mathbb{E}_{c \in C_{\text{train}}}\left[ M'(\mu(\theta); c) \right], \qquad (1.2)$$

where M′(·) denotes the performance measure used at training time: it can be the same as the true measure M(·) or slightly different for the purpose of better generalization. There are a number of ways to optimize Equation 1.2: we can improve the agent representation µ(θ); we can adjust the training measure function M′(·) or use different approaches to search for the best parameter θ̂*; we can also augment the training set Ctrain. The techniques proposed in this thesis all fall into these categories. More methodology details will be presented in Section 1.3.
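To make this training/testing protocol concrete, here is a minimal sketch of Equation 1.2 with toy stand-ins: a one-dimensional "policy" parameter, a quadratic per-scenario measure, and a random-search optimizer. All of these choices are illustrative assumptions rather than the methods developed in this thesis; the point is only the separation between searching for θ̂* on Ctrain and reporting the expected performance on Ctest.

```python
# Minimal sketch of the protocol in Equation 1.2 with toy stand-ins.
# The 1-D "policy", quadratic measure, and random-search optimizer are
# hypothetical illustrations, not the methods developed in this thesis.
import random
import statistics


def mean_performance(theta, scenarios, measure):
    """Empirical estimate of E_{c in scenarios}[ M(mu(theta); c) ]."""
    return statistics.mean(measure(theta, c) for c in scenarios)


def train(train_scenarios, measure, theta_init=0.0, steps=200, step_size=0.1):
    """Search for theta-hat* on C_train only (any optimizer could be substituted)."""
    theta = theta_init
    best = mean_performance(theta, train_scenarios, measure)
    for _ in range(steps):
        candidate = theta + step_size * random.gauss(0.0, 1.0)
        score = mean_performance(candidate, train_scenarios, measure)
        if score > best:
            theta, best = candidate, score
    return theta


if __name__ == "__main__":
    random.seed(0)
    # Both sets are drawn i.i.d. from the same scenario distribution C (assumption 2).
    scenarios = [random.uniform(-1.0, 1.0) for _ in range(200)]
    c_train, c_test = scenarios[:150], scenarios[150:]

    # Toy per-scenario measure: performance is higher when theta matches c.
    measure = lambda theta, c: -(theta - c) ** 2

    theta_hat = train(c_train, measure)
    print("test performance:", mean_performance(theta_hat, c_test, measure))
```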

Key Assumptions

Equation 1.2 provides a very general formulation of generalization. To be more specific, in this thesis we primarily focus on the setting of reinforcement learning: a scenario c corresponds to a particular Markov Decision Process (MDP) (S, A, P(s′|s, a), r(s, a)), where s ∈ S denotes the state space, a ∈ A denotes the action space, P(s′|s, a) denotes the transition probability from state s to s′ after action a is taken, and r(s, a) denotes the reward to the agent in state s after action a. µ(θ) here is called the policy, which takes in the agent's observation o(s) in state s and produces an action µ(o(s); θ) ∈ A. The performance measure becomes the accumulated reward within the horizon H, namely $\mathbb{E}_{s_t}\!\left[\sum_{t=0}^{H} r(s_t, a_t) \,\middle|\, a_t = \mu(o(s_t); \theta)\right]$. For more rigorous definitions, please refer to [Sutton and Barto, 1998]. In each following chapter, these definitions will be recalled and slightly adjusted according to different domain specifications. Note that in the reinforcement learning setting, supervised learning tasks (e.g., image classification) can be viewed as a special case where the agent only needs to make a single action and the reward is simply an indicator function of correctness. In this thesis, there are two important assumptions:

• MDPs in C share the same observation space S and action space A. This simply implies that the input and output space of the policy are consistent across all the scenarios. For example, when building a game AI, no matter which scenario the agent is in, the input is always a game state; for a robotics system, the input can be the sensor signals and the action can be the control, which is always unchanged for a particular robot. Even if some actions become infeasible in some states, we can simply set the associated reward to be negative infinity.

• Ctrain and Ctest are both uniformly drawn from C. This critical assumption follows the standard assumption in statistical learning [James et al., 2013]. It implies that the training scenarios and testing scenarios share some essential common structures. Equivalently, it states that the testing scenarios are never too different from the training ones. For example, for autonomous driving systems, a valid setting is that we

train the self-driving car in a lot of cities in the U.S. and particularly test the trained agent in San Francisco. In contrast, if we train the self-driving agent on Earth, testing it on Mars will be beyond our focus in this thesis since the physics on Mars is too different from Earth to follow the same distribution assumption.
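As a concrete illustration of the reinforcement learning instantiation above, the sketch below estimates M(µ(θ); c) by Monte Carlo rollouts in a toy scenario-specific MDP. The chain MDP, its reset/step interface, and the reward values are hypothetical stand-ins introduced only for illustration; they are not environments used in this thesis.

```python
# Hedged sketch: estimating M(mu(theta); c) as an average episodic return in a
# toy scenario-specific MDP. The chain MDP and its reset/step interface are
# hypothetical stand-ins for illustration, not environments from this thesis.
import random
from typing import Callable, Tuple


class ChainScenario:
    """Scenario c: a 1-D chain where the agent starts at 0 and must reach `goal`."""

    def __init__(self, goal: int):
        self.goal, self.pos = goal, 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos                      # observation o(s)

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos += 1 if action > 0 else -1  # shared action space {-1, +1} (assumption 1)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else -0.01), done


def episodic_return(env: ChainScenario, policy: Callable[[int], int], horizon: int) -> float:
    """One Monte Carlo sample of sum_{t<=H} r(s_t, a_t) with a_t = mu(o(s_t); theta)."""
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


def estimate_measure(env: ChainScenario, policy, horizon: int = 50, n_rollouts: int = 20) -> float:
    """Average over rollouts to approximate the expectation defining M(mu(theta); c)."""
    return sum(episodic_return(env, policy, horizon) for _ in range(n_rollouts)) / n_rollouts


if __name__ == "__main__":
    scenario = ChainScenario(goal=random.randint(3, 10))   # one scenario c drawn from C
    always_right = lambda obs: +1                          # a trivial policy mu(.; theta)
    print("estimated M(mu(theta); c):", estimate_measure(scenario, always_right))
```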

Position in Existing Literature

Generalization is the fundamental problem in machine learning, which has been comprehensively studied for decades [James et al., 2013]. For reinforcement learning, although the majority of works in the field primarily focus on the classical formulation, i.e., learning the optimal policy in a particular MDP, which is essentially training rather than generalization, how to learn policies that can produce good actions on states not covered in the training experiences has also been considered since the 1980s [Kaelbling et al., 1996]. In the very early stage of studying generalization in reinforcement learning, efforts were focused on generalizing over the large state space within an MDP, which is now typically handled by using a function approximator to represent a policy. Herein, we consider generalization over a distribution of scenarios (MDPs). Such a notion was originally proposed in [Caruana, 1997], called multitask learning. In this thesis, the problem we tackle is conceptually very similar to the multitask setting, but I personally prefer the more general terminology scenario instead of task, since it is often non-trivial to distinguish the exact boundary of reinforcement learning tasks in high-dimensional complex environments. For example, in video games, we want a learned agent to generalize to different game stages/configurations, but the "task" itself is still to win the game; autonomous driving systems always aim to transport passengers safely while their generalization focus is on different surrounding situations. At a high level, the generalization problem falls into the scope of transfer learning [Pratt, 1993, Pan and Yang, 2009], which aims to utilize a model learned for one problem to solve another, different but related, problem. It is sometimes called domain adaptation when the goal is specifically to transfer a model trained on one data distribution to another testing data distribution [Patel et al., 2015]. Transfer learning has also been widely studied in the reinforcement learning setting [Taylor and Stone, 2009]. As a very general paradigm, transfer learning does not assume that the training and test scenarios are drawn from the same distribution (our second assumption). Instead, it primarily considers the problem of how to bridge the gap between distribution shifts between training and testing data. Another related field is learning to learn [Schmidhuber, 1995, Thrun and Pratt, 2012], which aims to build an agent that learns how to learn and adapt in new situations. It is also called meta-learning [Schaul and Schmidhuber, 2010, Vilalta and Drissi, 2002, Finn, 2018]. The goal of learning to learn is essentially the same as our focus, namely building agents that can perform well on unseen testing scenarios. The difference is that in the learning to learn setting, when the learned agent is deployed at test time, it will typically be given a short period of "free time" to further learn or adapt in the testing environment. In the aforementioned generalization formulation, by contrast, we want the learned agent to immediately work in

test cases without an explicit adaptation procedure. From this perspective, the formulation here is also related to zero-shot learning [Larochelle et al., 2008, Palatucci et al., 2009]. Additionally, in multi-agent games, an agent should not only behave well on different game configurations but also respond optimally to other agents' strategies in the game. Learning optimal strategies in multi-agent games has a long history within the scope of algorithmic game theory [Nisan et al., 2007]. In Part III of this thesis, we will leverage several fundamental ideas from algorithmic game theory and design computationally efficient approximate algorithms to build agents that can generalize across different opponent strategies in complex multi-agent games.

1.3 Contributions

This thesis studies the generalization challenge following the aforementioned formulation in Equation 1.2. I propose that improved generalization can be achieved via applying specially designed training techniques under the following 4 design principles:

(1) Incorporate inductive bias into policy representation µ(θ) to better capture the underlying problem structure in C;

(2) Augment the training scenarios Ctrain to better represent the underlying distribution C;

(3) Utilize a better designed training performance measure M′(·) that encourages the agent to learn to generalize;

(4) Develop better algorithms or training paradigms such that, in practice, a better policy parameter θ̂* can be obtained for the problem at hand.

We empirically validate our proposed principles by considering a wide range of specific yet important domains, from simple to hard, and applying solutions in these domains following the above design principles. Experimental results show that these principles can lead to significantly better generalization performance in a wide range of problems. Note that in each specific domain, particular techniques are developed under the guidance of these principles. It is yet to be explored whether there exists a unified solution to improve generalization in all kinds of problems, but I believe this thesis is a starting point towards this ultimate goal of eventually solving the generalization challenge. The remainder of this thesis is organized as follows:

• Part I starts from the simplest domain, i.e., the grid world and graphs. Chapter 2 explores principle (1) in a standard navigation task where the agent needs to find a path towards some target location. A standard deep learning paradigm is utilized for training. It can be demonstrated that policies represented by standard feed-forward networks do not generalize well in solving tasks in unseen scenarios. We argue that this is because the policy representation lacks the capability of performing long-term planning when

producing decisions. Therefore, Chapter 2 proposes a new policy representation, called the value iteration network, which enables the agents to learn a planning computation for long-term consequences, and shows that the proposed representation leads to significantly better generalization performance. Chapter 2 was originally published as a part of [Tamar et al., 2016], which won the best paper award at NeurIPS 2016.

• Part II considers a more realistic setting, i.e., visual navigation, where an agent takes in visual signals and needs to navigate in a 3D interactive world conditioned on some instruction. Chapter 3 explores principle (2) by proposing a new environment, House3D, which contains thousands of human-designed indoor scenes. House3D is particularly designed for building generalizable visual embodied agents. We further proposed a benchmark task called RoomNav and systematically evaluated a variety of learning and policy representation techniques for the RoomNav task in House3D. Chapter 4 further explores principle (1) by proposing a new memory architecture for policy representation, called Bayesian Relational Memory (BRM). BRM allows the agents to effectively plan for the optimal path towards the target based on their past experiences and therefore significantly improves the generalization performance in the RoomNav task. Chapter 3 was originally published at the ICLR 2018 workshop track as [Wu et al., 2018]. Chapter 4 was published at ICCV 2019 as [Wu et al., 2019].

• Part III considers mixed cooperative-competitive multi-agent games where an agent should generalize not only to game configurations but also to other agents' strategies. Learning good policies in the multi-agent setting is fundamentally more challenging than in the single-agent case. Chapter 5 explores principle (4) by proposing a new actor-critic multi-agent reinforcement learning algorithm, MADDPG. It also provides preliminary examinations of the generalization performance of learned policies by testing with unseen competitors, and suggests an ensemble learning technique to improve generalization. Chapter 6 further explores principles (3) and (4) by incorporating the minimax concept from game theory into the MADDPG algorithm and developing a computationally efficient minimax learning algorithm, M3DDPG, which can significantly improve the generalization performance of learned policies in a variety of complex multi-agent games. Chapter 5 was originally published at NeurIPS 2017 as [Lowe et al., 2017]. Chapter 6 was published at AAAI 2019 as [Li et al., 2019].

• Part IV considers the natural language processing domain. Chapter 7 explores principle (3) in the task of relation extraction and proposes an adversarial training objective that generally improves the testing performance of a number of neural network models. Chapter 8 explores principles (2) and (4) in the task of named entity recognition (NER). A classical pipeline for this task is to first learn a probabilistic model of text sequences during training and then perform probabilistic inference on the learned model to produce test predictions. Inspired by the core idea of meta-learning, Chapter 8 proposes a meta-learning paradigm for MCMC inference, which can be directly applied to a trained NER model at test time and significantly boosts its test performance with

little additional computational cost. Chapter 7 was published at EMNLP 2017 as [Wu et al., 2017b]. Chapter 8 was published at NeurIPS 2018 as [Wang et al., 2018a].

Part I

Decision Making on Graphs

Chapter 2

Value Iteration Network

In this chapter, we introduce a novel policy representation, the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete path-planning domains, including a 2D grid world and a natural language based web search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.

2.1 Motivation

Over the last decade, deep convolutional neural networks (CNNs) have revolutionized supervised learning for tasks such as object recognition, action recognition, and semantic segmentation [Ciresan et al., 2012, Krizhevsky et al., 2012, Farabet et al., 2013, Long et al., 2015]. Recently, CNNs have been applied to reinforcement learning (RL) tasks with visual observations such as Atari games [Mnih et al., 2015], robotic manipulation [Levine et al., 2016], and imitation learning (IL) [Giusti et al., 2016]. In these tasks, a neural network (NN) is trained to represent a policy – a mapping from an observation of the system's state to an action, with the goal of representing a control strategy that has good long-term behavior, typically quantified as the minimization of a sequence of time-dependent costs. The sequential nature of decision making in RL is inherently different from the one-step decisions in supervised learning, and in general requires some form of planning [Bertsekas, 2012]. However, most recent deep RL works [Mnih et al., 2015, Schulman et al., 2015, Levine et al., 2016, Giusti et al., 2016] employed NN architectures that are very similar to the standard networks used in supervised learning tasks, which typically consist of CNNs for feature extraction, and fully connected layers that map the features to a probability distribution over actions. Such networks are inherently reactive, and in particular, lack explicit planning computation. The success of reactive policies in sequential problems is due to the learning algorithm, which essentially trains a reactive policy to select actions that have good long-term consequences in its training domain. To understand why planning can nevertheless be an important ingredient in a policy, consider the grid-world navigation task depicted in Figure 2.1 (left), in which the agent can observe a map of its domain, and is required to navigate between some obstacles to a target position. One hopes that after training a policy to solve several instances of this problem with different obstacle configurations, the policy would generalize to solve a different, unseen domain, as in Figure 2.1 (right). However, as we show in our experiments, while standard CNN-based networks can be easily trained to solve a set of such maps, they do not generalize well to new tasks outside this set, because they do not understand the goal-directed nature of the behavior. This observation suggests that the computation learned by reactive policies is different from planning, which is required to solve a new task¹. In this chapter, we propose a NN-based policy that can effectively learn to plan. Our model, termed a value-iteration network (VIN), has a differentiable 'planning program' embedded within the NN structure. The key to our approach is an observation that the classic value-iteration (VI) planning algorithm [Bellman, 1957, Bertsekas, 2012] may be represented by a specific type of CNN. By embedding such a VI network module inside a standard feed-forward classification network, we obtain a NN model that can learn the parameters of a planning computation that yields useful predictions. The VI block is differentiable, and the whole network can be trained using standard backpropagation. This makes our policy simple to train using standard RL and IL algorithms, and straightforward to integrate with NNs for perception and control.

Figure 2.1: Two instances of a grid-world domain. The task is to move to the goal between the obstacles.
Connections between planning algorithms and recurrent NNs were previously explored by Ilin et al. [Ilin et al., 2007]. Our work builds on related ideas, but results in a more broadly applicable policy representation. Our approach is different from model-based RL [Schmidhuber, 1990, Deisenroth and Rasmussen, 2011], which requires system identification to map the observations to a dynamics model, which is then solved for a policy. In many applications, including robotic manipulation and locomotion, accurate system identification is difficult, and modelling errors can severely degrade the policy performance. In such domains, a model-free approach is often preferred [Levine et al., 2016]. Since a VIN is just a NN policy, it can be trained model free, without requiring explicit system identification. In addition, the effects of modelling errors in VINs can be mitigated by training the network end-to-end, similarly to the methods in [Joseph et al., 2013, Guo et al., 2016]. We demonstrate the effectiveness of VINs in various problems, from gridworld path-finding to natural language based decision making in the WebNav challenge [Nogueira and Cho, 2016]. After training, the policy learns to map an observation to a planning computation relevant for the task, and generate action predictions based on the resulting plan. As we demonstrate, this leads to policies that generalize better to new, unseen, task instances.

¹In principle, with enough training data that covers all possible task configurations, and a rich enough policy representation, a reactive policy can learn to map each task to its optimal policy. In practice, this is often too expensive, and we offer a more data-efficient approach by exploiting a flexible prior about the planning computation underlying the behavior.

2.2 Background

In this section we provide background on planning, value iteration, CNNs, and policy representations for RL and IL. In the sequel, we shall show that CNNs can implement a particular form of planning computation similar to the value iteration algorithm, which can then be used as a policy for RL or IL.

Value Iteration: A standard model for sequential decision making and planning is the Markov decision process (MDP) [Bellman, 1957, Bertsekas, 2012]. An MDP M consists of states s ∈ S, actions a ∈ A, a reward function R(s, a), and a transition kernel P(s′|s, a) that encodes the probability of the next state given the current state and action. A policy π(a|s) prescribes an action distribution for each state. The goal in an MDP is to find a policy that obtains high rewards in the long term. Formally, the value $V^{\pi}(s)$ of a state under policy π is the expected discounted sum of rewards when starting from that state and executing policy π, $V^{\pi}(s) \doteq \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \,\middle|\, s_0 = s\right]$, where γ ∈ (0, 1) is a discount factor, and $\mathbb{E}^{\pi}$ denotes an expectation over trajectories of states and actions (s0, a0, s1, a1, ...), in which actions are selected according to π, and states evolve according to the transition kernel P(s′|s, a). The optimal value function $V^{*}(s) \doteq \max_{\pi} V^{\pi}(s)$ is the maximal long-term return possible from a state. A policy π* is said to be optimal if $V^{\pi^{*}}(s) = V^{*}(s)$ for all s. A popular algorithm for calculating V* and π* is value iteration (VI):

$$V_{n+1}(s) = \max_{a} Q_n(s, a) \;\; \forall s, \quad \text{where} \quad Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s'|s, a)\, V_n(s'). \qquad (2.1)$$

It is well known that the value function $V_n$ in VI converges as n → ∞ to V*, from which an optimal policy may be derived as $\pi^{*}(s) = \arg\max_{a} Q_{\infty}(s, a)$.

Convolutional Neural Networks (CNNs) are NNs with a particular architecture that has proved useful for computer vision, among other domains [Fukushima, 1979, LeCun et al., 1998, Ciresan et al., 2012, Krizhevsky et al., 2012]. A CNN is comprised of stacked convolution and max-pooling layers. The input to each convolution layer is a 3-dimensional signal X, typically, an image with l channels, m horizontal pixels, and n vertical pixels, and its output h is an l′-channel convolution of the image with kernels $W^{1}, \dots, W^{l'}$, $h_{l', i', j'} = \sigma\!\left(\sum_{l, i, j} W^{l'}_{l, i, j} X_{l, i'-i, j'-j}\right)$, where σ is some scalar activation function. A max-pooling layer selects, for each channel l and pixel i, j in h, the maximum value among its neighbors N(i, j), $h^{\mathrm{maxpool}}_{l, i, j} = \max_{i', j' \in N(i, j)} h_{l, i', j'}$. Typically, the neighbors N(i, j) are chosen as a k × k

image patch around pixel i, j. After max-pooling, the image is down-sampled by a constant factor d, commonly 2 or 4, resulting in an output signal with l′ channels, m/d horizontal pixels, and n/d vertical pixels. CNNs are typically trained using stochastic gradient descent (SGD), with backpropagation for computing gradients.

Reinforcement Learning and Imitation Learning: In MDPs where the state space is very large or continuous, or when the MDP transitions or rewards are not known in advance, planning algorithms cannot be applied. In these cases, a policy can be learned from either expert supervision – IL, or by trial and error – RL. While the learning algorithms in both cases are different, the policy representations – which are the focus of this chapter – are similar. Additionally, most state-of-the-art algorithms such as [Ross et al., 2011a, Mnih et al., 2015, Schulman et al., 2015, Levine et al., 2016] are agnostic to the policy representation, and only require it to be differentiable, for performing gradient descent on some algorithm-specific loss function. Therefore, in this chapter we do not commit to a specific learning algorithm, and only consider the policy. Let φ(s) denote an observation for state s. The policy is specified as a parametrized function πθ(a|φ(s)) mapping observations to a probability over actions, where θ are the policy parameters. For example, the policy could be represented as a neural network, with θ denoting the network weights. The goal is to tune the parameters such that the policy behaves well in the sense that πθ(a|φ(s)) ≈ π*(a|φ(s)), where π* is the optimal policy for the MDP, as defined in Section 2.2. In IL, a dataset of N state observations and corresponding optimal actions $\{\phi(s^{i}), a^{i} \sim \pi^{*}(\phi(s^{i}))\}_{i=1,\dots,N}$ is generated by an expert. Learning a policy then becomes an instance of supervised learning [Ross et al., 2011a, Giusti et al., 2016]. In RL, the optimal action is not available, but instead, the agent can act in the world and observe the rewards and state transitions its actions effect. RL algorithms such as in [Sutton and Barto, 1998, Mnih et al., 2015, Schulman et al., 2015, Levine et al., 2016] use these observations to improve the value of the policy.
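For reference, the VI update in Equation 2.1 takes only a few lines of array code. The NumPy sketch below runs value iteration on a randomly generated tabular MDP; the MDP itself and the convergence threshold are illustrative assumptions, and this is the classical algorithm rather than the VIN implementation introduced later in this chapter.

```python
# NumPy sketch of value iteration (Eq. 2.1) on a random tabular MDP.
# Shapes: R is |S| x |A|; P is |A| x |S| x |S| with P[a, s, s'] = P(s'|s, a).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 25, 4, 0.9

R = rng.normal(size=(n_states, n_actions))
P = rng.random(size=(n_actions, n_states, n_states))
P /= P.sum(axis=-1, keepdims=True)        # normalize rows into transition distributions

V = np.zeros(n_states)
for _ in range(500):
    # Q_n(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V_n(s')
    Q = R + gamma * (P @ V).T
    V_new = Q.max(axis=1)                 # V_{n+1}(s) = max_a Q_n(s, a)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the values have converged
        V = V_new
        break
    V = V_new

pi_star = Q.argmax(axis=1)                # greedy policy pi*(s) = argmax_a Q(s, a)
print("V*[:5] =", np.round(V[:5], 3))
print("pi*[:5] =", pi_star[:5])
```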

2.3 The Value Iteration Network Model

In this section we introduce a general policy representation that embeds an explicit planning module. As stated earlier, the motivation for such a representation is that a natural solution to many tasks, such as the path planning described above, involves planning on some model of the domain. Let M denote the MDP of the domain for which we design our policy π. We assume that there is some unknown MDP M̄ such that the optimal plan in M̄ contains useful information about the optimal policy in the original task M. However, we emphasize that we do not assume to know M̄ in advance. Our idea is to equip the policy with the ability to learn and solve M̄, and to add the solution of M̄ as an element in the policy π. We hypothesize that this will lead to a policy that automatically learns a useful M̄ to plan on. We denote by s̄ ∈ S̄, ā ∈ Ā, R̄(s̄, ā), and P̄(s̄′|s̄, ā) the states, actions, rewards, and transitions in M̄. To facilitate a connection between M and M̄, we let R̄ and P̄ depend on the observation in M,

namely, R̄ = fR(φ(s)) and P̄ = fP(φ(s)), and we will later learn the functions fR and fP as a part of the policy learning process. For example, in the grid-world domain described above, we can let M̄ have the same state and action spaces as the true grid-world M. The reward function fR can map an image of the domain to a high reward at the goal, and negative reward near an obstacle, while fP can encode deterministic movements in the grid-world that do not depend on the observation. While these rewards and transitions are not necessarily the true rewards and transitions in the task, an optimal plan in M̄ will still follow a trajectory that avoids obstacles and reaches the goal, similarly to the optimal plan in M. Once an MDP M̄ has been specified, any standard planning algorithm can be used to obtain the value function V̄*. In the next section, we shall show that using a particular implementation of VI for planning has the advantage of being differentiable, and simple to implement within a NN framework. In this section however, we focus on how to use the planning result V̄* within the NN policy π. Our approach is based on two important observations. The first is that the vector of values V̄*(s̄) ∀s̄ encodes all the information about the optimal plan in M̄. Thus, adding the vector V̄* as additional features to the policy π is sufficient for extracting information about the optimal plan in M̄. However, an additional property of V̄* is that the optimal decision π̄*(s̄) at a state s̄ can depend only on a subset of the values of V̄*, since $\bar{\pi}^{*}(\bar{s}) = \arg\max_{\bar{a}} \bar{R}(\bar{s}, \bar{a}) + \gamma \sum_{\bar{s}'} \bar{P}(\bar{s}'|\bar{s}, \bar{a})\, \bar{V}^{*}(\bar{s}')$. Therefore, if the MDP has a local connectivity structure, such as in the grid-world example above, the states for which P̄(s̄′|s̄, ā) > 0 form a small subset of S̄. In NN terminology, this is a form of attention [Xu et al., 2015], in the sense that for a given label prediction (action), only a subset of the input features (value function) is relevant. Attention is known to improve learning performance by reducing the effective number of network parameters during learning. Therefore, the second element in our network is an attention module that outputs a vector of (attention modulated) values ψ(s). Finally, the vector ψ(s) is added as additional features to a reactive policy π_re(a|φ(s), ψ(s)). The full network architecture is depicted in Figure 2.2 (left). Returning to our grid-world example, at a particular state s, the reactive policy only needs to query the values of the states neighboring s in order to select the correct action. Thus, the attention module in this case could return a ψ(s) vector with a subset of V̄* for these neighboring states. Let θ denote all the parameters of the policy, namely, the parameters of fR, fP, and π_re, and note that ψ(s) is in fact a function of φ(s). Therefore, the policy can be written in the form πθ(a|φ(s)), similarly to the standard policy form (cf. Section 2.2). If we could back-propagate through this function, then potentially we could train the policy using standard RL and IL algorithms, just like any other standard policy representation. While it is easy to design functions fR and fP that are differentiable (and we provide several examples in our experiments), back-propagating the gradient through the planning algorithm is not trivial. In the following, we propose a novel interpretation of an approximate VI algorithm as a particular form of a CNN.
This allows us to conveniently treat the planning module as just another NN, and by back-propagating through it, we can train the whole policy end-to-end.
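To make the attention step concrete, the snippet below gathers the planner's Q̄ values at the agent's current grid cell (ψ(s) = Q̄(s, ·)) and maps them to action probabilities with a small reactive head. The tensor shapes, layer sizes, and the use of PyTorch are assumptions made for illustration only, not the exact architecture of this chapter.

```python
# Hedged sketch of the attention + reactive-policy step: psi(s) selects the
# planner's Q_bar values at the agent's current cell, and a small head maps
# them (optionally concatenated with features of phi(s)) to action probabilities.
# Shapes, sizes, and the use of PyTorch are illustrative assumptions.
import torch
import torch.nn as nn

n_abstract_actions, n_actions = 8, 8
reactive_head = nn.Linear(n_abstract_actions, n_actions)

q_bar = torch.randn(1, n_abstract_actions, 16, 16)  # planner output Q_bar: (batch, A_bar, m, n)
i, j = 5, 7                                          # agent position, read from the observation phi(s)

psi = q_bar[:, :, i, j]                              # attention: psi(s) = Q_bar(s, .)
pi_re = torch.softmax(reactive_head(psi), dim=-1)    # pi_re(a | phi(s), psi(s))
print(pi_re)
```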

[Figure 2.2 diagram labels: VI Module – Prev. Value, New Value, Reward, Q, V, R, P, K recurrence.]

Figure 2.2: Planning-based NN models. Left: a general policy representation that adds value function features from a planner to a reactive policy. Right: VI module – a CNN representation of VI algorithm.

The VI Module

We now introduce the VI module – a NN that encodes a differentiable planning computation. Our starting point is the VI algorithm (2.1). Our main observation is that each iteration of VI may be seen as passing the previous value function Vn and reward function R through a convolution layer and max-pooling layer. In this analogy, each channel in the convolution layer corresponds to the Q-function for a specific action, and the convolution kernel weights correspond to the discounted transition probabilities. Thus by recurrently applying a convolution layer K times, K iterations of VI are effectively performed. Following this idea, we propose the VI network module, as depicted in Figure 2.2B. The input to the VI module is a 'reward image' R̄ of dimensions l, m, n, where here, for the purpose of clarity, we follow the CNN formulation and explicitly assume that the state space S̄ maps to a 2-dimensional grid. However, our approach can be extended to general discrete state spaces, for example, a graph, as we report in the WebNav experiment in Section 2.4. The reward is fed into a convolutional layer Q̄ with Ā channels and a linear activation function, $\bar{Q}_{\bar{a}, i', j'} = \sum_{l, i, j} W^{\bar{a}}_{l, i, j} \bar{R}_{l, i'-i, j'-j}$. Each channel in this layer corresponds to Q̄(s̄, ā) for a particular action ā. This layer is then max-pooled along the actions channel to produce the next-iteration value function layer V̄, $\bar{V}_{i, j} = \max_{\bar{a}} \bar{Q}(\bar{a}, i, j)$. The next-iteration value function layer V̄ is then stacked with the reward R̄, and fed back into the convolutional layer and max-pooling layer K times, to perform K iterations of value iteration. The VI module is simply a NN architecture that has the capability of performing an approximate VI computation. Nevertheless, representing VI in this form makes learning the MDP parameters and reward function natural – by backpropagating through the network, similarly to a standard CNN. VI modules can also be composed hierarchically, by treating the value of one VI module as additional input to another VI module. We further report on this idea in the supplementary material.
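The recurrence just described maps directly onto standard deep-learning primitives. Below is a hedged PyTorch sketch of one possible VI module (the original implementation in this chapter used Theano); the channel counts, kernel size, and number of iterations K are illustrative assumptions.

```python
# Hedged PyTorch sketch of the VI module: K recurrent applications of a
# convolution producing one Q_bar channel per abstract action, followed by a
# max over the action channel. Hyperparameters are illustrative assumptions;
# the original implementation used Theano.
import torch
import torch.nn as nn


class VIModule(nn.Module):
    def __init__(self, n_abstract_actions: int = 8, kernel: int = 3, k_iterations: int = 20):
        super().__init__()
        self.k = k_iterations
        # Two input channels: the reward image R_bar and the previous value image V_bar.
        self.q_conv = nn.Conv2d(2, n_abstract_actions, kernel, padding=kernel // 2, bias=False)

    def forward(self, r_bar: torch.Tensor) -> torch.Tensor:
        # r_bar: (batch, 1, m, n), e.g., produced by the reward mapping f_R.
        v_bar = torch.zeros_like(r_bar)
        for _ in range(self.k):                                    # K iterations of VI
            q_bar = self.q_conv(torch.cat([r_bar, v_bar], dim=1))  # Q_bar: one channel per action
            v_bar, _ = q_bar.max(dim=1, keepdim=True)              # max-pool over the action channel
        return q_bar                                               # later read by the attention module


if __name__ == "__main__":
    module = VIModule()
    reward_image = torch.randn(1, 1, 16, 16)
    print(module(reward_image).shape)        # torch.Size([1, 8, 16, 16])
```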

Value Iteration Networks

We now have all the ingredients for a differentiable planning-based policy, which we term a value iteration network (VIN). The VIN is based on the general planning-based policy defined above, with the VI module as the planning algorithm. In order to implement a VIN, one has to specify the state and action spaces for the planning module, S̄ and Ā, the reward and transition functions fR and fP, and the attention function; we refer to this as the VIN design. For some tasks, as we show in our experiments, it is relatively straightforward to select a suitable design, while other tasks may require more thought. However, we emphasize an important point: the reward, transitions, and attention can be defined by parametric functions, and trained with the whole policy². Thus, a rough design can be specified, and then fine-tuned by end-to-end training. Once a VIN design is chosen, implementing the VIN is straightforward, as it is simply a form of a CNN. The networks in our experiments all required only several lines of Theano [Al-Rfou et al., 2016] code. In the next section, we evaluate VIN policies on various domains, showing that by learning to plan, they achieve a better generalization capability.

2.4 Experiments

In this section we evaluate VINs as policy representations on various domains. Additional experiments investigating RL and hierarchical VINs, as well as technical implementation details, are discussed in the supplementary material. Source code is available at https://github.com/avivt/VIN. Our goal in these experiments is to investigate the following questions:

1. Can VINs effectively learn a planning computation using standard RL and IL algorithms?

2. Does the planning computation learned by VINs make them better than reactive policies at generalizing to new domains?

An additional goal is to point out several ideas for designing VINs for various tasks. While this is not an exhaustive list that fits all domains, we hope that it will motivate creative designs in future work.

Grid-World Domain

Our first experiment domain is a synthetic grid-world with randomly placed obstacles, in which the observation includes the position of the agent, and also an image of the map of obstacles and goal position. Figure 2.3 shows two random instances of such a grid-world of size 16 × 16. We conjecture that by learning the optimal policy for several instances of this

²VINs are fundamentally different from inverse RL methods [Neu and Szepesvári, 2007], where transitions are required to be known.

Figure 2.3: Two random instances of the 28 × 28 synthetic gridworld, with the VIN-predicted trajectories and ground-truth shortest paths between random start and goal positions. In A, the trajectories mostly overlap. Note that the shortest-path trajectory is not unique, and in the left domain the trajectory predicted by the VIN is different than the ground-truth, but of the same length.

domain, a VIN policy would learn the planning computation required to solve a new, unseen, task. In such a simple domain, an optimal policy can easily be calculated using exact VI. Note, however, that here we are interested in evaluating whether a NN policy, trained using RL or IL, can learn to plan. In the following results, policies were trained using IL, by standard supervised learning from demonstrations of the optimal policy. In the supplementary material, we report additional RL experiments that show similar findings. We design a VIN for this task following the guidelines described above, where the planning MDP M̄ is a grid-world, similar to the true MDP. The reward mapping fR is a CNN mapping the image input to a reward map in the grid-world. Thus, fR should potentially learn to discriminate between obstacles, non-obstacles and the goal, and assign a suitable reward to each. The transitions P̄ were defined as 3 × 3 convolution kernels in the VI block, exploiting the fact that transitions in the grid-world are local³. The recurrence K was chosen in proportion to the grid-world size, to ensure that information can flow from the goal state to any other state. For the attention module, we chose a trivial approach that selects the Q̄ values in the VI block for the current state, i.e., ψ(s) = Q̄(s, ·). The final reactive policy is a fully connected network that maps ψ(s) to a probability over actions. We compare VINs to the following NN reactive policies:

CNN network: We devised a CNN-based reactive policy inspired by the recent impressive results of DQN [Mnih et al., 2015], with 5 convolution layers, and a fully connected output. While the network in [Mnih et al., 2015] was trained to predict Q values, our network outputs a probability over actions. These terms are related, since $\pi^{*}(s) = \arg\max_{a} Q(s, a)$.

Fully Convolutional Network (FCN): The problem setting for this domain is similar to semantic segmentation [Long et al., 2015], in which each pixel in the image is assigned a semantic label (the action in our case). We therefore devised an FCN inspired by a state-of-the-art semantic segmentation algorithm [Long et al., 2015], with 3 convolution layers, where

³Note that the transitions defined this way do not depend on the state s̄. Interestingly, we shall see that the network learned to plan successful trajectories nevertheless, by appropriately shaping the reward.

                        VIN                               CNN                               FCN
Domain       Pred. loss  Succ. rate  Traj. diff.   Pred. loss  Succ. rate  Traj. diff.   Pred. loss  Succ. rate  Traj. diff.
8 × 8        0.004       99.6%       0.001         0.02        97.9%       0.006         0.01        97.3%       0.004
16 × 16      0.05        99.3%       0.089         0.10        87.6%       0.06          0.07        88.3%       0.05
28 × 28      0.11        97%         0.086         0.13        74.2%       0.078         0.09        76.6%       0.08

Table 2.1: Performance on grid-world domain. Top: comparison with reactive policies. For all domain sizes, VIN networks significantly outperform standard reactive networks. Note that the performance gap increases dramatically with problem size.

the first layer has a filter that spans the whole image, to properly convey information from the goal to every other state. In Table 2.1 we present the average 0–1 prediction loss of each model, evaluated on a held-out test-set of maps with random obstacles, goals, and initial states, for different problem sizes. In addition, for each map, a full trajectory from the initial state was predicted, by iteratively rolling-out the next-states predicted by the network. A trajectory was said to succeed if it reached the goal without hitting obstacles. For each trajectory that succeeded, we also measured its difference in length from the optimal trajectory. The average difference and the average success rate are reported in Table 2.1. Clearly, VIN policies generalize to domains outside the training set. A visualization of the reward mapping fR (see supplementary material) shows that it is negative at obstacles, positive at the goal, and a small negative constant otherwise. The resulting value function has a gradient pointing towards a direction to the goal around obstacles, thus a useful planning computation was learned. VINs also significantly outperform the reactive networks, and the performance gap increases dramatically with the problem size. Importantly, note that the prediction loss for the reactive policies is comparable to the VINs, although their success rate is significantly worse. This shows that this is not a standard case of overfitting/underfitting of the reactive policies. Rather, VIN policies, by their VI structure, focus prediction errors on less important parts of the trajectory, while reactive policies do not make this distinction, and learn the easily predictable parts of the trajectory yet fail on the complete task. The VINs have an effective depth of K, which is larger than the depth of the reactive policies. One may wonder whether any deep enough network would learn to plan. In principle, a CNN or FCN of depth K has the potential to perform the same computation as a VIN. However, it has many more parameters, requiring much more training data. We evaluate this by untying the weights in the K recurrent layers in the VIN. Our results, reported in the supplementary material, show that untying the weights degrades performance, with a stronger effect for smaller sizes of training data.

WebNav Challenge

In the previous experiments, the planning aspect of the task corresponded to 2D navigation. We now consider a more general domain: WebNav [Nogueira and Cho, 2016] – a language-based search task on a graph. In WebNav, the agent needs to navigate the links of a website towards a goal web-page, specified by a short 4-sentence query. At each state s (web-page), the agent can observe average word-embedding features of the state φ(s) and possible next states φ(s′) (linked pages), and the features of the query φ(q), and based on that has to select which link to follow. In [Nogueira and Cho, 2016], the search was performed on the Wikipedia website. Here, we report experiments on the 'Wikipedia for Schools' website, a simplified Wikipedia designed for children, with over 6000 pages and at most 292 links per page. In [Nogueira and Cho, 2016], a NN-based policy was proposed, which first learns a NN mapping from (φ(s), φ(q)) to a hidden state vector h. The action is then selected according to $\pi(s'|\phi(s), \phi(q)) \propto \exp\!\left(h^{\top} \phi(s')\right)$. In essence, this policy is reactive, and relies on the word embedding features at each state to contain meaningful information about the path to the goal. Indeed, this property naturally holds for an encyclopedic website that is structured as a tree of categories, sub-categories, sub-sub-categories, etc. We sought to explore whether planning, based on a VIN, can lead to better performance in this task, with the intuition that a plan on a simplified model of the website can help guide the reactive policy in difficult queries. Therefore, we designed a VIN that plans on a small subset of the graph that contains only the 1st and 2nd level categories (< 3% of the graph), and their word-embedding features. Designing this VIN requires a different approach from the grid-world VINs described earlier, where the most challenging aspect is to define a meaningful mapping between nodes in the true graph and nodes in the smaller VIN graph. For the reward mapping fR, we chose a weighted similarity measure between the query features φ(q) and the features of nodes in the small graph φ(s̄). Thus, intuitively, nodes that are similar to the query should have high reward. The transitions were fixed based on the graph connectivity of the smaller VIN graph, which is known, though different from the true graph. The attention module was also based on a weighted similarity measure between the features of the possible next states φ(s′) and the features of each node in the simplified graph φ(s̄). The reactive policy part of the VIN was similar to the policy of [Nogueira and Cho, 2016] described above. Note that by training such a VIN end-to-end, we are effectively learning how to exploit the small graph for doing better planning on the true, large graph. Both the VIN policy and the baseline reactive policy were trained by supervised learning, on random trajectories that start from the root node of the graph. Similarly to [Nogueira and Cho, 2016], a policy is said to succeed on a query if all the correct predictions along the path are within its top-4 predictions. After training, the VIN policy performed mildly better than the baseline on 2000 held-out

test queries when starting from the root node, achieving 1030 successful runs vs. 1025 for the baseline. However, when we tested the policies on a harder task of starting from a random position in the graph, VINs significantly outperformed the baseline, achieving 346 successful runs vs. 304 for the baseline, out of 4000 test queries. These results confirm that indeed, when navigating a tree of categories from the root up, the features at each state contain meaningful information about the path to the goal, making a reactive policy sufficient. However, when starting the navigation from a different state, a reactive policy may fail to understand that it needs to first go back to the root and switch to a different branch in the tree. Our results indicate such a strategy can be better represented by a VIN. We remark that there is still room for further improvements of the WebNav results, e.g., by better models for reward and attention functions, and better word-embedding representations of text.

2.5 Additional Details

Visualization of Learned Reward and Value

In Figure 2.4 we plot the learned reward and value function for the grid-world task. The learned reward is very negative at obstacles, very positive at the goal, and a slightly negative constant otherwise. The resulting value function has a peak at the goal, and a gradient pointing towards a direction to the goal around obstacles. This plot clearly shows that the VI block learned a useful planning computation.

Figure 2.4: Visualization of learned reward and value function. Left: a sample domain. Center: learned reward fR for this domain. Right: resulting value function (in VI block) for this domain.

Weight Sharing

The VINs have an effective depth of K, which is larger than the depth of the reactive policies. One may wonder whether any deep enough network would learn to plan. In principle, a CNN or FCN of depth K has the potential to perform the same computation as a VIN. However, it has many more parameters, requiring much more training data. We evaluate this by untying the weights in the K recurrent layers in the VIN. Our results in Table 2.2 show that untying the weights degrades performance, with a stronger effect for smaller sizes of training data.

                        VIN                                       VIN Untied Weights
Training data   Pred. loss   Succ. rate   Traj. diff.     Pred. loss   Succ. rate   Traj. diff.
20%             0.06         98.2%        0.106           0.09         91.9%        0.094
50%             0.05         99.4%        0.018           0.07         95.2%        0.078
100%            0.05         99.3%        0.089           0.05         95.6%        0.068

Table 2.2: Performance on 16 × 16 grid-world domain. Evaluation of the effect of VI module shared weights relative to data size.

Technical Details for Experiments

We report the full technical details used for training our networks.

Grid-world Domain

Our training set consists of Ni = 5000 random grid-world instances, with Nt = 7 shortest-path trajectories (calculated using an optimal planning algorithm) from a random start-state to a random goal-state for each instance; a total of Ni × Nt trajectories. For each state s = (i, j) in each trajectory, we produce a (2 × m × n)-sized observation image simage. The first channel of simage encodes the obstacle presence (1 for obstacle, 0 otherwise), while the second channel encodes the goal position (1 at the goal, 0 otherwise). The full observation vector is φ(s) = [s, simage]. In addition, for each state we produce a label a that encodes the action (one of 8 directions) that an optimal shortest-path policy would take in that state. We design a VIN for this task as follows. The state space S̄ was chosen to be an m × n grid-world, similar to the true state space S.4 The reward R̄ in this space can be represented by an m × n map, and we chose the reward mapping fR to be a CNN with simage as its input, one layer with 150 kernels of size 3 × 3, and a second layer with one 3 × 3 filter to output R̄. Thus, fR maps the image of obstacles and goal to a ‘reward image’. The transitions P̄ were defined as 3 × 3 convolution kernels in the VI block, and exploit the fact that transitions in the grid-world are local. Note that the transitions defined this way do not depend on

4For a particular configuration of obstacles, the true grid-world domain can be captured by an m × n state space with the obstacles encoded in the MDP transitions, as in our notation. For a general obstacle configuration, the obstacle positions have to also be encoded in the state. The VIN was able to learn a policy for a general obstacle configuration by planning in an m × n state space by also taking into account the observation of the map.

the state s̄. Interestingly, we shall see that the network learned rewards and transitions that nevertheless enable it to successfully plan in this task. For the attention module, since there is a one-to-one mapping between the agent position in S and in S̄, we chose a trivial approach that selects the Q̄ values in the VI block for the state in the real MDP s, i.e., ψ(s) = Q̄(s, ·). The final reactive policy is a fully connected softmax output layer with weights W, πre(·|ψ(s)) ∝ exp(W⊤ψ(s)). We trained several neural-network policies based on a multi-class logistic regression loss function using stochastic gradient descent, with an RMSProp step size [Tieleman and Hinton, 2012], implemented in the Theano [Al-Rfou et al., 2016] library. We compare the policies:

VIN network: We used the VIN model of Section 2.3 as described above, with 10 channels for the q layer in the VI block. The recurrence K was set relative to the problem size: K = 10 for 8 × 8 domains, K = 20 for 16 × 16 domains, and K = 36 for 28 × 28 domains. The guideline for choosing these values was to keep the network small while guaranteeing that goal information can flow to every state in the map.

CNN network: We devised a CNN-based reactive policy inspired by the recent impressive results of DQN [Mnih et al., 2015], with 5 convolution layers with [50, 50, 100, 100, 100] kernels of size 3 × 3, and 2 × 2 max-pooling after the first and third layers. The final layer is fully connected, and maps to a softmax over actions. To represent the current state, we added to simage a channel that encodes the current position (1 at the current state, 0 otherwise).

Fully Convolutional Network (FCN): The problem setting for this domain is similar to semantic segmentation [Long et al., 2015], in which each pixel in the image is assigned a semantic label (the action in our case). We therefore devised an FCN inspired by a state-of-the-art semantic segmentation algorithm [Long et al., 2015], with 3 convolution layers. The first convolution layer has 150 filters of size (2m − 1) × (2n − 1), which span the whole image and convey information about the goal to every pixel. The second layer has 150 filters of size 1 × 1, and the third layer has 10 filters of size 1 × 1, to produce an output sized 10 × m × n, similarly to the Q̄ layer in our VIN. Similarly to the attention mechanism in the VIN, the values that correspond to the current state (pixel) are passed to a fully connected softmax output layer.
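For concreteness, below is a minimal sketch of the grid-world VIN just described: the reward CNN fR, K value-iteration steps implemented with a shared convolution, the attention that picks Q̄(s, ·) at the agent's cell, and the softmax reactive policy. It is written in PyTorch purely for readability (the experiments above were implemented in Theano), and the class name, padding, and initialization choices are our own simplifications rather than the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridVIN(nn.Module):
    """Minimal grid-world VIN sketch: reward CNN f_R, K VI steps, attention, softmax policy."""
    def __init__(self, k=20, q_channels=10, n_actions=8):
        super().__init__()
        self.k = k
        # f_R: 2-channel image (obstacles, goal) -> 1-channel reward image R_bar
        self.f_r = nn.Sequential(
            nn.Conv2d(2, 150, kernel_size=3, padding=1),
            nn.Conv2d(150, 1, kernel_size=3, padding=1),
        )
        # state-independent transition kernels of the VI block, applied to [R_bar, V_bar]
        self.q_conv = nn.Conv2d(2, q_channels, kernel_size=3, padding=1, bias=False)
        # reactive policy: fully connected softmax layer on the attended Q values
        self.policy = nn.Linear(q_channels, n_actions)

    def forward(self, s_image, pos):
        # s_image: (B, 2, m, n) observation image; pos: (B, 2) long tensor of (i, j) positions
        r = self.f_r(s_image)                          # R_bar
        v = torch.zeros_like(r)                        # V_bar initialized to zero
        for _ in range(self.k):                        # K iterations of value iteration
            q = self.q_conv(torch.cat([r, v], dim=1))  # Q_bar, one channel per abstract action
            v, _ = q.max(dim=1, keepdim=True)          # V_bar = max over action channels
        q = self.q_conv(torch.cat([r, v], dim=1))
        batch = torch.arange(q.size(0), device=q.device)
        psi = q[batch, :, pos[:, 0], pos[:, 1]]        # attention: Q_bar(s, .) at the agent's cell
        return F.log_softmax(self.policy(psi), dim=-1)
```

The loop computes Q̄ = conv([R̄, V̄]) followed by a max over the action channels, which mirrors the VI recurrence that is trained end-to-end in this chapter.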

WebNav

“WebNav” [Nogueira and Cho, 2016] is a recently proposed goal-driven web navigation benchmark. In WebNav, web pages and links from some website form a directed graph G(S, E). The agent is presented with a query text, which consists of Nq sentences from a target page at most Nh hops away from the starting page. The goal for the agent is to navigate to that target page from the starting page via clicking at most Nn links per page. Here, we choose Nh = Nq = Nn = 4. In [Nogueira and Cho, 2016], the agent receives a reward of 1 when reaching the target page via any path no longer than 10 hops. For evaluation convenience, in our experiment, the agent can receive a reward only if it reaches the destination via the shortest path, which makes the task much harder. We measure the top-1 and top-4 prediction accuracy as well as the average reward for the baseline [Nogueira and Cho, 2016] and our VIN model.

For every page s, the valid transitions are As = {s′ : (s, s′) ∈ E}. For every web page s and every query text q, we utilize the bag-of-words model with the pretrained word embedding provided by [Nogueira and Cho, 2016] to produce feature vectors φ(s) and φ(q). The agent should choose at most Nn valid actions from As based on the current s and q. The baseline method of [Nogueira and Cho, 2016] uses a single tanh-layer neural net parametrized by W to compute a hidden vector h: h(s, q) = tanh(W [φ(s); φ(q)]). The final baseline policy is computed via πbsl(s′|s, q) ∝ exp(h(s, q)⊤φ(s′)) for s′ ∈ As.

We design a VIN for this task as follows. We first selected a smaller website as the approximate graph Ḡ(S̄, Ē), and choose S̄ as the states in VI. For a query q and a page s̄ in S̄, we compute the reward R̄(s̄) by fR(s̄|q) = tanh((WR φ(q) + bR)⊤ φ(s̄)), with parameters WR (a diagonal matrix) and bR (a vector). For the transitions, since the graph remains unchanged, P̄ is fixed. For the attention module Π(V̄*, s), we compute Π(V̄*, s) = Σ_{s̄∈S̄} sigmoid((WΠ φ(s) + bΠ)⊤ φ(s̄)) V̄*(s̄), where WΠ and bΠ are parameters and WΠ is diagonal. Moreover, we compute the coefficient γ based on the query q and the state s using a tanh-layer neural net parametrized by Wγ: γ(s, q) = tanh(Wγ [φ(s); φ(q)]). Finally, we combine the VI module and the baseline method into our VIN model by simply adding the outputs of the two networks together.

In addition to the experiments reported in Section 2.4, we performed experiments on the full Wikipedia, using ‘Wikipedia for Schools’ as the graph for VIN planning. We report our preliminary results here. Full Wikipedia website: The full Wikipedia dataset consists of 779,169 training queries (3 million training samples) and 20,004 testing queries (76,664 testing samples) over 4.8 million pages with at most 300 links per page. We use the whole WikiSchool website as our approximate graph and set K = 4. In VIN, to accelerate training, we first train only the VI module with K = 0. Then, we fix the R̄ obtained in the K = 0 case and jointly train the whole model with K = 4. The results are shown in Table 2.3: VIN achieves 1.5% better prediction accuracy than the baseline. Interestingly, with only a 1.5% improvement in prediction accuracy, VIN achieves a 2.5% better success rate than the baseline; note that the agent can only succeed by making 4 consecutive correct predictions. This indicates that the VI module does provide useful high-level planning information.
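As an illustration of these similarity-based modules, here is a minimal PyTorch sketch of the reward, attention, and γ heads (the class name and shapes are our assumptions; the node features φ(s̄) of the small graph are taken as precomputed inputs, and the diagonal matrices WR and WΠ are stored as vectors):

```python
import torch
import torch.nn as nn

class WebNavVINHeads(nn.Module):
    """Sketch of the reward, attention and gamma heads on the small planning graph."""
    def __init__(self, d):
        super().__init__()
        self.w_r = nn.Parameter(torch.ones(d))    # diagonal W_R, stored element-wise
        self.b_r = nn.Parameter(torch.zeros(d))
        self.w_pi = nn.Parameter(torch.ones(d))   # diagonal W_Pi, stored element-wise
        self.b_pi = nn.Parameter(torch.zeros(d))
        self.w_gamma = nn.Linear(2 * d, 1)        # W_gamma acting on [phi(s); phi(q)]

    def reward(self, phi_q, phi_sbar):
        # R_bar(s_bar) = tanh( (W_R phi(q) + b_R)^T phi(s_bar) );  phi_sbar: (N, d)
        key = self.w_r * phi_q + self.b_r
        return torch.tanh(phi_sbar @ key)                      # (N,)

    def attention(self, phi_s, phi_sbar, v_star):
        # Pi(V*, s) = sum_sbar sigmoid( (W_Pi phi(s) + b_Pi)^T phi(s_bar) ) V*(s_bar)
        key = self.w_pi * phi_s + self.b_pi
        weights = torch.sigmoid(phi_sbar @ key)                # (N,)
        return (weights * v_star).sum()

    def gamma(self, phi_s, phi_q):
        # gamma(s, q) = tanh( W_gamma [phi(s); phi(q)] )
        return torch.tanh(self.w_gamma(torch.cat([phi_s, phi_q], dim=-1)))
```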

Network   Top-1 Test Err.   Top-4 Test Err.   Avg. Reward
BSL       52.019%           24.424%           0.27779
VIN       50.562%           26.055%           0.30389

Table 2.3: Performance on the full Wikipedia dataset.

2.6 Summary

The introduction of powerful and scalable RL methods has opened up a range of new problems for deep learning. However, few recent works investigate policy architectures that are specifically tailored for planning under uncertainty, and current RL theory and benchmarks rarely investigate the generalization properties of a trained policy [Sutton and Barto, 1998, Mnih et al., 2015, Duan et al., 2016a]. This chapter takes a step in this direction by exploring better-generalizing policy representations. Our VIN policies learn an approximate planning computation relevant for solving the task, and we have shown that such a computation leads to better generalization in different tasks, from simple gridworlds to navigation of Wikipedia links. For future work, it is plausible to learn different planning computations, e.g., based on simulation [Guo et al., 2014] or optimal linear control [Watter et al., 2015], and combine them with reactive policies, to potentially develop RL solutions for a wider range of domains.

Part II

Visual Semantic Generalization for Embodied Agents

Chapter 3

House3D: an Environment for Building Generalizable Agents

Teaching an agent to navigate in an unseen 3D environment is a challenging task, even in simulated environments. To generalize to unseen environments, an agent needs to be robust to low-level variations (e.g. color, texture, object changes) as well as high-level variations (e.g. layout changes of the environment). To improve overall generalization, all types of variations in the environment have to be taken into consideration via different levels of data augmentation. To this end, we propose House3D, a rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset [Song et al., 2017b]. The diversity in House3D opens the door towards scene-level augmentation, while the label-rich nature of House3D enables us to inject pixel- and task-level augmentations such as domain randomization [Tobin et al., 2017] and multi-task training. Using a subset of houses in House3D, we show that reinforcement learning agents trained with a combination of different levels of augmentation perform much better in unseen environments than our baselines with raw RGB input, by over 8% in terms of navigation success rate. House3D is publicly available at http://github.com/facebookresearch/House3D.

3.1 Motivation

Recently, deep reinforcement learning has shown its strength on multiple games, such as Atari [Mnih et al., 2015] and Go [Silver et al., 2016], vastly overpowering human performance. Via the various reinforcement learning frameworks, different aspects of intelligence can be learned, including 3D understanding (DeepMind Lab [Beattie et al., 2016] and Malmo [Johnson et al., 2016]), real-time strategy decision (TorchCraft [Synnaeve et al., 2016] and ELF [Tian et al., 2017]), fast reaction (Atari [Bellemare et al., 2013]), long-term planning (Go), and language and communication (ParlAI [Miller et al., 2017] and [Das et al., 2017]).

A prominent issue in reinforcement learning is generalizability. Commonly, agents trained on a specific environment and for a specific task become highly specialized and fail to perform well in new environments. In the past, there have been efforts to address this issue. In particular, pixel-level variations are applied to the observation signals in order to increase the agent's robustness to unseen environments [Beattie et al., 2016, Higgins et al., 2017, Tobin et al., 2017]. Parametrized environments with varying levels of difficulty are used to yield scene variations but with similar visual observations [Pathak et al., 2017]. Transfer learning is applied to similar tasks but with different rewards [Finn et al., 2017b]. Nevertheless, the aforementioned techniques study the problem in simplified environments which lack the diversity, richness and perception challenges of the real world.

To this end, we propose a substantially more diverse environment, House3D, to train and test our agents. House3D is a virtual 3D environment consisting of thousands of indoor scenes equipped with a diverse set of scene types, layouts and objects. An overview of House3D is shown in Figure 3.1a. House3D leverages the SUNCG dataset [Song et al., 2017b], which contains 45K human-designed real-world 3D house models, ranging from single studios to houses with gardens, in which objects are fully labeled with categories. We convert the SUNCG dataset to an environment, House3D, which is efficient and extensible for various tasks. In House3D, an agent can freely explore the space while perceiving a large number of objects under various visual appearances.

Based on House3D, we design a task called RoomNav: an agent starts at a random location in a house and is asked to navigate to a destination specified by a high-level semantic concept (e.g. kitchen), following simple rules (e.g. no object penetration), as shown in Figure 3.1b. We use gated-CNN and gated-LSTM policies trained with standard deep reinforcement learning methods, i.e. A3C [Mnih et al., 2016] and DDPG [Lillicrap et al., 2015], and report success rate on unseen environments over 5 concepts. We show that in order to achieve strong generalization capability, all levels of augmentation are needed: pixel-level augmentation by domain randomization [Tobin et al., 2017] enhances the agent's robustness to color variations; object-level augmentation forces the agent to learn multiple concepts (20 in number) simultaneously; and scene-level augmentation, where a diverse set of environments is used, enforces generalizability across diverse scenes, mitigating overfitting to particular scenes. Our final gated-LSTM agent achieves a success rate of 35.8% on 50 unseen environments, 10% higher than the baseline method (25.7%). The remainder of the chapter is structured as follows.
Section 3.1 summarizes relevant work. Section 3.2 describes our environment, House3D, in detail and Section 3.3 describes the task, RoomNav. Section 3.4 describes our gated models and the algorithms applied to tackle RoomNav. Finally, experimental results are shown in Section 3.5, with additional details in Section 3.6.

Related Work

Environments: Table 3.1 shows the comparison between House3D and the most relevant prior works. There are other simulated environments which focus on different domains, such as OpenAI Gym [Brockman et al., 2016], ParlAI [Miller et al., 2017] for language communication, as well as some strategic game environments [Synnaeve et al., 2016, Tian et al., 2017, Vinyals et al., 2017], etc. Most of these environments are pertinent to one particular aspect of intelligence, such as dialogue or a single type of game, which makes it hard to facilitate the study of more comprehensive problems. On the contrary, we focus on building a platform that intersects with multiple research directions, such as object and scene understanding, 3D navigation, and embodied question answering [Das et al., 2018a], while allowing users to customize the level of complexity to their needs. We build on SUNCG [Song et al., 2017b], a dataset that consists of thousands of diverse synthetic indoor scenes equipped with a variety of objects and layouts. Its visual diversity and rich content open the path to the study of semantic generalization for reinforcement learning agents. Our platform decouples high-performance rendering from data I/O, and thus can use other publicly available 3D scene datasets as well. This includes AI2-THOR [Zhu et al., 2017b], SceneNet RGB-D [McCormac et al., 2017], Stanford 3D [Armeni et al., 2016], Matterport 3D [Chang et al., 2017] and so on. Concurrent works [Brodeur et al., 2017, Savva et al., 2017] also introduce platforms similar to House3D, indicating the interest in large-scale interactive and realistic 3D environments.


Figure 3.1: An overview of House3D environment and RoomNav task. (a) We build an efficient and interactive environment upon the SUNCG dataset [Song et al., 2017b] that contains 45K diverse indoor scenes, ranging from studios to two-storied houses with swimming pools and fitness rooms. All 3D objects are fully labeled into over 80 categories. Observations of agents in the environment have multiple modalities, including RGB images, Depth, Segmentation masks (from object category), top-down 2D view, etc. (b) We focus on the task of targeted navigation. Given a high-level description of a room concept, the agent explores the environment to reach the target room.

Environment (attributes: 3D, Realistic, Large-scale, Fast, Customizable)
Atari [Bellemare et al., 2013]: •
OpenAI Universe [Shi et al., 2017]: • • •
Malmo [Johnson et al., 2016]: • • • •
DeepMind Lab [Beattie et al., 2016]: • • •
VizDoom [Kempka et al., 2016]: • • •
AI2-THOR [Zhu et al., 2017b]: • • •
Stanford2D-3D [Armeni et al., 2016]: • • •
Matterport3D [Chang et al., 2017]: • • • •
House3D: • • • • •

Table 3.1: A summary of popular environments. The attributes include 3D: 3D nature of the rendered objects, Realistic: resemblance to the real world, Large-scale: a large set of environments, Fast: fast rendering speed, and Customizable: flexibility to be customized to other applications.

3D Navigation: There has been a prominent line of work on the task of navigation in real 3D scenes [Leonard and Durrant-Whyte, 1992]. Classical approaches decompose the task into two subtasks by building a 3D map of the scene using SLAM and then planning in this map [Fox et al., 2005]. More recently, end-to-end learning methods were introduced to predict robotic actions from raw pixel data [Levine et al., 2016]. Some of the most recent works on navigation show the effectiveness of end-to-end learning. [Gupta et al., 2017b] learn to navigate via mapping and planning using shortest-path supervision. [Sadeghi and Levine, 2017] teach an agent to fly using solely simulated data and deploy it in the real world. [Gandhi et al., 2017] collect a dataset of drones crashing into objects and train self-supervised agents on this data to avoid obstacles. A number of recent works also use deep reinforcement learning for navigation in simulated 3D scenes. [Mirowski et al., 2017, Jaderberg et al., 2017] improve an agent's navigation ability in mazes by introducing auxiliary tasks. [Parisotto and Salakhutdinov, 2018] propose a new architecture which stores information about the environment on a 2D map. [Karl Moritz Hermann and Phil Blunsom, 2017] focus on the task of language grounding by navigating simple 3D scenes. However, these works only evaluate the agent's generalization ability on pixel-level variations or small mazes. We argue that a much richer environment is crucial for evaluating semantic-level generalization.

Gated Modules: In our work, we focus on the task of RoomNav, where the goal is communicated to the agent as a high-level instruction selected from a set of predefined concepts.
To modulate the behavior of the agent in RoomNav, we encode the instruction as an embedding vector which gates the visual signal. The idea of gated attention has been used in the past for language grounding [Chaplot et al., 2018], and for transfer learning by language grounding [Narasimhan et al., 2018]. Similar to those works, we use concept grounding as an attention mechanism. We believe that our gated reinforcement learning models serve as a strong baseline for the task of semantic-based navigation in House3D. Furthermore, our empirical results allow us to draw conclusions on the models' efficacy when training agents on a large-scale, diverse dataset with an emphasis on generalization.

Generalization: There is a recent trend in reinforcement learning focusing on the problem of generalization, ranging from learning to plan [Tamar et al., 2016] and meta-learning [Duan et al., 2016b, Finn et al., 2017a] to zero-shot learning [Andreas et al., 2017, Oh et al., 2017, Higgins et al., 2017]. However, these works either focus on over-simplified tasks or test on environments that only vary slightly from the training ones. In contrast, we use a more diverse set of environments, each containing visually and structurally different observations, and show that the agent can work well in unseen scenes. In this work, we show improved generalization performance in complex 3D scenes when using depth and segmentation masks on top of the raw visual input. This observation is similar to other works which use a diverse set of input modalities [Mirowski et al., 2017, Tai and Liu, 2016]. Our result suggests that it may be possible to decouple real-world robotics from recognition via a vision API provided by an object detection or semantic segmentation system trained on the targeted real scenes. This opens the door towards bridging the gap between simulated environments and the real world [Tobin et al., 2017, Rusu et al., 2017, Christiano et al., 2016].

3.2 House3D: An Extensible Environment of 45K 3D Houses

We propose House3D, an environment which closely resembles the real world and is rich in content and structure. An overview of House3D is shown in Figure 3.1a. House3D is developed to provide an efficient and flexible environment of thousands of indoor scenes and facilitates a variety of tasks, e.g. navigation, visual understanding, language grounding, concept learning, etc. The environment, along with a Python API for easy use, is available at http://github.com/facebookresearch/House3D.

Dataset
The 3D scenes in House3D are sourced from the SUNCG dataset [Song et al., 2017b], which consists of 45,622 human-designed 3D scenes ranging from single-room studios to multi-floor houses. The SUNCG dataset was designed to encourage research on large-scale 3D object recognition problems and thus carries a variety of objects, scene layouts and structures. On average, there are 8.9 rooms and 1.3 floors per scene. There is a diverse set of room and object types in each scene. In total, there are over 20 different room types, such as bedroom, living room, kitchen, bathroom, etc., with over 80 different object categories. In total, the SUNCG dataset contains 404,508 different rooms and 5,697,217 object instances drawn from 2644 unique object meshes.

Annotations
Each scene in SUNCG is fully annotated with 3D coordinates and its room and object types (e.g. bedroom, shoe cabinet, etc.). This allows for a detailed mapping from each 3D location to an object instance (or None at free space) and the room type. At every time step an agent has access to the following signals: a) the visual RGB signal of its current first-person view, b) semantic/instance segmentation masks for all the objects visible in its current view, and c) depth information. For different tasks, these signals might serve different purposes, e.g., as a feature plane or an auxiliary target. Based on the existing annotations, House3D offers more information, e.g., top-down 2D occupancy maps, connectivity analysis and shortest paths between two points.

Renderer
To build a realistic 3D environment, we develop a renderer for the SUNCG scenes. The renderer is based on OpenGL; it can run on both Linux and MacOS, and provides RGB images, semantic segmentation masks, instance segmentation masks and depth maps. As highlighted above, the environment needs to be efficient in order to be used for large-scale reinforcement learning. On an NVIDIA Tesla M40 GPU, our implementation can render 120×90-sized frames at over 600 fps, while multiple renderers can run in parallel on one or more GPUs. When rendering multiple houses simultaneously, one M40 GPU can be fully utilized to render at a total of 1800 fps. The default simple physics adds a small overhead to the rendering. The high throughput of our implementation enables efficient learning for a variety of interactive tasks, such as on-policy reinforcement learning.

Interaction
In House3D, an agent can live in any location within a 3D scene, as long as it does not collide with object instances (including walls) within a small range, i.e., the robot's radius. Doors, gates and arches are considered passageways, meaning that an agent can walk through those structures freely. These default design choices add negligible run-time overhead. Note that more complex interaction rules (e.g. manipulation) can be incorporated within House3D using our flexible API, which we leave for future work.

3.3 RoomNav: A Benchmark Task for Concept-Driven Navigation

Consider the task of concept-driven navigation as shown in Figure 3.1b. A human may give a high-level instruction to the robot, for example, “Go to the kitchen”, so that one can later ask the robot to turn on the oven. The robot needs to behave appropriately conditioned on the house it is located in and the goal, e.g. the semantic concept “kitchen”. In addition, we want the agent to generalize, i.e. to perform well in unseen environments, that is, new houses with different layouts and furniture locations. To study the aforementioned abilities of an agent, we develop a benchmark task, Concept-Driven Navigation (RoomNav), based on House3D. We define the goal to be of the form “go to X”, where X denotes a pre-defined room type or object type, which is a semantic concept that an agent needs to interpret from a variety of scenes of distinct visual appearances. To ensure fast experimentation cycles, we perform experiments on a subset of House3D. We manually select 270 houses suitable for a navigation task and split them into a small set (20 houses), a large set (200 houses) and a test set (50 houses), where the test set is used to evaluate the generalization of the trained agents.

Task Formulation: Suppose we have a set of episodic environments E = {E1, .., En} and a set of semantic concepts I = {I1, .., Im}. During each episode, the agent is interacting with one environment E ∈ E and is given a concept I ∈ I. In the beginning of an episode, the agent is randomly placed somewhere in E. At each time step t, the agent receives a visual signal Xt from E via its first-person view sensor. Let st = {X1, .., Xt, I} denote the state of the agent at time t. The agent needs to propose an action at to navigate and rotate its sensor given st. The environment returns a reward signal rt and terminates when the agent succeeds in finding the destination, or reaches a maximum number of steps. The objective of this task is to learn an optimal policy π(at|st, I) that leads to the target defined by I. We train the agent on a set Etrain. We evaluate the policy on a disjoint set of environments Etest (Etest ∩ Etrain = ∅). For more details see Section 3.6.

Environment Statistics: The selected 270 houses are manually verified for navigation; they are well connected, contain the desired concepts, and are large enough for exploration. We split them into 3 disjoint sets, denoted by Esmall, Elarge and Etest respectively. For the semantic concepts, we select the five most common room types: kitchen, living room, dining room, bedroom and bathroom. Note that this set can be extended to include objects or even subareas within rooms.

Observations: We utilize three different kinds of visual input signals for Xt, including (1) raw pixel values; (2) the semantic segmentation mask of the pixel input; and (3) depth information, and experiment with different combinations of them. We encode each concept I as a one-hot vector representation.

Action Space: Similar to existing navigation works, we define a fixed set of actions, here 12 in number, including different scales of rotations and movements. Due to the complexity of the indoor scenes, we also explore a continuous action space similar to [Lowe et al., 2017], which in effect allows the agent to move with different velocities. For more details see Section 3.6. In all cases, if the agent hits an obstacle it remains still.
Success Measure and Reward Function: To declare success, we want to ensure that the agent identifies the target room by its unique properties (e.g. the presence of appropriate objects in the room, such as pan and knives for kitchen and bed for bedroom) instead of merely reaching it by luck. An episode is considered successful if both of the following two criteria are satisfied: (1) the agent is located inside the target room; (2) the agent consecutively sees a designated object category associated with that target room type for at least 2 time steps. We assume that an agent sees an object if at least 4% of the pixels in Xt belong to that object. For the reward function, ideally two signals suffice to reflect the task requirement: (1) a collision penalty when hitting obstacles; and (2) a success reward when completing the task. However, these basic signals make it too difficult for an RL agent to learn, as the positive reward is too sparse. To provide additional supervision during training, we resort to an informative reward shaping: we compute the approximate shortest distance from the target room to each location in the house and adopt the difference of shortest distances before and after the agent's movement as an additional reward signal. Note that our ultimate goal is to learn a policy that can generalize to unseen houses. Our strong reward shaping supervises the agent during training and is not available to the agent at test time. We empirically observe that stronger reward shaping leads to better performances on both training and testing.
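For illustration, the per-step training reward can be sketched as follows, combining the shaping term with the penalties and success bonus whose constants are listed in Section 3.6 (the function signature is ours):

```python
def shaped_reward(dist_prev, dist_curr, collided, succeeded, in_target_room):
    """Illustrative per-step reward for RoomNav training, following the description in the text."""
    r = dist_prev - dist_curr          # shaping: decrease in shortest-path distance to the target room
    if collided:
        r -= 0.3                       # collision penalty
    if not in_target_room:
        r -= 0.1                       # time penalty to encourage exploration
    if succeeded:
        r += 10.0                      # success reward
    return r
```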

3.4 Gated-Attention Networks for Multi-Target Learning

The RoomNav task can be considered as a multi-target learning problem: the policy needs to condition on both the input st and the target concept I. For policy representations which incorporate the target I, we propose two baseline models with a gated-attention architecture, similar to [Dhingra et al., 2017] and [Chaplot et al., 2018]: a gated-CNN network for continuous actions and a gated-LSTM network for discrete actions. We train the gated-CNN policy using the deep deterministic policy gradient (DDPG) [Lillicrap et al., 2015], while the gated-LSTM policy is trained using the asynchronous advantage actor-critic algorithm (A3C) [Mnih et al., 2016].

DDPG with gated-CNN policy

Deep deterministic policy gradient

Suppose we have a deterministic policy µ(st|θ) (actor) and a Q-function Q(st, a|θ) (critic), both parametrized by θ. DDPG optimizes the policy µ(st|θ) by maximizing Lµ(θ) = E_st[Q(st, µ(st|θ)|θ)], and updates the Q-function by minimizing LQ(θ) = E[(Q(st, at|θ) − γQ(s_{t+1}, µ(s_{t+1}|θ)|θ) − rt)²]. Here, we use a shared network for both actor and critic, with the final loss function LDDPG(θ) = −Lµ(θ) + αDDPG LQ(θ), where αDDPG is a constant balancing the two objectives.
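A minimal sketch of the combined objective, assuming hypothetical `q_net` and `mu_net` handles for the critic and actor heads of the shared network (constants from Section 3.6):

```python
import torch

def ddpg_loss(q_net, mu_net, s_t, a_t, r_t, s_next, gamma=0.95, alpha_ddpg=100.0):
    """One-batch sketch of L_DDPG = -L_mu + alpha_DDPG * L_Q."""
    # actor term: L_mu = E[ Q(s_t, mu(s_t)) ]
    l_mu = q_net(s_t, mu_net(s_t)).mean()
    # critic term: squared TD error; the bootstrap is treated as a fixed target here
    # (the experiments use a slowly updated target network, see Section 3.6)
    with torch.no_grad():
        bootstrap = gamma * q_net(s_next, mu_net(s_next))
    l_q = ((q_net(s_t, a_t) - bootstrap - r_t) ** 2).mean()
    return -l_mu + alpha_ddpg * l_q
```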

Gated-CNN for continuous policy

State Encoding: Given state st, we first stack the most recent k frames X = [Xt, X_{t−1}, ..., X_{t−k+1}] channel-wise and apply a convolutional neural network to derive an image representation x = fcnn(X|θ) ∈ R^dX. We convert the target I into an embedding vector y = fembed(I|θ) ∈ R^dI. Subsequently, we apply a fusion module M(x, y|θ) to derive the final encoding hs = M(x, y|θ).

Figure 3.2: Overview of our proposed models. The bottom part demonstrates the gated-LSTM model for discrete actions, while the top part shows the gated-CNN model for continuous actions. The “Gated Fusion” module denotes the gated-attention architecture. The input shows the modalities in SUNCG [Song et al., 2017b].

Gated-Attention for Feature Fusion: For the fusion module M(x, y|θ), the straightforward version is concatenation, namely Mcat(x, y|·) = [x, y]. In our case, x is always a high-dimensional feature vector (i.e., an image feature) while y is a simple low-dimensional conditioning vector (e.g., an instruction). Thus, simple concatenation may result in optimization difficulties. For this reason, we propose to use a gated-attention mechanism. Suppose x ∈ R^dx and y ∈ R^dy where dy < dx. First, we transform y to y′ ∈ R^dx via an MLP, namely y′ = fmlp(y|θ), and then perform a Hadamard (pointwise) product between x and sigmoid(y′), which leads to our final gated fusion module M(x, y|θ) = x ⊙ sigmoid(fmlp(y|θ)). This gated fusion module can also be interpreted as an attention mechanism over the feature vector, which could help better shape the feature representation.

Policy Representation: For the policy, we apply an MLP layer on the state representation hs, followed by a softmax operator (for bounded velocity) to produce the continuous action. Moreover, in order to produce a stochastic policy for both better exploration and higher robustness, we apply the Gumbel-Softmax trick [Jang et al., 2017], resulting in the final policy µ(st|θ) = Gumbel-Softmax(fmlp(hs|θ)). Note that since we add randomness to µ(st|θ), our DDPG formulation can also be interpreted as the SVG(0) algorithm [Heess et al., 2015].

Q-function: The Q-function Q(s, a) conditions on both the state s and the action a. We again apply a gated fusion module to the feature vector x and the action vector a to derive a hidden representation hQ = M(x, a|θ). We eventually apply another MLP to hQ to produce the final value Q(s, a). A model demonstration is shown in the top part of Fig. 3.2, where each block has its own parameters.

A3C with gated-LSTM policy

Asynchronous advantage actor-critic

Suppose we have a discrete policy π(a; s|θ) and a value function v(s|θ). A3C optimizes the policy by minimizing the loss function Lpg(θ) = −E_{st,at,rt}[ Σ_{t=1}^{T} (Rt − v(st)) log π(at; st|θ) ], where Rt is the discounted accumulative reward defined by Rt = Σ_{i=0}^{T−t} γ^i r_{t+i} + v(s_{T+1}). The value function is updated by minimizing the loss Lv(θ) = E_{st,rt}[(Rt − v(st))²]. Finally, the overall loss function for A3C is LA3C(θ) = Lpg(θ) + αA3C Lv(θ), where αA3C is a constant coefficient.
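A small sketch of the corresponding loss over one unrolled segment (our own helper; it uses the standard recursive form of Rt, which also discounts the bootstrap value v(s_{T+1})):

```python
import torch

def a3c_loss(log_probs, values, rewards, v_bootstrap, gamma=0.95, alpha_a3c=1.0):
    """Sketch of L_A3C = L_pg + alpha_A3C * L_v for one unrolled segment of length T.
    log_probs, values: lists of 0-dim tensors; rewards: list of floats; v_bootstrap: v(s_{T+1})."""
    T = len(rewards)
    returns = [None] * T
    R = v_bootstrap.detach()                  # bootstrap from the critic at s_{T+1}
    for t in reversed(range(T)):              # R_t = r_t + gamma * R_{t+1}
        R = rewards[t] + gamma * R
        returns[t] = R
    returns = torch.stack(returns)
    values = torch.stack(values)
    log_probs = torch.stack(log_probs)
    advantages = (returns - values).detach()  # R_t - v(s_t), no critic gradient in the policy term
    l_pg = -(advantages * log_probs).sum()
    l_v = ((returns - values) ** 2).sum()
    return l_pg + alpha_a3c * l_v
```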

Gated-LSTM network for discrete policy

State Encoding: Given state st, we first apply a CNN module to extract an image feature xt for each input frame Xt. For the target, we apply a gated fusion module to derive a state representation ht = M(xt, I|θ) at each time step t. Then, we concatenate ht with the target I and feed the result into the LSTM module [Hochreiter and Schmidhuber, 1997] to obtain a sequence of LSTM outputs {ot}t, so that the LSTM module has direct access to the target in addition to the attended visual feature.

Policy and Value Function: For each time step t, we concatenate the state vector ht with the output of the LSTM ot to obtain a joint hidden vector hjoint = [ht, ot]. Then we apply two MLPs to hjoint to obtain the policy distribution π(a; st|θ) as well as the value function v(st|θ). A visualization of the model is in the bottom part of Fig. 3.2. The parameters of the CNN modules are shared across time.
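Both models share the gated fusion block M(x, y|θ) = x ⊙ sigmoid(fmlp(y|θ)). A minimal sketch, with placeholder layer sizes of our own choosing:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Gated-attention fusion M(x, y) = x * sigmoid(MLP(y)), as described in Section 3.4."""
    def __init__(self, d_y, d_x, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_y, hidden), nn.ReLU(), nn.Linear(hidden, d_x))

    def forward(self, x, y):
        # x: (B, d_x) high-dimensional image feature; y: (B, d_y) low-dimensional conditioning vector
        # (the instruction embedding, or the action vector in the Q-function)
        return x * torch.sigmoid(self.mlp(y))
```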

3.5 Experiments

We report experimental results for our models on the task of RoomNav. We first compare models with discrete and continuous action spaces with different input modalities. Then we explain our observations and show that techniques targeting different levels of augmentation improve the success rate of navigation on the test set. Moreover, these techniques are complementary to each other.

Setup. We train our baseline models on multiple experimental settings. We use two training datasets. The small set Esmall contains 20 houses and the large set Elarge contains 200 houses. A held-out dataset Etest, which contains 50 houses, is used for testing. We mainly focus on the success rate on the test set, i.e., how well the agent generalizes. For reference, we also report the training performance. The agent fails if it does not find the concept within 100 steps.1 All success rate evaluations use a fixed random seed for a fair comparison. For each model, we run 2000 evaluation episodes on Esmall and Etest, and 5000 evaluation episodes on Elarge to measure overall success rates. We use gated-CNN and gated-LSTM to denote the networks with gated attention, and concat-CNN and concat-LSTM for models with simple concatenation. We also experiment with different visual signals to the agents, including RGB image (RGB Only), RGB image with depth information (RGB+Depth) and semantic mask with depth information (Mask+Depth). The input image resolution is 120 × 90 to preserve image details. During each simulated episode, we randomly select a house from the environment set and randomly pick an applicable target from the house to instruct the agent. During training, we add an entropy bonus term for both models2 in addition to the original loss function. For evaluation, we keep the final model for DDPG due to its stable learning curve, while for A3C, we take the model with the highest training success rate. We use Pytorch [Paszke et al., 2019] and Adam [Kingma and Ba, 2014]. See Section 3.6 for more experiment details.

1 This is enough for success evaluation. The average number of steps for successful runs in every setting is less than 45, which is much smaller than 100. Refer to Section 3.6 for details.
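As an illustration of the three input settings above, here is a sketch of how the observation planes might be assembled; the one-hot encoding of the segmentation mask is our assumption, since the exact encoding is not specified in the text:

```python
import numpy as np

def build_observation(rgb, depth, seg_onehot, mode="Mask+Depth"):
    """Illustrative channel stacking for the input signals used in the experiments.
    rgb: (90, 120, 3) uint8; depth: (90, 120) float; seg_onehot: (90, 120, C) binary masks."""
    rgb = rgb.astype(np.float32) / 255.0            # normalize each channel to [0, 1]
    depth = depth[..., None].astype(np.float32)
    if mode == "RGB Only":
        planes = [rgb]
    elif mode == "RGB+Depth":
        planes = [rgb, depth]
    else:  # "Mask+Depth"
        planes = [seg_onehot.astype(np.float32), depth]
    return np.concatenate(planes, axis=-1)          # (90, 120, channels)
```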

Baselines: Models with RGB Signals on Esmall

As shown in the bottom part of Fig. 3.3a, on Esmall, the test success rate for models trained on RGB features is unsatisfactory. We observe obvious overfitting behavior: the test performance is drastically worse than training. In particular, the gated-LSTM models achieve an even lower success rate than the concat-LSTM models, despite the fact that they have much better training performance. In this case, the learning algorithm picks up spurious color patterns in the environments as the guidance towards the goal, which is inapplicable to unseen environments. In both training and test, we find that depth information improves the performance; thus we use it in the following experiments and omit Depth for conciseness.

Techniques for Different Levels of Augmentation Augmentation is a standard technique to improve generalization. However, for complicated tasks, augmentation needs to be taken care at different levels. In this section, we categorize augmentation techniques into 3 levels: (1) pixel-level augmentation: changing the colors and textures; (2) task-level augmentation: joint learning for multiple tasks; (3) scene-level augmentation: training on more environments. We analyze the generalization performance with all techniques and conclude that these techniques are complementary and that the best test performance is obtained by combining these techniques together. Pixel-level Augmentation: We use domain randomization [Tobin et al., 2017], by reassigning each object in the scene a random color but keeping the textures. This breaks the spurious color correlations and pushes the agent to learn a better representation. We explore domain randomization by generating an additional 180 houses with random object coloring from Esmall, which leads to a total of 200 houses. We evaluate the test success rate of various models under different training settings, e.g., RGB, RGB with domain

2 For DDPG, we simply use the entropy of the softmax output.


Figure 3.3: Overall performance of various models trained on (a) Esmall (20 houses) with different input signals: RGB Only, RGB+Depth and Mask+Depth; (b) Elarge (200 houses) with input signals: RGB+Depth and Mask+Depth. In each group, the bars from left to right correspond to gated-LSTM, concat-LSTM, gated-CNN, concat-CNN and random policy respectively.

The results are shown in Fig. 3.4. Interestingly, we noticed that domain randomization yields very similar performance to the mask signal on Esmall. One shortcoming of domain randomization is that it requires substantially more training samples and thus suffers from high sample complexity. Thanks to the rich labels in House3D, we could instead use the segmentation mask as an input feature plane, which encodes semantic information and is independent of the object color. This helps train a generalizable agent with far fewer training samples. On the other hand, an agent trained with domain randomization can operate with RGB input only, without a segmentation mask output from a vision subsystem. In the current context, we simply adopt segmentation mask input as the technique for pixel-level augmentation.

Task-level Augmentation: We explore task-level augmentation by adding related auxiliary targets during training (Fig. 3.5). Specifically, in addition to the 5 room types as auxiliary targets, we selected 15 object concepts (e.g., chair, table, cabinet; see the full list of object concepts in Section 3.6).


Figure 3.4: Pixel-level Augmentation: Test performances of various models trained with different input signals, including RGB+Depth on Esmall, RGB with Domain Randomization on Esmall, Mask+Depth on Esmall, Mask+Depth on Elarge. In each group, the bars represent gated-LSTM, concat-LSTM, gated-CNN and concat-CNN from left to right.

Figure 3.5: Task-Level Augmentation: Test performances of LSTM models trained with and without auxiliary targets on both Esmall and Elarge. In each group, the bars represent gated-LSTM + RGB, concat-LSTM + RGB, gated-LSTM + Mask and concat-LSTM + Mask from left to right.

We train A3C agents with different input signals on Esmall and evaluate their test performances. We found that auxiliary targets significantly reduce overfitting and increase the generalizability of models with RGB inputs. Because of this effect, the gated-attention model, which has high model capacity, becomes much more effective on the RGB signal when trained with more targets. On the other hand, with mask input, the agent does not need to learn to differentiate the objects; therefore auxiliary targets do not help as much for more complicated models like the gated-attention models.

Scene-level Augmentation: We could further boost the generalization performance by augmenting the training set with a more diverse set of houses, i.e., Elarge, which contains 200 different houses. This is also a benefit from House3D. For visual signals, we focus on feature combinations like “RGB + Depth” and “Mask + Depth”. Note that for training efficiency, the segmentation mask is a surrogate feature to approximate “RGB + domain randomization”, as it shows similar results on the small set. Both train and test results are summarized in Fig. 3.3b. On the semantically diverse dataset Elarge, the overfitting issue is largely resolved. We see drops in the training performance and improvements in generalization. After training on a large number of environments, every model now has a much smaller gap between its training and test performance. This is particularly true for the models using the RGB signal, which suffer from overfitting issues on Esmall. Notably, on the large dataset, LSTM models generally perform better than CNN models due to their higher model capacity. In addition, similar behavior was also observed during our experiments with techniques for pixel-level augmentation (Fig. 3.4) and task-level augmentation (Fig. 3.5). In all the experiments, all the models consistently achieve better generalization performance when trained on Elarge, which again emphasizes the benefits of House3D.

The overall best success rate is achieved by the gated-attention architecture with semantic signals. It is better than both its RGB counterpart (by over 8%) and the counterpart trained on Esmall in terms of the generalization metric. This means that pixel-level augmentation (e.g., domain randomization and/or segmentation mask) and scene-level augmentation (e.g., using a diverse dataset) can improve the performance. Moreover, their effects are complementary. A diverse environment like Elarge also enables models of larger capacity to work better. For example, LSTMs considerably outperform the simpler reactive models, i.e., CNNs with the 5 most recent frames as state input. We believe this is due to the larger scale and the high complexity of the training set, which makes it almost impossible for an agent to “remember” the optimal actions for every scenario. Instead, an agent needs to develop high-level abstractions (e.g., a high-level exploration strategy, memory, etc.). These are helpful inductive biases that could lead to a more generalizable model. Lastly, we also analyze the detailed success rate with respect to each target room in Section 3.6.

3.6 Additional Details

RoomNav Task Details

Statistics of Selected House Sets
We show the statistics of the three selected sets of houses in Table 3.2. In addition to these 5 room concepts, we also pick another 15 object concepts in our mid-level generalization experiment as auxiliary targets. The object concepts are: shower, sofa, toilet, bed, plant, television, table-and-chair, chair, table, kitchen-set, bathtub, vehicle, pool, kitchen-cabinet, curtain.

          |E|    avg. #targets   kitchen%   dining room%   living room%   bedroom%   bathroom%
Esmall    20     3.9             0.95       0.60           0.60           0.95       0.80
Elarge    200    3.7             1.00       0.35           0.63           0.94       0.80
Etest     50     3.7             1.00       0.48           0.58           0.94       0.70

Table 3.2: Statistics of the selected environment sets for RoomNav. RoomType% denotes the percentage of houses containing at least one target room of type RoomType.

              test succ.   kitchen%   dining room%   living room%   bedroom%   bathroom%
gated-LSTM    35.8         37.9       50.4           48.0           33.5       21.2
gated-CNN     29.7         31.6       42.5           54.3           27.6       17.4

Table 3.3: Detailed test success rates for the gated-CNN model and the gated-LSTM model with “Mask+Depth” as input signal across different instruction concepts.

Detailed Specifications: The location information of an agent can be represented by 4 real numbers: the 3D location (x, y, z) and the rotation degree ρ of its first-person view sensor, which indicates the front direction of the agent. Note that in RoomNav, the agent is not allowed to change its height z, hence the overall degree of freedom is 3. An action can be in the form of a triple a = (δx, δy, δρ). After taking the action a, the agent will move to a new 3D location (x + δx, y + δy, z) with a new rotation ρ + δρ. The physics in House3D will detect collisions with objects under action a, and in RoomNav the agent will remain still in case of a collision. We also restrict the velocity of the agent such that |δx|, |δy| ≤ 0.5 and |δρ| ≤ 30 to ensure a smooth movement.

Continuous Action: A continuous action a consists of two parts a = [m, r], where m = (m1, . . . , m4) is for movement and r = (r1, r2) is for rotation. Since the velocity of the agent should be bounded, we require m and r to be valid probability distributions. Suppose the original location of the robot is (x, y, z) and the angle of the camera is ρ; then after executing a, the new 3D location will be (x + (m1 − m2) ∗ 0.5, y + (m3 − m4) ∗ 0.5, z) and the new angle is ρ + (r1 − r2) ∗ 30.

Discrete Action: We define 12 different action triples in the form of ai = (δx, δy, δρ) satisfying the velocity constraints. There are 8 actions for movement: left, forward, right with two scales and two diagonal directions; and 4 actions for rotation: clockwise and counterclockwise with two scales. In the discrete action setting, we do not allow the agent to move and rotate simultaneously.

Reward Details: In addition to the reward shaping of the difference of shortest distances, we have the following rewards. When hitting an obstacle, the agent receives a penalty of 0.3. In the case of success, the winning reward is +10. In order to encourage exploration (or to prevent eternal rotation), we add a time penalty of 0.1 to the agent for each time step outside the target room. Note that since we restrict the velocity of the agent, the difference of shortest path after an action will be no more than 0.5 × √2 ≈ 0.7.

Additional Experiment Details

Network architectures
We apply a layer after each layer in the CNN module. The activation function used is ReLU. The embedding dimension of the concept instruction is 25.

Gated-CNN: In the CNN part, we have 4 convolution layers of 64, 64, 128, 128 channels respectively, with kernel size 5 and stride 2, as well as a fully-connected layer of 512 units. We use a linear layer to transform the concept embedding to a 512-dimension vector for gated fusion. The MLP for the policy has two hidden layers of 128 and 64 units, and the MLP for the Q-function has a single hidden layer of 64 units.

Gated-LSTM: In the CNN module, we have 4 convolution layers of 64, 64, 128, 128 channels each, with kernel size 5 and stride 2, as well as a fully-connected layer of 256 units. We use a linear layer to convert the concept embedding to a 256-dimension vector. The LSTM module has 256 hidden dimensions. The MLP module for the policy contains two layers of 128 and 64 hidden units, and the MLP for the value function has two hidden layers of 64 and 32 units.
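Returning to the continuous-action specification above, the state update can be sketched as follows (collision handling, which would leave the agent in place, is omitted; the function signature is ours):

```python
def apply_continuous_action(x, y, z, rho, m, r):
    """Sketch of the continuous-action semantics: m = (m1..m4) for movement, r = (r1, r2) for rotation."""
    new_x = x + (m[0] - m[1]) * 0.5      # bounded horizontal displacement
    new_y = y + (m[2] - m[3]) * 0.5
    new_rho = rho + (r[0] - r[1]) * 30.0 # bounded rotation in degrees
    return new_x, new_y, z, new_rho      # height z is fixed in RoomNav
```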

Training parameters
We normalize each channel of the input frame to [0, 1] before feeding it into the neural network. Each of the training procedures includes a weight decay of 10^-5 and a discount factor γ = 0.95.

DDPG: We stack the k = 5 most recent frames and use a learning rate of 10^-4 with batch size 128. We choose αDDPG = 100 for all the settings except for the case with the “RGB+Depth” input signal on Elarge, where we choose αDDPG = 10. We use an entropy bonus term with coefficient 0.001 on Esmall and 0.01 on Elarge. We use an exponential average to update the target network with rate 0.001. A training update is performed every 10 time steps. The replay buffer size is 7 × 10^5. We run training for 80000 episodes in all. We use a linear exploration strategy in the first 30000 episodes.

A3C: We clip the reward to the range [−1, 1] and use a learning rate of 1e-3 with batch size 64. We launch 120 processes on Esmall and 200 on Elarge. During training we estimate the discounted accumulative rewards and back-propagate through time every 30 unrolled time steps. We perform gradient clipping at 1.0 and decay the learning rate by a factor of 1.5 when the KL-divergence difference becomes larger than 0.01. For training on Esmall, we use an entropy bonus term with coefficient 0.1, while on Elarge the coefficient is 0.05. αA3C is 1.0. We perform 10^5 training updates and keep the best model with the highest training success rate.

Generalization over different concepts

We illustrate in Table 3.3 the detailed test success rates of our models trained on Etrain with respect to each of the 5 concepts. Note that both models have similar behavior across concepts. In particular, “dining room” and “living room” are the easiest while “bathroom” is the hardest. We suspect that this is because dining rooms and living rooms often have large room space and the best connectivity to other places. By contrast, a bathroom is often very small and harder to find in big houses. Lastly, we also experiment with adding an auxiliary task of predicting the current room type during training. We found this helps neither the training performance nor the test performance. We believe this is because our reward shaping already provides strong supervision signals.

Average steps towards success: We also measure the number of steps an agent needs in RoomNav. Over all successful episodes, we evaluate the average number of steps towards the final target. The numbers are shown in Table 3.4. A random agent can only succeed when it is initially spawned very close to the target, and therefore has a very small number of steps towards the target. Our trained agents, on the other hand, can explore the environment and reach the target after a reasonable number of steps. Generally, our DDPG models take fewer steps than our A3C models thanks to their continuous action space. In all the settings, however, the number of steps required for a success is still far less than 100, namely the horizon length.

                      random   concat-LSTM   gated-LSTM   concat-CNN   gated-CNN

Avg. #steps towards targets on Esmall with different input signals
RGB+Depth (train)      14.2       35.9          41.0         31.7        33.8
RGB+Depth (test)       13.3       27.1          29.8         26.1        25.3
Mask+Depth (train)     14.2       38.4          40.9         34.9        36.6
Mask+Depth (test)      13.3       31.9          34.3         26.2        30.4

Avg. #steps towards targets on Elarge with different input signals
RGB+Depth (train)      16.0       36.4          35.6         31.0        32.4
RGB+Depth (test)       13.3       34.0          33.8         24.4        25.7
Mask+Depth (train)     16.0       40.1          38.8         34.6        36.2
Mask+Depth (test)      13.3       34.8          34.3         30.6        30.9

Table 3.4: Average number of steps towards the target in all successful trials for all the evaluated models with various input signals and different environments.

3.7 Summary

In this chapter, we introduce a new environment, House3D, which contains 45K houses with a diverse set of objects and natural layouts resembling the real world. In House3D, we teach an agent to accomplish semantic goals. We define RoomNav, in which an agent needs to understand a given semantic concept, interpret the comprehensive visual signal, navigate to the target, and, most importantly, succeed in a new unseen environment. We note that generalization to unseen environments was rarely studied in previous works. To this end, we quantify the effect of various levels of augmentation, all facilitated by House3D by means of domain randomization, multi-target training and the diversity of the environment. We resort to well-established RL techniques equipped with gating to encode the task at hand. The final performance on unseen environments exceeds the baseline methods by over 8%. We hope House3D as well as our training techniques can benefit the whole RL community for building generalizable agents.

Chapter 4

Bayesian Relational Memory for Semantic Visual Navigation

In this chapter, we introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards. BRM takes the form of a probabilistic relation graph over semantic entities (e.g., room types), which allows (1) capturing the layout prior from training environments, i.e., prior knowledge, (2) estimating posterior layout at test time, i.e., memory update, and (3) efficient planning for navigation, altogether. We develop a BRM agent consisting of a BRM module for producing sub-goals and a goal-conditioned locomotion module for control. When testing in unseen environments, the BRM agent outperforms baselines that do not explicitly utilize the probabilistic relational memory structure.

4.1 Motivation

Memory is a crucial component for intelligent agents to gain extensive reasoning abilities over a long horizon. One such challenge is visual navigation, where an agent is placed in an environment with unknown layout and room connectivity, acts on the visual signal perceived from its surroundings, and explores and reaches a goal position efficiently. Due to partial observability, the agent must memorize its past experiences and react accordingly. Hence, deep learning (DL) models for visual navigation often encode memory structures in their design. LSTM was initially used as a general-purpose implicit memory [Mirowski et al., 2017, Mirowski et al., 2018, Das et al., 2018a]. More recently, to improve performance, explicit and navigation-specific structural memories have been used [Parisotto and Salakhutdinov, 2018, Gupta et al., 2017b, Savinov et al., 2018, Gupta et al., 2017c]. Navigation-specific memories fall into two categories: spatial memory [Parisotto and Salakhutdinov, 2018, Gupta et al., 2017b] and topological memory [Gupta et al., 2017c, Savinov et al., 2018].

Figure 4.1: A demonstration of our task and method. The agent perceives visual signals and needs to find the kitchen, which cannot be seen from outside (leftmost). So the agent plans in its memory and concludes that it may first find a likely intermediate waypoint, i.e., the living room. Then the agent repeatedly updates its belief of the environment layout, re-plans accordingly and reaches the kitchen in the end.

The core idea of spatial memory is to extend the 1-dimensional memory in LSTM to a 2-dimensional matrix that represents the spatial structure of the environment, where a particular entry in the matrix corresponds to a 2D location/region in the environment. Due to its regular structure, value iteration [Tamar et al., 2016] can be applied directly for effective planning over the memory matrix. However, planning on such spatial memory can be computationally expensive for environments with large rooms. Moreover, precise localization of the agent is often unnecessary for navigation. Extensive psychological evidence [Savinov et al., 2018] also shows that animals do not rely strongly on metric representations [Wang and Spelke, 2002, Foo et al., 2005]. Instead, humans primarily depend on a landmark-based navigation strategy, which can be supported by qualitative topological knowledge of the environment [Foo et al., 2005]. Therefore, it is reasonable to represent the memory as a topological graph where the vertices are landmarks in the environment and edges denote short-term reachability between landmarks. During navigation, a localization network is trained to identify the position of the agent and the goal w.r.t. the landmarks in the memory, and an efficient graph search can be used for long-term planning. Nevertheless, human navigation still shows superior generalization performance which cannot be explained by either spatial or topological memory. For example, first-time home visitors naturally move towards the kitchen (rather than outdoors or the toilet) to get a plate; from the kitchen to a bedroom, they know that the living room may be an intermediate waypoint. Although environments are visually different, such semantic knowledge, i.e., the “close-by” relations over semantic entities, is naturally shared across environments and can be learned from previous experience to guide future navigation. In comparison, existing approaches of topological memory assume pre-exploration experiences of the environment before navigation starts [Gupta et al., 2017c, Savinov et al., 2018], provide no memory updating operations, and cannot incorporate prior knowledge of scene layouts and configurations from previously seen environments. In this chapter, we propose a new memory design for visual navigation, Bayesian Relational Memory (BRM), which (1) captures the prior knowledge of scene layouts from training environments and (2) allows both efficient planning and updating during test-time exploration. BRM can be viewed as a probabilistic version of a topological memory with semantic abstractions: each node in BRM denotes a semantic concept (e.g., object category, room type, etc.), which can be detected via a neural detector, and each edge denotes the relation between two concepts. In each environment, a single relation may be present or not. For each relation (edge), we can average its existence over all training environments and learn an existence probability to encode the prior knowledge between two semantic concepts. During exploration at test time, we incrementally observe the existence of relations within that particular test environment. We can therefore use these environment-specific observations to update the probability of those relations in the memory via the Bayes rule and derive the posterior knowledge.
Additionally, we train a semantic-goal-conditioned LSTM locomotion policy for control via deep reinforcement learning (DRL); by planning on the relation graph with posterior probabilities, the agent picks the next semantic sub-goal to navigate towards. We evaluate our BRM method in a semantic visual navigation task on top of the House3D environment [Wu et al., 2018, Das et al., 2018a], which provides a diverse set of objects, textures and human-designed indoor scenes. The semantic scenes and entities in House3D are fully labeled (with noise), which is natural for our BRM model. In the navigation task, the agent observes first-person visual signals and needs to navigate towards a particular room type. We utilize the room types as the semantic concepts (nodes) in BRM and the “close-by” relations as the edges. We evaluate on unseen environments with random initial locations and compare our learned model against other DRL-based approaches without the BRM representation. Experimental results show that the agent equipped with BRM can achieve the semantic goal with higher success rates and fewer navigation steps. Our contributions are as follows: (1) we propose a new memory representation, Bayesian Relational Memory (BRM), in the form of a probabilistic relation graph over semantic concepts; (2) BRM is capable of encoding prior knowledge over training environments as well as efficient planning and updating at test time; (3) by integrating BRM into a DRL locomotion policy, we show that the generalization performance can be significantly improved.

Related Work

Navigation is one of the most fundamental problems in mobile robotics. Traditional approaches (like SLAM) build metric maps from sensory signals, which are subsequently used for planning [Elfes, 1987, Bonin-Font et al., 2008, Thrun et al., 2005, Durrant-Whyte and Bailey, 2006]. More recently, thanks to the advances of deep learning, end-to-end approaches have been applied to tackle navigation in various domains, such as mazes [Mirowski et al., 2017, Jaderberg et al., 2017], indoor scenes [Zhu et al., 2017b, Chang et al., 2017, Savva et al., 2017, Mishkin et al., 2019, Kolve et al., 2017], autonomous driving [Chen et al., 2015, Xu et al., 2017, Dosovitskiy et al., 2017], and Google street view [Mirowski et al., 2018]. There are also nice summaries of recent progress [Anderson et al., 2018a, Mishkin et al., 2019].

Figure 4.2: The architecture overview of our proposed navigation agent equipped with the Bayesian Relational Memory (BRM).

We focus on the indoor navigation scenario with the House3D environment [Wu et al., 2018], which contains real-world-consistent relations between semantic entities and provides ground-truth labels of objects and scenes. There are also works studying visual navigation under natural language guidance, including instruction following [Chaplot et al., 2018, Misra et al., 2017, Fried et al., 2018, Wang et al., 2018b, Anderson et al., 2018b] and question answering [Das et al., 2018b, Das et al., 2018a, Anand et al., 2018]. These tasks require the agent to understand natural language and reason accordingly in an interactive environment. In our semantic navigation task, the goal instruction is simplified to a single semantic concept and hence we focus more on the reasoning ability of navigation agents. In our work, the reasoning is performed on the relations over semantic concepts extracted from visual signals. Similar ideas of using semantic knowledge to enhance reasoning have been applied to image classification [Marino et al., 2017, Wang et al., 2018c], segmentation [Torralba et al., 2003, Zhu et al., 2015], situation recognition [Li et al., 2017a], visual question answering [Wu et al., 2016a, Chen et al., 2018, Vedantam et al., 2019, Johnson et al., 2017, Hu et al., 2017], image retrieval [Johnson et al., 2015] and relation detection [Zhang et al., 2017]. Savinov et al. [Savinov et al., 2019] and Fang et al. [Fang et al., 2019] also consider extracting visual concepts dynamically by treating every received input frame as an individual concept and storing them in the memory. The most related work to ours is a concurrent one by Yang et al. [Yang et al., 2019], which considers visual navigation towards an object category and utilizes a knowledge graph as prior knowledge. Yang et al. [Yang et al., 2019] use a fixed graph to extract features for the target as an extra input to the locomotion, without graph updating or planning. In our work, by contrast, the relation graph is used in a Bayesian manner as a representation for the memory, which unites the use of prior knowledge, updating and planning altogether. From a reinforcement learning perspective, our work is related to model-based approaches [Doya et al., 2002, Ross and Pineau, 2008, Zhang et al., 2018, Riedmiller et al., 2018, Kurutach et al., 2018], in the sense that we model the environment layout via a relation graph and plan on it. Our work is also related to hierarchical reinforcement learning [Sutton et al., 1999, Dayan and Hinton, 1993a, Kulkarni et al., 2016], where the controller (BRM) produces a high-level sub-goal for the sub-policy (locomotion) to pursue. Furthermore, BRM learns from multi-task training and its update operation quickly adapts the prior relations to the test environment, which can also be viewed as a form of meta-learning [Finn et al., 2017a, Duan et al., 2016b, Mishra et al., 2018].

4.2 Methods

Task Setup

We consider the semantic visual navigation problem where an agent interacts with an environment over discrete time steps and navigates towards a semantic goal. In House3D, the semantic entities of interest are room types and we assume a fixed number of K categories. At the beginning of an episode, the agent is given a semantic target T ∈ T = {T1, . . . , TK} to navigate towards for a success. At each time step t, the agent receives a visual observation st and produces an action at ∈ A conditioned on st and T. We aim to learn a neural agent that can generalize to unseen environments. Hence, we train the agent on a training set Etrain, where ground-truth labels are assumed, and validate on Evalid. Evaluation is performed on another separate set of environments Etest. At test time, the agent only has access to the visual signal st, without any pre-exploration experience of the test environment.

Method Overview

The overall architecture of a BRM agent is shown in Fig. 4.2. It has two modules, the Bayesian Relational Memory (BRM) and an LSTM locomotion policy for control (Fig. 4.2, left). The key component of a BRM agent is a probabilistic relation graph (Fig. 4.2, right), where each node corresponds to a particular semantic target Ti. For semantic targets Ti and Tj, the edge between them denotes the “close-by” relation, and the probability of that edge indicates how likely Ti and Tj are to be close to each other in the current environment. At a high level, the locomotion module is a semantic-goal-conditioned policy which takes in both the visual input st and the sub-goal g ∈ T produced by the BRM module, and produces actions towards g. The BRM module takes in the visual observation st at each time step, extracts the semantic information via a CNN detector and stores it in a replay buffer. We periodically update the posterior probability of each edge in the relation graph and re-plan to produce a new sub-goal. In our work, the graph is updated every N steps. Notably, we do not assume the existence of all concepts: in case of a missing concept in a particular environment, the posterior of its associated edges will approach zero as more experience is gained in an episode.
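The pseudocode-style Python sketch below summarizes this control loop; the environment, classifier, graph and locomotion interfaces are illustrative assumptions rather than the actual implementation.

```python
def run_brm_episode(env, target, graph, classifier, locomotion, N=10, horizon=1000):
    """One episode of a BRM agent (sketch with assumed interfaces).

    Every N steps the agent updates the posterior relation graph from the
    semantic signals collected in a small buffer, re-plans the sub-goal, and
    otherwise executes the goal-conditioned locomotion policy.
    """
    obs = env.reset()
    sub_goal = graph.plan(classifier(obs), target)   # initial plan from the prior
    buffer = []
    for t in range(horizon):
        c_t = classifier(obs)                        # K-dim binary semantic vector
        buffer.append(c_t)
        if (t + 1) % N == 0:                         # periodic memory update + re-plan
            graph.update_posterior(buffer)
            buffer = []
            sub_goal = graph.plan(c_t, target)
        obs, done, success = env.step(locomotion(obs, sub_goal))
        if done:
            return success
    return False
```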

Figure 4.3: Overview of the planning (Left) and the update operation (Right) with BRM.

Bayesian Relational Memory

The BRM module consists of two parts: a semantic classifier and, most importantly, a probabilistic relation graph over semantic concepts, in the form of a probabilistic graphical model allowing efficient planning and posterior updates. Intuitively, at each time step, the agent detects the surrounding room type and maintains the probability of whether two room types Ti and Tj are “nearby” in the current environment. If the agent starts from room Ti and reaches another room Tj within a few steps, the probability of the “nearby” relation between Ti and Tj should be increased[1]; otherwise the probability should be decreased. Periodically, the graph is updated and the agent finds the most likely path from the current room towards the target as navigation guidance. We introduce these two components in detail, as well as how to update and plan with BRM, as follows.

Semantic classifier: The semantic classifier detects the room type label ct for the agent’s surrounding region. It can be trained by supervised learning on Etrain. Note that for robust room type classification, only using the first-person view image may not be enough. For example, an agent in a bedroom may face towards a wall, in which case the first-person image is not informative at all for the classifier, even though a bed might be just behind. So we take the panoramic view as the classifier input, which consists of 4 images, st^1, . . . , st^4, with different view angles. We use a 10-layer CNN with batch normalization to extract features f(st^i) for each st^i and compute the attention weights over these visual features by li = f(st^i)ᵀ W1 W2 [f(st^1), . . . , f(st^4)] and ai = softmax(li), with parameters W1, W2. Then the weighted average of these four features, Σ_i ai f(st^i), is used for the final sigmoid prediction for each semantic concept Ti[2]. This results in a K-dimensional binary vector ct ∈ {0, 1}^K at time t.

Probabilistic relation graph: We represent the probabilistic graph in the form of a graphical model P(z, y; ψ) with latent variable z, observation variable y and parameter ψ. Since we have K semantic concepts, there are K nodes[3] and K(K − 1)/2 relations (edges) in the graph.

[1] In a perfect noiseless setting, the relation should hold with probability 1.
[2] It is a multi-label classification setting: for example, an open kitchen with both cooking facilities and dining tables could have two labels.
[3] In fact we have K + 1 nodes. For clarity, we use K here and explain the details of the extra node later in Sec. 4.2.

Figure 4.4: Learning the prior probability of a particular relation by analyzing the existence of that relation in each training environment and then summarizing the overall statistics.

Each relation is probabilistic (i.e., it may exist or not with some probability), and before entering a particular environment, we only hold a prior belief of that relation. Formally, for the relation between Ti and Tj, we adopt a Bernoulli variable zi,j defined by zi,j ∼ Bernoulli(ψprior_i,j), where the parameter ψprior_i,j denotes the prior belief that zi,j exists. During exploration, the agent can noisily observe zi,j and use the noisy observations to estimate the true value of zi,j. We define the noisy observation model P(yi,j | zi,j) by

    yi,j ∼ Bernoulli(ψobs_i,j,0)        if zi,j = 0,
    yi,j ∼ Bernoulli(1 − ψobs_i,j,1)    if zi,j = 1,                                    (4.1)

where ψobs is another parameter to learn. At each time step, the agent holds an overall posterior belief P(z | Y) of relation existence within the current environment, based on its experiences Y, namely the samples of variable y.

Posterior update and planning: A visualization of the procedures is shown in Fig. 4.3. We assume the agent explores the current environment for a short horizon of N steps and stores the recent semantic signals ct, . . . , ct+N in the replay buffer. Then we compute the bit-OR operation over these binary vectors, B = ct OR . . . OR ct+N. B represents all the regions visited within a short (N-step) exploration period. When two targets appear concurrently in a short trajectory, they are assumed to be “close-by”. For Ti and Tj with B(Ti) = B(Tj) = 1, Ti and Tj should be nearby in the current environment, giving a sample yi,j = 1; otherwise, for B(Ti) ≠ B(Tj), we get a sample yi,j = 0. With all the history samples of y denoted Y, we can perform posterior inference, i.e., compute the posterior Bernoulli distribution P(zi,j | Yi,j), for each zi,j by the Bayes rule. Let ẑi,j = P(zi,j | Yi,j) denote the posterior probability of the relation between Ti and Tj. Given the current beliefs ẑ, the semantic signal ct and the target T, we search for an optimal plan τ* = {τ0, τ1, . . . , τm−1, τm} over the graph, where τi ∈ {1 . . . K} denotes an index of concepts, so that the joint belief along the path from the current position to the goal is maximized:
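For concreteness, here is a minimal sketch of the bit-OR sample extraction and of the per-edge posterior update implied by Eq. (4.1) and the Bayes rule; the helper functions and parameter names are ours, mirroring ψprior and ψobs above.

```python
import numpy as np

def relation_samples(semantic_history):
    """Turn an N-step window of K-dim binary semantic vectors into y-samples.

    B is the bit-OR over the window; two concepts seen together give y = 1,
    a concept seen while the other is absent gives y = 0, and pairs where
    neither concept was seen produce no sample.
    """
    B = np.bitwise_or.reduce(np.asarray(semantic_history, dtype=int), axis=0)
    samples, K = {}, len(B)
    for i in range(K):
        for j in range(i + 1, K):
            if B[i] == 1 and B[j] == 1:
                samples[(i, j)] = 1
            elif B[i] != B[j]:
                samples[(i, j)] = 0
    return samples

def edge_posterior(psi_prior, psi_obs0, psi_obs1, observations):
    """Posterior P(z = 1 | Y) for a single relation under the model of Eq. (4.1).

    z ~ Bernoulli(psi_prior); given z = 0 a positive observation occurs with
    probability psi_obs0, and given z = 1 a negative observation occurs with
    probability psi_obs1.  `observations` is the list of binary y-samples so far.
    """
    like1, like0 = 1.0, 1.0          # P(Y | z = 1) and P(Y | z = 0)
    for y in observations:
        like1 *= (1.0 - psi_obs1) if y == 1 else psi_obs1
        like0 *= psi_obs0 if y == 1 else (1.0 - psi_obs0)
    num = psi_prior * like1
    return num / (num + (1.0 - psi_prior) * like0)
```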

    τ* = arg max_τ  ct(Tτ0) ∏_{i=1}^{m} ẑ_{τi−1, τi}.                                   (4.2)

After obtaining τ*, we execute the locomotion policy for sub-goal gt = Tτ1, and then periodically update the graph, clear the replay buffer and re-plan every N steps.
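Eq. (4.2) is a max-product path search, which can be implemented as Dijkstra's algorithm on −log ẑ edge weights. The sketch below returns the first sub-goal on the most likely path and is our own illustrative implementation, not the exact code used in the experiments.

```python
import heapq
from math import log

def plan_subgoal(z_hat, c_t, target):
    """Max-product path search of Eq. (4.2) as Dijkstra on -log edge weights.

    z_hat[i][j] is the posterior probability of the "close-by" relation between
    concepts i and j, c_t is the current binary semantic vector, and `target`
    is the goal index.  Returns the first sub-goal on the most likely path,
    falling back to the target itself if no path is found.
    """
    K = len(c_t)
    # start from every concept the agent currently observes (c_t(T_tau0) = 1)
    dist = {i: 0.0 for i in range(K) if c_t[i] == 1}
    first = {i: None for i in dist}                  # first hop taken from a start node
    heap = [(0.0, i) for i in dist]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                                  # stale heap entry
        if u == target:
            return first[u] if first[u] is not None else target
        for v in range(K):
            if v == u or z_hat[u][v] <= 0.0:
                continue
            nd = d - log(z_hat[u][v])                # product of beliefs -> sum of -logs
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                first[v] = v if first[u] is None else first[u]
                heapq.heappush(heap, (nd, v))
    return target
```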

Figure 4.5: The learned prior of the relations, including the most (red) and least (blue) likely nearby rooms for dining room (Left), bedroom (Mid.) and outdoor (Right), with numbers denoting ψprior, i.e., the prior probability that two room types are nearby.

Learning the probabilistic graph: The parameter ψ has two parts: ψprior for the prior of z and ψobs for the noisy observation y. For the prior parameter ψprior, we learn it from Etrain with the ground-truth labels. A visualization is shown in Fig. 4.4. For each pair of room types Ti and Tj, we enumerate all training environments and run random explorations from some location of room type Ti. If eventually the agent reaches somewhere of room type Tj, we consider Ti and Tj to be nearby and record a positive sample zi,j = 1; otherwise a negative sample zi,j = 0. Let Z denote all the samples we obtain for z. We run maximum likelihood estimation for ψprior by maximizing LMLE(ψprior) = P(Z | ψprior). We can choose any exploration strategy under which “close-by” targets tend to appear together more frequently in a short exploration trajectory. Random exploration is the most lightweight option among all. The noisy observation parameter ψobs is related to the performance of the locomotion policy µ(θ): if µ(θ) has a higher navigation success rate, ψobs should be smaller (i.e., low noise level); when µ(θ) is poor, ψobs should be larger (cf. Eq. (4.1)). Unfortunately, there is no direct learning supervision for ψobs. However, we can evaluate the “goodness” of a particular value of ψobs by evaluating the success rate of the overall BRM agent on Evalid. Therefore, we can simply run grid search to derive the best parameter.
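Since each zi,j is Bernoulli, the maximum-likelihood estimate is just the empirical frequency of positive samples; a one-line sketch (the function name is ours):

```python
def fit_prior(samples_per_pair):
    """MLE of psi_prior for each relation: the fraction of positive z-samples
    collected across the training environments."""
    return {pair: sum(zs) / len(zs) for pair, zs in samples_per_pair.items()}
```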

The Goal Conditioned Policy

We learn an LSTM locomotion policy µ(st, g; θ) parameterized by θ, which conditions on observation st and navigates towards goal g. Following Wu et al. [Wu et al., 2018], we learn µ(st, g; θ) by formulating the task as a reinforcement learning problem with shaped reward: when the agent moves towards target room g, it receives a positive reward proportional to the distance decrement; if the agent moves away or hits an obstacle, a penalty is given. A success reward of 10 and a time penalty of 0.1 are also assigned. We optimize the policy on Etrain via the actor-critic method [Mnih et al., 2016] with a curriculum learning paradigm that periodically increases the maximum spawn distance to the target g. Additionally, thanks to the limited set of K targets, we adopt a behavior approach [Chen et al., 2019] for improved performance: we train a separate policy µi(st; θi) for each semantic target Ti, and when given the sub-goal g from the BRM module, we directly execute its corresponding behavior network. We empirically observe that it performs better than the original gated-attention policy in Wu et al. [Wu et al., 2018]. Such an architecture is also a common technique in other domains such as RL [Oh et al., 2015, Finn et al., 2016] and robotics [Finn and Levine, 2017].

Implementation Details

We introduce the crucial details below and defer the rest to Section 4.4.

Nodes in the relation graph: So far, we have assumed K pre-selected semantic concepts as graph nodes. However, it is not rare for the agent to reach a location that cannot be categorized into any existing semantic category. In practice, we treat ct = 0 as a special “unknown” concept. Hence, the BRM module actually contains K + 1 nodes. This is conceptually similar to natural language processing: a semantic concept is a word; the set of all concepts can be viewed as the dictionary; and ct = 0 corresponds to the special “out-of-vocabulary” token.

Smooth temporal classification: Although the semantic classifier achieves high accuracy on validation data, the predictions may not be temporally consistent, which brings extra noise to the BRM module. For temporally smooth prediction at test time, we set a restricted threshold over the sigmoid output and apply a filtering process on top of that: the actual prediction label for room type Ti will be 1 only if the sigmoid output keeps a confidence score of at least 0.9 for 3 consecutive time steps.

Graph learning: For learning ψprior, we run a random exploration of 300 steps and collect 50 samples for each zi,j per training environment. Also, learning ψprior does not depend on the locomotion µ(st, g; θ) and can be reused even with different control policies. ψobs depends on the locomotion, so it must be learned after µ(st, g; θ*) is obtained.
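A minimal sketch of the temporal filtering described above (confidence threshold 0.9 held for 3 consecutive steps); the class and argument names are illustrative.

```python
class SmoothedDetector:
    """Temporal filtering of the semantic classifier output (sketch).

    A concept is reported as present only if its sigmoid confidence stays above
    `threshold` for `patience` consecutive steps, as described above.
    """
    def __init__(self, num_concepts, threshold=0.9, patience=3):
        self.threshold = threshold
        self.patience = patience
        self.streak = [0] * num_concepts

    def __call__(self, confidences):
        labels = []
        for i, p in enumerate(confidences):
            self.streak[i] = self.streak[i] + 1 if p >= self.threshold else 0
            labels.append(1 if self.streak[i] >= self.patience else 0)
        return labels
```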

4.3 Experiments

We experiment on the House3D environment and proceed to answer the following two questions: (1) Does the BRM agent capture the underlying semantic relations and behave as expected? (2) Does the BRM agent generalize better in test environments than the baseline methods? We first introduce evaluation preliminaries and baseline methods in Sec. 4.3, 4.3. Then we answer the first question qualitatively in Sec. 4.3. In Sec. 4.3, 4.3, we quantitatively show that our BRM agents generally achieve higher test success rates (i.e., better generalization) and spend fewer steps to reach the targets (i.e., they are more efficient) than all the baselines. We choose a fixed N = 10 for all the BRM agents. An ablation study on the design choices of BRM is presented in Sec. 4.3. More details can be found in Section 4.4.

Figure 4.6: Qualitative comparison between BRM (purple), the basic DRL policy (pure µ(θ), blue) and the random policy (red) with horizon H = 1000 (Sec. 4.3). The y-axis is the test success rate and the x-axis is the distance in meters to the target. When targets become farther away from the starting point, the success rate of BRM stays high while the basic DRL policy quickly degenerates to random.

Preliminaries

We consider the RoomNav task on the House3D environment [Wu et al., 2018] where K = 8 room types are selected as the semantic goals, including “kitchen”, “living room”, “dining room”, “bedroom”, “bathroom”, “office”, “garage” and “outdoor”. The House3D environment provides a success check for whether the agent has reached a specific room target or not, while we also experiment with a setting where the agent predicts termination on its own (Sec. 4.3). House3D provides a training set of 200 houses, a test set of 50 houses and a validation set of 20 houses. At training time, all the approaches use the ground-truth semantic labels rather than the output of the semantic classifier. We evaluate the generalization performances of different approaches with two metrics, success rate and Success weighted by Path Length (SPL), under different horizons. SPL, proposed by Anderson et al. [Anderson et al., 2018a], is a function of both success rate and episode length, defined by (1/C) Σ_i Si · Li / max(Li, Pi), where C is the total number of episodes evaluated, Si indicates whether episode i is a success, Li is the ground-truth shortest path distance in episode i, and Pi is the number of steps the agent actually took. SPL is upper-bounded by success rate and assigns more credit to agents accomplishing their tasks faster.
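The SPL metric is straightforward to compute from per-episode statistics; a short sketch (function and argument names are ours):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length: (1/C) * sum_i S_i * L_i / max(L_i, P_i)."""
    C = len(successes)
    return sum(s * l / max(l, p)
               for s, l, p in zip(successes, shortest_lengths, path_lengths)) / C
```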

Baseline Methods

Random policy: The agent samples a random action per step, denoted by “random”.

Pure DRL agent: This LSTM agent does not have the BRM module and directly executes the policy µ(st, T; θ) throughout the entire episode, denoted by “pure µ(θ)”. This is in fact the pure locomotion module. As discussed in Sec. 4.2, this is also an improved version of the original policy proposed by Wu et al. [Wu et al., 2018].

Semantic augmented agent: Compared to the pure DRL agent, our BRM agent utilizes an extra semantic signal ct provided by the semantic classifier in addition to the visual input st.

Figure 4.7: Example of a successful trajectory. The agent is spawned inside the house, targeting “outdoor”. Left: the 2D top-down map with goal-conditioned trajectories (“outdoor” – orange; “garage” – blue; “living room” – green); Right, 1st row: RGB visual images; Right, 2nd row: the posterior of the semantic graph and the proposed sub-goals (red arrow). Initially, the agent starts by executing the locomotion for “outdoor” and then “garage” according to the prior knowledge (1st graph), but both fail (top orange and blue trajectories in the map). After updating its belief that garage and outdoor are not nearby (grey edges in the 2nd graph), it then executes locomotion for “living room” with success (red arrow in the 2nd graph, green trajectory). Finally, it executes the sub-policy for “outdoor” again, explores the living room and reaches the goal (3rd graph, bottom orange trajectory).

Hence, we consider a semantic-aware locomotion baseline µS(θs), which is another LSTM DRL agent that takes both st and ct as input (denoted by “aug.µS(θs)”).

HRL agent with an RNN controller: From a DRL perspective, our BRM agent is a hierarchical reinforcement learning (HRL) agent with the BRM module as a high-level controller producing sub-goals and the locomotion module as a low-level policy for control. Note that update and planning on BRM only depend on (1) the current semantic signal ct, (2) the target T, and (3) the accumulative bit-OR feature B (see Sec. 4.2). Hence, we adopt the same locomotion µ(st, g; θ) used by our BRM agent, and train an LSTM controller with 50 hidden units on Etrain that takes all the necessary semantic information and likewise produces a sub-target every N steps. The only difference between our BRM agent and this HRL agent is the representation of the controller (memory) module. The LSTM controller has access to exactly the same semantic information as BRM and uses a much more complex and generic neural model instead of a relation graph. Thus, we expect it to be a strong baseline and perform competitively with our BRM agent.

Qualitative Analysis and Case Study

In this section, we qualitatively illustrate that our BRM agent is able to learn reasonable semantic relations and behave in an interpretable manner.

Prior knowledge: Fig. 4.5 visualizes P(z|ψprior), the learned prior probability of relations, for 3 room types with their most and least likely nearby (connected) rooms. Darker red means more likely while darker blue implies less likely. The captured knowledge is indeed reasonable: a bathroom is likely to connect to a bedroom; a kitchen is often near a dining room, while a garage is typically outdoors.

Effectiveness of planning: The BRM agent can effectively decompose a long-term goal into a sequence of easier sub-goals via graph planning. Fig. 4.6 visualizes the test success rates of the BRM agent (BRM), the random policy (“random”) and the pure locomotion policy (“pure µ(θ)”) for increasingly distant targets over a fixed set of 5689 randomly generated test tasks. The x-axis is the shortest distance to the target in meters (typically one meter in shortest distance requires 3 to 4 actions). As expected, when the target becomes more distant, all methods have lower success rates. However, as opposed to the pure locomotion policy, which quickly degenerates to random as the distance increases, the BRM agent maintains a much higher success rate in general.

Case study: Fig. 4.7 shows a successful trajectory by the BRM agent, where the final target is to get out of the house. We visualize the progression of the episode, describe the plans and show the updated graph during exploration. Note that the final goal is invisible to the agent initially (frame 1), but the agent is able to plan, effectively explore the house (e.g., without ever entering the bottom-right dining room region), and eventually reach the target.

Quantitative Generalization Performances

We evaluate the generalization performances of different approaches on Etest with horizons H = 300 and H = 1000. We set N = 10, i.e., the memory is updated every 10 steps. Tab. 4.1 reports both success rates (%, percent) and SPL values (‰, per mille) over 5689 fixed test tasks. In addition to the overall performances, we also report the results under different planning distances, i.e., the length of the shortest sequence of sub-goals on the ground-truth relation graph. For an accurate measurement, we ensure that there are at least 500 test tasks for each planning distance. Our BRM agent has the highest average success rates as well as the best SPL values in all the cases. Notably, the margin in SPL is much more significant than that in pure success rate. More importantly, as the horizon increases, i.e., more planning computations H/N are allowed, the overall performance margin (rightmost column) of BRM over the best remaining baseline strictly increases, thanks to the effectiveness of planning.

opt. plan-steps        1            2            3            4            5        overall
avg. oracle steps    12.27        42.53        61.09        72.47        63.74       46.86

Horizon H = 300
random            20.5 / 15.9   6.9 / 16.7   3.8 / 10.7   1.6 / 4.2    3.0 / 8.8    7.2 / 13.6
pure µ(θ)         49.4 / 47.6  11.8 / 27.6   2.0 / 4.8    2.6 / 10.8   4.2 / 13.2  13.1 / 22.9
aug.µS(θ)         47.8 / 45.3  11.4 / 23.1   3.0 / 7.8    3.4 / 8.1    4.4 / 11.2  13.0 / 20.5
RNN control.      55.0 / 49.8  20.0 / 40.8   8.0 / 20.1   5.2 / 15.2  11.0 / 25.2  19.9 / 34.2
BRM               57.8 / 65.4  24.4 / 54.3  10.5 / 28.3   5.8 / 18.6  11.2 / 29.8  23.1 / 45.3

Horizon H = 1000
random            24.3 / 17.6  13.5 / 20.3   9.1 / 14.3   8.0 / 9.3    7.0 / 11.5  13.0 / 17.0
pure µ(θ)         60.8 / 47.6  23.3 / 27.6   7.6 / 4.8    8.2 / 10.8  11.0 / 13.2  22.5 / 22.9
aug.µS(θ)         61.3 / 50.1  23.0 / 26.2   9.4 / 12.0   5.8 / 9.6    9.0 / 13.6  22.4 / 23.8
RNN control.      71.0 / 58.0  39.6 / 51.3  24.1 / 32.7  16.6 / 25.6  23.2 / 39.6  37.0 / 45.2
BRM               73.7 / 74.9  43.6 / 66.0  29.2 / 44.9  20.4 / 27.1  28.4 / 42.5  41.1 / 57.5

Table 4.1: Metrics of Success Rate (%) / SPL (‰, per mille) evaluating the generalization performances of BRM and all the baseline approaches (Sec. 4.3). Here N = 10. “plan-steps” denotes the shortest planning distance in the ground-truth relation graph. “oracle steps” denotes the reference shortest number of steps required to reach the goal. Our BRM agents have the highest success rates and the best SPL values in all the cases. More importantly, as the horizon increases, which allows more planning, BRM outperforms the baselines by a larger margin.

Ablation Study

In this section, we show the necessity of all the BRM components and the direction for future improvement.

Benefits of Learned Prior: In BRM, the prior P(z|ψprior) is learned from the training houses. Tab. 4.2 evaluates BRM with an uninformative prior (“unif.”), i.e., ψprior_i,j = 0.5.

BRM (H=300)   unif. (H=300)   BRM (H=1k)   unif. (H=1k)
23.1 / 45.3    20.9 / 39.4    41.1 / 57.5   40.4 / 56.6

Table 4.2: Success Rate (%) / SPL (‰): performances of BRM with learned and uninformative priors. N = 10, H = 300, 1000 (“1k”).

Generally, the learned prior leads to better success rates and SPL values. Notably, when the horizon becomes longer, the gap becomes much smaller, since the graph converges to the true posterior with more memory updates.

Source of Error: Our approach has two modules, a BRM module for planning and a locomotion module for control. We study the errors caused by each of these components by introducing (1) a hand-designed (imperfect) oracle locomotion, which automatically gets closer to nearby targets, and (2) an optimal planner using House3D labels. The evaluation results are shown in Tab. 4.3, where the oracle locomotion drastically boosts the results while the BRM performance is close to that of the optimal planner. This indicates that the error is mainly from the locomotion: it is extremely challenging to learn a single neural navigator, which motivates our work to decompose a long-term task into sub-tasks.

Choice of N: The oracle number of steps for reaching a nearby, i.e., 1-plan-step, target is around 12.27 (top of Tab. 4.1), so we choose a slightly smaller value, N = 10, as the re-planning step size.

Src. of Err.    LSTM locomotion   oracle locomotion
BRM               41.1 / 57.5        88.6 / N.A.
opt. plan         46.3 / 62.5        96.7 / N.A.

Table 4.3: Success Rate (%) / SPL (‰): performances with an optimal planner and a hand-designed locomotion. H = 1000, N = 10.

Choice of N      N = 10        N = 30        N = 50
BRM            41.1 / 57.5   29.7 / 35.2   27.4 / 32.2
RNN control.   37.0 / 45.2   28.2 / 27.7   26.5 / 26.7

Table 4.4: Success Rate (%) / SPL (‰): performances with different choices of N under horizon H = 1000.

Horizon H = 1000 with Terminate Action
  random     pure µ(θ)   aug.µS(θs)   RNN cont.      BRM
 1.8 / 1.2   8.6 / 9.0    8.2 / 8.8  14.5 / 16.3  17.3 / 23.0

Table 4.5: Success Rate (%) / SPL (‰) with terminate action, evaluating the generalization performances of BRM and baseline agents with horizon H = 1000 and N = 10. Our BRM agent achieves the best performance under both metrics.

We investigate other choices of N in Tab. 4.4. A larger N results in more significant performance drops. Notably, our BRM agent consistently outperforms the RNN controller under different parameter choices.

Evaluation with Terminate Action

In the previous studies, the success of an episode is determined by the House3D environment automatically. It is suggested by Anderson et al. [Anderson et al., 2018a] that a real-world agent should be aware of its goal and determine whether to stop by itself. In this section, we evaluate the BRM agent and all previous baselines under this setting: a success is counted only if the agent terminates the episode correctly in a target room on its own. There are two ways to include the terminate action: (1) expand the action space with an extra stop action; (2) train a separate termination checker. We observe that (2) leads to much better practical performance, which is also reported by Pathak et al. [Pathak et al., 2018]. In our experiments, we simply use the semantic classifier as our termination checker. The results are summarized in Tab. 4.5, where we use a long horizon H = 1000 to allow the agents enough time to self-terminate. Similarly, BRM achieves the best performance in both the success rate and SPL metrics.

Discussions

Success rate and SPL: In Tab. 4.1, the SPL values are typically much smaller than the success rates, meaning that BRM uses many more steps than the reference shortest path. This is not surprising due to the strong partial observability in the RoomNav task. As a concrete example in Fig. 4.7, the optimal path from the birthplace (near frame 1) to the outdoor area (near frame 3) is extremely short if we know the top-down view in advance. However, the outdoor region is out of the agent’s sight (frame 1) and the agent has to explore the nearby regions before it sees the door towards the outside (frame 3). Planning and updates on BRM help guide the agent to explore the unknown house more effectively, which leads to higher SPL values. Overall, however, the agent still suffers from the fundamental challenge of partial observability.

Pre-selected concepts: In this work, we focus on the memory representation and simply assume we know all the semantic concepts in advance. It is also possible to generalize to unseen concepts by leveraging word embeddings and knowledge graphs from the NLP community [Yang et al., 2019]. It is also feasible to directly discover general semantic concepts from Etrain by leveraging the rich semantic information (e.g., object categories, room types, etc.) within the visual input. We leave this to our future work.

4.4 Additional Details

Video Demo A video demo visualizing a successful navigation trajectory by BRM can be found at the following url: https://drive.google.com/file/d/1vCFQZfFK1X6WJacrQID2kMQVzRD4WeTs/view?usp=sharing.

Environment Details In RoomNav the 8 targets are: kitchen, living room, dining room, bedroom, bathroom, office, garage and outdoor. We inherit the success measure of “see” from [Wu et al., 2018]: the agent needs to see some corresponding object for at least 450 pixels in the input frame and stay in the target area for at least 3 time steps. Originally the House3D environment supports a set of 13 discrete actions. Here we reduce it to 9 actions: large forward, forward, left-forward, right-forward, large left rotate, large right rotate, left rotate, right rotate and stay still. More environment details can be found in Section 3.6 from the previous Chapter. We also implemented a faster and customized variant of the House3D environment, which is available at https://github.com/jxwuyi/House3D/tree/C++.

Evaluation Details

We measure the success rate on Etest over 5689 test episodes, which consist of 5000 randomly generated configurations and 689 specialized for faraway targets to increase the confidence of the measured success rates.

Figure 4.8: Comparing BRM with baselines in success rate with confidence intervals. Approaches of interest include the random policy (red), the pure LSTM policy (blue), the semantic-aware LSTM policy (purple), the hierarchical policy (grey) and BRM (yellow). In all plots, the y-axis is the success rate while the x-axis is the optimal planning distance. BRM outperforms all baselines and the gap becomes more significant when the horizon increases, namely, with more planning computations.

These 689 episodes are generated such that for each plan-distance, there are at least 500 evaluation episodes. Each test episode has a fixed configuration for a fair comparison between different approaches, i.e., the agent always starts from the same location with the same target in that episode. Note that we always ensure that (1) the target is connected to the birthplace of the agent, and (2) the birthplace of the agent is never within the target room. In addition to the detailed numbers in Table 4.1, we visualize the success rates with confidence intervals for BRM and baseline methods in Figure 4.8. The confidence interval is obtained by fitting a binomial distribution.

Ablation Study: the Semantic Detector

In BRM, we use a CNN detector to extract the semantic signals at test time. Here we also evaluate the performances of all the approaches using the oracle signals from the House3D environment. The results are in Table 4.6, where we also include the BRM agent using the CNN detector as a reference. Generally, using the ground-truth signals and using the CNN detector yield comparable overall performance in both metrics of success rate and SPL. They all consistently outperform all the baseline methods, which indicates that the probabilistic relation graph is robust to the noise in the semantic signals (the robustness is controlled by ψobs). One interesting observation is that in many cases, using the CNN detector produces better results than using the ground-truth signals. We hypothesize that this is because the semantic labels in House3D are noisy and therefore a well-trained CNN detector is not influenced by the noisy labels at test time.

plan-dist              1            2            3            4            5        overall

Horizon H = 300
random            20.5 / 15.9   6.9 / 16.7   3.8 / 10.7   1.6 / 4.2    3.0 / 8.8    7.2 / 13.6
pure µ(θ)         49.4 / 47.6  11.8 / 27.6   2.0 / 4.8    2.6 / 10.8   4.2 / 13.2  13.1 / 22.9
aug.µS(θ) (true)  51.9 / 66.4  11.1 / 24.2   3.3 / 7.8    2.4 / 6.0    3.0 / 8.7   13.2 / 23.3
RNN ctrl. (true)  54.9 / 48.1  20.2 / 37.7   8.2 / 22.5   5.6 / 13.8   9.8 / 22.7  20.0 / 32.6
BRM (true)        58.8 / 60.7  25.3 / 55.6  10.4 / 26.9   7.6 / 22.2   9.2 / 23.4  23.6 / 44.9
BRM (CNN)         57.8 / 65.4  24.4 / 54.3  10.5 / 28.3   5.8 / 18.6  11.2 / 29.8  23.1 / 45.3

Horizon H = 1000
random            24.3 / 17.6  13.5 / 20.3   9.1 / 14.3   8.0 / 9.3    7.0 / 11.5  13.0 / 17.0
pure µ(θ)         60.8 / 47.6  23.3 / 27.6   7.6 / 4.8    8.2 / 10.8  11.0 / 13.2  22.5 / 22.9
aug.µS(θ) (true)  62.4 / 61.3  22.9 / 30.7   8.9 / 14.3   7.2 / 12.8   9.0 / 11.4  22.5 / 28.1
RNN ctrl. (true)  70.2 / 51.3  40.8 / 48.6  22.8 / 32.2  16.4 / 23.4  24.2 / 41.0  37.4 / 42.9
BRM (true)        70.3 / 61.8  44.9 / 70.5  31.7 / 50.8  19.0 / 33.3  28.0 / 42.2  41.7 / 59.8
BRM (CNN)         73.7 / 74.9  43.6 / 66.0  29.2 / 44.9  20.4 / 27.1  28.4 / 42.5  41.1 / 57.5

Table 4.6: Success Rate (%) / SPL (‰): we evaluate the performances of BRM and baseline agents using the ground-truth oracle semantic signals provided by the environment. We also include the performance of the original BRM agent using the CNN detector as a reference. The performance of the BRM-CNN agent is comparable to the BRM-true agent and sometimes even better. More discussions are in Sec. 4.4.

Additional Results on Episode Length

We illustrate the ground-truth shortest distance information as well as the average episode length of successful episodes for all the approaches. The results are shown in Table 4.7. The average ground-truth shortest path is around 46.86 steps. Note that the agent has 9 possible actions per step and suffers from strong partial observability, which indicates the difficulty of the task.

Additional Implementation Details

The source code is available at https://github.com/jxwuyi/HouseNavAgent.

Learning the LSTM Locomotion Policy

Architecture: We utilize the same policy architecture and settings as [Wu et al., 2018]: we have 4 convolution layers of 64, 64, 128 and 128 channels each with kernel size 5 and stride 2, an MLP layer of 256 units, an LSTM cell of 256 units, two MLP layers of 126 and 64 units for the policy head, and another 2 MLP layers of 64 and 32 units for the value head. Batch normalization is applied to all the layers before the LSTM. The activation is ReLU.

Average Ground Truth Shortest Path Length
plan-dist            1        2        3        4        5     overall
Oracle             12.27    42.53    61.09    72.47    63.74    46.86

Average Successful Episode Length
plan-dist            1        2        3        4        5     overall
Horizon H = 300
random              34.0    112.7    143.8    148.0    149.7     89.8
pure µ(θ)           55.2    107.0    127.9    140.8    139.4     84.7
aug.µS(θ)           49.7    112.5    159.9    179.1    176.8     89.2
RNN control.        65.0    132.3    157.2    142.7    144.1    111.8
BRM                 56.5    124.4    167.8    150.7    127.6    107.7
Horizon H = 1000
random             121.7    354.7    426.6    532.8    409.5    322.1
pure µ(θ)           55.2    107.0    127.9    140.8    139.4     84.7
aug.µS(θ)          163.1    360.9    471.9    460.7    432.5    307.1
RNN control.       174.0    368.4    465.3    466.6    397.6    339.5
BRM                172.9    350.5    460.0    512.3    418.1    337.0

Table 4.7: Average successful episode length for different approaches. The length of the shortest path reflects the strong difficulty of this task.

The only difference is that the original policy uses a gated attention mechanism for target conditioning while we use a behavior approach, training a separate sub-policy for each semantic target. For the semantic augmented policy, we feed the semantic information to the MLP layer before the LSTM.

Hyperparameters: We run a parallel version of A2C [Mnih et al., 2016] with 1 optimizer and 200 parallel rollout workers, each of which simulates a particular training house. We collect a training batch of 64 trajectories with 30 continuous time steps in each iteration. We set γ = 0.97, batch size 64, learning rate 0.001 with Adam, weight decay 10^{-5} and entropy bonus 0.1. We also add the squared l2 norm of the policy logits to the total loss with a coefficient of 0.01. We normalize the advantage to mean 0 and standard deviation 1. We run 60000 training iterations in total and use the final model as our learned policy.

Reward shaping: The reward at each time step is computed by the difference of shortest paths in meters from the agent’s location to the goal after taking an action. We also add a time penalty of 0.1 and a collision penalty of 0.3. When the agent reaches the goal, the success reward is 10.

Curriculum learning: We run curriculum learning by increasing the maximum distance between the agent’s birthplace and the target by 3 meters every 10000 iterations. We run 60000 training iterations in total and use the final model as our learned policy µ(θ).

Building the Relational Graph

We run random exploration for 300 steps to collect a sample of z. For a particular environment, we collect 50 samples in total for each zi,j. For all i ≠ j, we set ψobs_i,j,0 = 0.001 and ψobs_i,j,1 = 0.15.
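As a concrete reading of the curriculum schedule described under Curriculum learning above, here is a minimal sketch; the starting value `base` is an assumed placeholder since it is not specified in the text.

```python
def max_spawn_distance(iteration, base=3.0, step_every=10000, increment=3.0):
    """Maximum birthplace-to-target distance used during curriculum training.

    The bound grows by 3 meters every 10,000 iterations, as described above;
    the initial value `base` is an assumption of this sketch.
    """
    return base + increment * (iteration // step_every)
```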

Training the CNN Semantic Extractor

We take the panoramic view as input, which consists of 4 images, so^1, . . . , so^4, with different first-person view angles. The only exception is the target “outdoor”: we notice that, instead of using a panoramic view, simply keeping the 4 most recent frames in the trajectory leads to the best prediction accuracy. We use a CNN feature extractor to extract features f(so^i) by applying CNN layers with kernel size 3, strides [1, 1, 1, 2, 1, 2, 1, 2, 1, 2] and channels [4, 8, 16, 16, 32, 32, 64, 64, 128, 256]. We also use ReLU activations and batch normalization. Then we compute the attention weights over these 4 visual features by li = f(so^i)ᵀ W1 W2 [f(so^1), . . . , f(so^4)] and ai = softmax(li). Then we compute the weighted average of these four frames, g = Σ_i ai f(so^i), and feed it to a single layer with 32 hidden units. For each semantic signal, we generate 15k positive and 15k negative training examples from Etrain and use the Adam optimizer with learning rate 5e-4, weight decay 1e-5, batch size 256 and gradient clip of 5. We keep the model that has the best prediction accuracy on Evalid. For smooth predictions during testing, we also apply a hard threshold and a filtering process on the CNN outputs: the predicted label for Ti will be 1 only if the CNN output keeps a confidence for Ti above 0.9 for 3 consecutive steps.
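Below is a PyTorch-style sketch of the attention over the 4 panoramic views described above. The 10-layer convolutional extractor `cnn` is assumed to be provided, W1 and W2 implement li = f(so^i)ᵀ W1 W2 [f(so^1), . . . , f(so^4)], and the hidden width and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PanoramicClassifier(nn.Module):
    """Attention-weighted panoramic room-type classifier (sketch).

    `cnn` maps a single view to a `feat_dim` feature; the attention weights
    over the 4 views are computed from a bilinear score between each view
    feature and a projection of all concatenated view features.
    """
    def __init__(self, cnn, feat_dim, num_concepts, hidden=32):
        super().__init__()
        self.cnn = cnn
        self.W1 = nn.Linear(feat_dim, feat_dim, bias=False)
        self.W2 = nn.Linear(4 * feat_dim, feat_dim, bias=False)
        self.head = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_concepts))

    def forward(self, views):                                # views: (batch, 4, C, H, W)
        feats = [self.cnn(views[:, i]) for i in range(4)]    # 4 x (batch, feat_dim)
        joint = self.W2(torch.cat(feats, dim=-1))            # (batch, feat_dim)
        logits = torch.stack([(self.W1(f) * joint).sum(-1) for f in feats], dim=-1)
        attn = torch.softmax(logits, dim=-1)                 # weights over the 4 views
        g = sum(attn[:, i:i + 1] * feats[i] for i in range(4))
        return torch.sigmoid(self.head(g))                   # per-concept probability
```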

4.5 Summary

In this chapter, we proposed a novel memory architecture, Bayesian Relational Memory (BRM), for the semantic navigation task. BRM uses a semantic classifier to extract semantic labels from visual input and builds a probabilistic relation graph over the semantic concepts, which allows representing prior reachability knowledge via the edge priors, fast test-time adaptation via the edge posteriors, and efficient planning via graph search. Our BRM navigation agent uses BRM to produce a sub-goal for the locomotion policy to reach. Experimental results show that the BRM representation is effective and crucial for a visual navigation agent to generalize better in unseen environments. At a high level, our approach is general and can be applied to other tasks with semantic context information or state abstractions available to build a graph over, such as robotic manipulation, where semantic concepts can be abstract states of robot arms and object locations, or video games, where we can plan on semantic signals such as the game status or current resources. In future work, it is also worthwhile to investigate how to extract relations and concepts directly from training environments automatically.

Part III

Generalization in Complex Multi-Agent Games

Chapter 5

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Games

In this chapter, we explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.

5.1 Motivation

Reinforcement learning (RL) has recently been applied to solve challenging problems, from game playing [Mnih et al., 2015, Silver et al., 2016] to robotics [Levine et al., 2016]. In industrial applications, RL is emerging as a practical component in large-scale systems such as data center cooling [DeepMind, 2016]. Most of the successes of RL have been in single-agent domains, where modelling or predicting the behaviour of other actors in the environment is largely unnecessary. However, there are a number of important applications that involve interaction between multiple agents, where emergent behavior and complexity arise from agents co-evolving together. For example, multi-robot control [Matignon et al., 2012a], the discovery of communication and language [Sukhbaatar et al., 2016, Foerster et al., 2016, Mordatch and Abbeel, 2018], multiplayer games [Peng et al., 2017], and the analysis of social dilemmas [Leibo et al., 2017] all operate in a multi-agent domain. Related problems, such as variants of hierarchical reinforcement learning [Dayan and Hinton, 1993b], can also be seen as a multi-agent system, with multiple levels of hierarchy being equivalent to multiple agents. Additionally, multi-agent self-play has recently been shown to be a useful training paradigm [Silver et al., 2016, Sukhbaatar et al., 2018]. Successfully scaling RL to environments with multiple agents is crucial to building artificially intelligent systems that can productively interact with humans and each other. Unfortunately, traditional reinforcement learning approaches such as Q-learning or policy gradient are poorly suited to multi-agent environments. One issue is that each agent’s policy is changing as training progresses, and the environment becomes non-stationary from the perspective of any individual agent (in a way that is not explainable by changes in the agent’s own policy). This presents learning stability challenges and prevents the straightforward use of past experience replay, which is crucial for stabilizing deep Q-learning. Policy gradient methods, on the other hand, usually exhibit very high variance when coordination of multiple agents is required. Alternatively, one can use model-based policy optimization, which can learn optimal policies via back-propagation, but this requires a (differentiable) model of the world dynamics and assumptions about the interactions between agents. Applying these methods to competitive environments is also challenging from an optimization perspective, as evidenced by the notorious instability of adversarial training methods [Goodfellow et al., 2014a]. In this chapter, we propose a general-purpose multi-agent learning algorithm that: (1) leads to learned policies that only use local information (i.e., their own observations) at execution time, (2) does not assume a differentiable model of the environment dynamics or any particular structure on the communication method between agents, and (3) is applicable not only to cooperative interaction but to competitive or mixed interaction involving both physical and communicative behavior. The ability to act in mixed cooperative-competitive environments may be critical for intelligent agents; while competitive training provides a natural curriculum for learning [Sukhbaatar et al., 2018], agents must also exhibit cooperative behavior (e.g., with humans) at execution time.
We adopt the framework of centralized training with decentralized execution, allowing the policies to use extra information to ease training, so long as this information is not used at test time. It is unnatural to do this with Q-learning without making additional assumptions about the structure of the environment, as the Q function generally cannot contain different information at training and test time. Thus, we propose a simple extension of actor-critic policy gradient methods where the critic is augmented with extra information about the policies of other agents, while the actor only has access to local information. After training is completed, only the local actors are used during the execution phase, acting in a decentralized manner and equally applicable in cooperative and competitive settings. Since the centralized critic function explicitly uses the decision-making policies of other agents, we additionally show that agents can learn approximate models of other agents online and effectively use them in their own policy learning procedure. We also introduce a method to improve the stability of multi-agent policies by training agents with an ensemble of policies, thus requiring robust interaction with a variety of collaborator and competitor policies. We empirically show the success of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover complex physical and communicative coordination strategies.

Related Work

The simplest approach to learning in multi-agent settings is to use independently learning agents. This was attempted with Q-learning in [Tan, 1993], but does not perform well in practice [Matignon et al., 2012b]. As we will show, independently-learning policy gradient methods also perform poorly. One issue is that each agent's policy changes during training, resulting in a non-stationary environment and preventing the naïve application of experience replay. Previous work has attempted to address this by inputting other agents' policy parameters to the Q function [Tesauro, 2004], explicitly adding the iteration index to the replay buffer, or using importance sampling [Foerster et al., 2017]. Deep Q-learning approaches have previously been investigated in [Tampuu et al., 2017] to train competing Pong agents.

The nature of interaction between agents can be cooperative, competitive, or both, and many algorithms are designed only for a particular nature of interaction. Most studied are cooperative settings, with strategies such as optimistic and hysteretic Q function updates [Lauer and Riedmiller, 2000, Matignon et al., 2007, Omidshafiei et al., 2017], which assume that the actions of other agents are made to improve collective reward. Another approach is to indirectly arrive at cooperation via sharing of policy parameters [Gupta et al., 2017a], but this requires homogeneous agent capabilities. These algorithms are generally not applicable in competitive or mixed settings. See [Panait and Luke, 2005, Busoniu et al., 2008] for surveys of multi-agent learning approaches and applications.

Concurrently to our work, [Foerster et al., 2018b] proposed a similar idea of using policy gradient methods with a centralized critic, and test their approach on a StarCraft micromanagement task. Their approach differs from ours in the following ways: (1) they learn a single centralized critic for all agents, whereas we learn a centralized critic for each agent, allowing for agents with differing reward functions, including competitive scenarios; (2) we consider environments with explicit communication between agents; (3) they combine recurrent policies with feed-forward critics, whereas our experiments use feed-forward policies (although our methods are applicable to recurrent policies); (4) we learn continuous policies whereas they learn discrete policies.

Recent work has focused on learning grounded cooperative communication protocols between agents to solve various tasks [Sukhbaatar et al., 2016, Foerster et al., 2016, Mordatch and Abbeel, 2018]. However, these methods are usually only applicable when the communication between agents is carried out over a dedicated, differentiable communication channel.

Our method requires explicitly modeling the decision-making process of other agents. The importance of such modeling has been recognized by both the reinforcement learning [Boutilier, 1996, Chalkiadakis and Boutilier, 2003] and cognitive science communities [Frank and Goodman, 2012]. [Hu and Wellman, 1998b] stressed the importance of being robust to the decision-making process of other agents, as do others who build Bayesian models of decision making. We incorporate such robustness considerations by requiring that agents interact successfully with an ensemble of any possible policies of other agents, improving the training stability and the robustness of agents after training.

Preliminary

Markov Games. In this work, we consider a multi-agent extension of Markov decision processes (MDPs) called partially observable Markov games [Littman, 1994]. A Markov game for $N$ agents is defined by a set of states $\mathcal{S}$ describing the possible configurations of all agents, a set of actions $\mathcal{A}_1, \ldots, \mathcal{A}_N$, and a set of observations $\mathcal{O}_1, \ldots, \mathcal{O}_N$ for each agent. To choose actions, each agent $i$ uses a stochastic policy $\pi_{\theta_i} : \mathcal{O}_i \times \mathcal{A}_i \mapsto [0, 1]$, which produces the next state according to the state transition function $\mathcal{T} : \mathcal{S} \times \mathcal{A}_1 \times \ldots \times \mathcal{A}_N \mapsto \mathcal{S}$. Each agent $i$ obtains rewards as a function of the state and the agent's action, $r_i : \mathcal{S} \times \mathcal{A}_i \mapsto \mathbb{R}$, and receives a private observation correlated with the state, $\mathbf{o}_i : \mathcal{S} \mapsto \mathcal{O}_i$. The initial states are determined by a distribution $\rho : \mathcal{S} \mapsto [0, 1]$. Each agent $i$ aims to maximize its own total expected return $R_i = \sum_{t=0}^{T} \gamma^t r_i^t$, where $\gamma$ is a discount factor and $T$ is the time horizon. To minimize notation, we will often omit $\theta$ from the subscript of $\pi$.
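To make this interface concrete, the following sketch shows a minimal N-agent Markov game interface and how the per-agent discounted return $R_i$ would be accumulated along a rollout. The class and function names are illustrative assumptions, not taken from any specific library or from this thesis.

```python
# A minimal interface sketch of a partially observable Markov game as defined above.
from typing import List, Tuple
import numpy as np

class MarkovGame:
    """N-agent Markov game: shared state, per-agent observations, actions, and rewards."""
    def reset(self) -> List[np.ndarray]:
        """Sample an initial state from rho and return each agent's private observation o_i."""
        raise NotImplementedError

    def step(self, actions: List[np.ndarray]) -> Tuple[List[np.ndarray], List[float], bool]:
        """Apply the joint action (a_1, ..., a_N) via the transition function T and return
        per-agent observations, per-agent rewards r_i, and a termination flag."""
        raise NotImplementedError

def rollout_returns(env: MarkovGame, policies, horizon: int, gamma: float = 0.95) -> List[float]:
    """Run one episode and compute each agent's discounted return R_i = sum_t gamma^t * r_i^t."""
    obs = env.reset()
    returns = [0.0 for _ in policies]
    for t in range(horizon):
        actions = [pi(o) for pi, o in zip(policies, obs)]
        obs, rewards, done = env.step(actions)
        for i, r in enumerate(rewards):
            returns[i] += (gamma ** t) * r
        if done:
            break
    return returns
```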

Q-Learning and Deep Q-Networks (DQN). Q-Learning and DQN [Mnih et al., 2015] are popular methods in reinforcement learning and have previously been applied to multi-agent settings [Foerster et al., 2016, Tesauro, 2004]. Q-Learning makes use of an action-value function for policy $\pi$, defined as $Q^\pi(s, a) = \mathbb{E}[R \mid s^t = s, a^t = a]$. This Q function can be recursively rewritten as $Q^\pi(s, a) = \mathbb{E}_{s'}[r(s, a) + \gamma \mathbb{E}_{a' \sim \pi}[Q^\pi(s', a')]]$. DQN learns the action-value function $Q^*$ corresponding to the optimal policy by minimizing the loss:

\[
L(\theta) = \mathbb{E}_{s,a,r,s'}\big[(Q^*(s, a \mid \theta) - y)^2\big], \qquad \text{where } y = r + \gamma \max_{a'} \bar{Q}^*(s', a'), \tag{5.1}
\]

where $\bar{Q}$ is a target Q function whose parameters are periodically updated with the most recent $\theta$, which helps stabilize learning. Another crucial component of stabilizing DQN is the use of an experience replay buffer $\mathcal{D}$ containing tuples $(s, a, r, s')$.

Q-Learning can be directly applied to multi-agent settings by having each agent $i$ learn an independently optimal function $Q_i$ [Tan, 1993]. However, because agents are independently updating their policies as learning progresses, the environment appears non-stationary from the view of any one agent, violating the Markov assumptions required for the convergence of Q-learning. Another difficulty observed in [Foerster et al., 2017] is that the experience replay buffer cannot be used in such a setting, since in general $P(s' \mid s, a, \pi_1, \ldots, \pi_N) \neq P(s' \mid s, a, \pi_1', \ldots, \pi_N')$ when any $\pi_i \neq \pi_i'$.
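As a concrete illustration, a minimal sketch of the DQN target and loss in Eq. 5.1 might look as follows, assuming PyTorch; q_net and target_q_net are hypothetical networks mapping states to per-action Q-values, and the batch layout is an assumption.

```python
# A sketch of the DQN loss of Eq. 5.1 for a batch sampled from the replay buffer D.
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_q_net, batch, gamma=0.95):
    s, a, r, s_next = batch                                        # a holds discrete action indices
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)    # Q*(s, a | theta)
    with torch.no_grad():
        y = r + gamma * target_q_net(s_next).max(dim=1).values     # r + gamma * max_a' Qbar(s', a')
    return F.mse_loss(q_sa, y)
```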

Policy Gradient (PG) Algorithms. Policy gradient methods are another popular choice for a variety of RL tasks. The main idea is to directly adjust the parameters $\theta$ of the policy in order to maximize the objective $J(\theta) = \mathbb{E}_{s \sim p^\pi, a \sim \pi_\theta}[R]$ by taking steps in the direction of $\nabla_\theta J(\theta)$. Using the Q function defined previously, the gradient of the policy can be written as [Sutton et al., 2000]:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim p^\pi, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\big], \tag{5.2}
\]

where $p^\pi$ is the state distribution. The policy gradient theorem has given rise to several practical algorithms, which often differ in how they estimate $Q^\pi$. For example, one can simply use a sample return $R^t = \sum_{i=t}^{T} \gamma^{i-t} r^i$, which leads to the REINFORCE algorithm [Williams, 1992]. Alternatively, one could learn an approximation of the true action-value function $Q^\pi(s, a)$ by, e.g., temporal-difference learning [Sutton and Barto, 1998]; this $Q^\pi(s, a)$ is called the critic, and this choice leads to a variety of actor-critic algorithms [Sutton and Barto, 1998].

Policy gradient methods are known to exhibit high-variance gradient estimates. This is exacerbated in multi-agent settings: since an agent's reward usually depends on the actions of many agents, the reward conditioned only on the agent's own actions (when the actions of other agents are not considered in the agent's optimization process) exhibits much more variability, thereby increasing the variance of its gradients. Below, we show a simple setting where the probability of taking a gradient step in the correct direction decreases exponentially with the number of agents.

Proposition 1. Consider $N$ agents with binary actions: $P(a_i = 1) = \theta_i$, where $R(a_1, \ldots, a_N) = \mathbb{1}_{a_1 = \cdots = a_N}$. We assume an uninformed scenario, in which the agents are initialized to $\theta_i = 0.5\ \forall i$. Then, if we are estimating the gradient of the cost $J$ with policy gradient, we have:

\[
P(\langle \hat{\nabla} J, \nabla J \rangle > 0) \propto (0.5)^N,
\]

where $\hat{\nabla} J$ is the policy gradient estimator from a single sample, and $\nabla J$ is the true gradient.

Proof. See Section 5.4.

The use of baselines, such as the value function baselines typically used to ameliorate high variance, is problematic in multi-agent settings due to the non-stationarity issues mentioned previously.

Deterministic Policy Gradient (DPG) Algorithms. It is also possible to extend the policy gradient framework to deterministic policies $\mu_\theta : \mathcal{S} \mapsto \mathcal{A}$ [Silver et al., 2014]. In particular, under certain conditions we can write the gradient of the objective $J(\theta) = \mathbb{E}_{s \sim p^\mu}[R(s, a)]$ as:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \mathcal{D}}\big[\nabla_\theta \mu_\theta(s)\, \nabla_a Q^\mu(s, a)\big|_{a = \mu_\theta(s)}\big]. \tag{5.3}
\]

Since this theorem relies on $\nabla_a Q^\mu(s, a)$, it requires that the action space $\mathcal{A}$ (and thus the policy $\mu$) be continuous.

Deep deterministic policy gradient (DDPG) [Lillicrap et al., 2015] is a variant of DPG where the policy $\mu$ and critic $Q^\mu$ are approximated with deep neural networks. DDPG is an off-policy algorithm, and samples trajectories from a replay buffer of experiences that are stored throughout training. DDPG also makes use of a target network, as in DQN [Mnih et al., 2015].
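For reference, a schematic DDPG update step consistent with Eq. 5.3 could be written as below, again assuming PyTorch; the actor, critic, target networks, and optimizers are hypothetical placeholders rather than the implementation used in this thesis.

```python
# A sketch of one DDPG update: critic regression toward the bootstrapped target,
# then a policy step that maximizes Q(s, mu(s)) as in Eq. 5.3.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.95):
    s, a, r, s_next = batch
    # Critic update: target uses the target actor and target critic.
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next)).squeeze(-1)
    critic_loss = F.mse_loss(critic(s, a).squeeze(-1), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor update: ascend grad_theta mu(s) * grad_a Q(s, a)|_{a = mu(s)}.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```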

5.2 Methods

Multi-Agent Actor Critic

We have argued in the previous section that naïve policy gradient methods perform poorly in simple multi-agent settings, and this is supported by our experiments in Section 5.3. Our goal in this section is to derive an algorithm that works well in such settings. However, we would like to operate under the following constraints: (1) the learned policies can only use local information (i.e., their own observations) at execution time, (2) we do not assume a differentiable model of the environment dynamics, unlike in [Mordatch and Abbeel, 2018], and (3) we do not assume any particular structure on the communication method between agents (that is, we do not assume a differentiable communication channel). Fulfilling the above desiderata would provide a general-purpose multi-agent learning algorithm that could be applied not just to cooperative games with explicit communication channels, but also to competitive games and games involving only physical interactions between agents.

Figure 5.1: Overview of our multi-agent decentralized actor, centralized critic approach.

Similarly to [Foerster et al., 2016], we accomplish our goal by adopting the framework of centralized training with decentralized execution. Thus, we allow the policies to use extra information to ease training, so long as this information is not used at test time. It is unnatural to do this with Q-learning, as the Q function generally cannot contain different information at training and test time. Thus, we propose a simple extension of actor-critic policy gradient methods where the critic is augmented with extra information about the policies of other agents.

More concretely, consider a game with $N$ agents with policies parameterized by $\boldsymbol{\theta} = \{\theta_1, \ldots, \theta_N\}$, and let $\boldsymbol{\pi} = \{\pi_1, \ldots, \pi_N\}$ be the set of all agent policies. Then we can write the gradient of the expected return for agent $i$, $J(\theta_i) = \mathbb{E}[R_i]$, as:

\[
\nabla_{\theta_i} J(\theta_i) = \mathbb{E}_{s \sim p^{\boldsymbol{\pi}},\, a_i \sim \pi_i}\big[\nabla_{\theta_i} \log \pi_i(a_i \mid o_i)\, Q_i^{\boldsymbol{\pi}}(\mathbf{x}, a_1, \ldots, a_N)\big]. \tag{5.4}
\]

Here $Q_i^{\boldsymbol{\pi}}(\mathbf{x}, a_1, \ldots, a_N)$ is a centralized action-value function that takes as input the actions of all agents, $a_1, \ldots, a_N$, in addition to some state information $\mathbf{x}$, and outputs the Q-value for agent $i$. In the simplest case, $\mathbf{x}$ could consist of the observations of all agents, $\mathbf{x} = (o_1, \ldots, o_N)$; however, we could also include additional state information if available. Since each $Q_i^{\boldsymbol{\pi}}$ is learned separately, agents can have arbitrary reward structures, including conflicting rewards in a competitive setting.

We can extend the above idea to work with deterministic policies. If we now consider $N$ continuous policies $\mu_{\theta_i}$ with parameters $\theta_i$ (abbreviated as $\mu_i$), the gradient can be written as:

\[
\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{\mathbf{x}, a \sim \mathcal{D}}\big[\nabla_{\theta_i} \mu_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\boldsymbol{\mu}}(\mathbf{x}, a_1, \ldots, a_N)\big|_{a_i = \mu_i(o_i)}\big], \tag{5.5}
\]

where the experience replay buffer $\mathcal{D}$ contains the tuples $(\mathbf{x}, \mathbf{x}', a_1, \ldots, a_N, r_1, \ldots, r_N)$, recording the experiences of all agents. The centralized action-value function $Q_i^{\boldsymbol{\mu}}$ is updated as:

\[
L(\theta_i) = \mathbb{E}_{\mathbf{x}, a, r, \mathbf{x}'}\big[(Q_i^{\boldsymbol{\mu}}(\mathbf{x}, a_1, \ldots, a_N) - y)^2\big], \qquad y = r_i + \gamma\, Q_i^{\boldsymbol{\mu}'}(\mathbf{x}', a_1', \ldots, a_N')\big|_{a_j' = \mu_j'(o_j)}, \tag{5.6}
\]

where $\boldsymbol{\mu}' = \{\mu_{\theta_1'}, \ldots, \mu_{\theta_N'}\}$ is the set of target policies with delayed parameters $\theta_i'$. As shown in Section 5.3, we find that the centralized critic with deterministic policies works very well in practice, and we refer to this approach as multi-agent deep deterministic policy gradient (MADDPG). We provide the description of the full algorithm in Section 5.4.

A primary motivation behind MADDPG is that, if we know the actions taken by all agents, the environment is stationary even as the policies change, since

\[
P(s' \mid s, a_1, \ldots, a_N, \pi_1, \ldots, \pi_N) = P(s' \mid s, a_1, \ldots, a_N) = P(s' \mid s, a_1, \ldots, a_N, \pi_1', \ldots, \pi_N')
\]

for any $\pi_i \neq \pi_i'$. This is not the case if we do not explicitly condition on the actions of other agents, as is done in most traditional RL methods.

Note that we require the policies of other agents to apply the update in Eq. 5.6. Knowing the observations and policies of other agents is not a particularly restrictive assumption; if our goal is to train agents to exhibit complex communicative behaviour in simulation, this information is often available to all agents. However, we can relax this assumption if necessary by learning the policies of other agents from observations; we describe a method of doing this in Section 5.2.
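A minimal sketch of the resulting MADDPG updates (Eqs. 5.5 and 5.6) for a single agent i is given below, assuming PyTorch. The per-agent module lists, optimizers, and batch layout are hypothetical placeholders chosen for illustration, not the exact implementation used in this thesis.

```python
# One MADDPG update for agent i: a centralized critic over all agents' observations and
# actions, and a decentralized actor that only consumes agent i's own observation.
import torch
import torch.nn.functional as F

def maddpg_update(i, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, batch, gamma=0.95):
    obs, acts, rews, next_obs = batch               # lists indexed by agent, each a tensor batch
    x = torch.cat(obs, dim=-1)
    x_next = torch.cat(next_obs, dim=-1)
    # Centralized critic target (Eq. 5.6): next actions come from every agent's target policy.
    with torch.no_grad():
        next_acts = [target_actors[j](next_obs[j]) for j in range(len(actors))]
        y = rews[i] + gamma * target_critics[i](x_next, torch.cat(next_acts, dim=-1)).squeeze(-1)
    q_i = critics[i](x, torch.cat(acts, dim=-1)).squeeze(-1)
    critic_loss = F.mse_loss(q_i, y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()
    # Decentralized actor (Eq. 5.5): agent i's action is re-sampled from its current policy,
    # while the other agents' actions are taken from the replay buffer.
    acts_for_actor = [actors[j](obs[j]) if j == i else acts[j].detach()
                      for j in range(len(actors))]
    actor_loss = -critics[i](x, torch.cat(acts_for_actor, dim=-1)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```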

Inferring Policies of Other Agents

To remove the assumption of knowing the other agents' policies, as required in Eq. 5.6, each agent $i$ can additionally maintain an approximation $\hat{\mu}_{\phi_i^j}$ (where $\phi_i^j$ are the parameters of the approximation; henceforth $\hat{\mu}_i^j$) to the true policy $\mu_j$ of agent $j$. This approximate policy is learned by maximizing the log probability of agent $j$'s actions, with an entropy regularizer:

\[
L(\phi_i^j) = -\mathbb{E}_{o_j, a_j}\big[\log \hat{\mu}_i^j(a_j \mid o_j) + \lambda H(\hat{\mu}_i^j)\big], \tag{5.7}
\]

where $H$ is the entropy of the policy distribution. With the approximate policies, $y$ in Eq. 5.6 can be replaced by an approximate value $\hat{y}$ calculated as follows:

\[
\hat{y} = r_i + \gamma\, Q_i^{\boldsymbol{\mu}'}\big(\mathbf{x}', \hat{\mu}_i'^{1}(o_1), \ldots, \mu_i'(o_i), \ldots, \hat{\mu}_i'^{N}(o_N)\big), \tag{5.8}
\]

where $\hat{\mu}_i'^{j}$ denotes the target network for the approximate policy $\hat{\mu}_i^j$. Note that Eq. 5.7 can be optimized in a completely online fashion: before updating $Q_i^{\boldsymbol{\mu}}$, the centralized Q function, we take the latest samples of each agent $j$ from the replay buffer to perform a single gradient step to update $\phi_i^j$. Note also that, in the above equation, we input the action log probabilities of each agent directly into $Q$, rather than sampling.
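A sketch of this online fitting step (Eq. 5.7) is shown below, assuming PyTorch and a policy approximator that returns a torch distribution over actions; all names are illustrative assumptions.

```python
# One gradient step on the approximate policy \hat{mu}_i^j of agent j, maximizing the
# log probability of agent j's observed actions plus an entropy bonus (Eq. 5.7).
import torch

def fit_other_agent_policy(approx_policy, opt, obs_j, act_j, lam=1e-3):
    dist = approx_policy(obs_j)                 # e.g. a torch.distributions.Normal over actions
    log_prob = dist.log_prob(act_j).sum(-1)     # log \hat{mu}_i^j(a_j | o_j)
    entropy = dist.entropy().sum(-1)            # H(\hat{mu}_i^j)
    loss = -(log_prob + lam * entropy).mean()   # negative of the objective in Eq. 5.7
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```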

Agents with Policy Ensembles

As previously mentioned, a recurring problem in multi-agent reinforcement learning is environment non-stationarity due to the agents' changing policies. This is particularly true in competitive settings, where agents can derive a strong policy by overfitting to the behavior of their competitors. Such policies are undesirable as they are brittle and may fail when the competitors alter their strategies.

To obtain multi-agent policies that are more robust to changes in the policy of competing agents, we propose to train a collection of $K$ different sub-policies. At each episode, we randomly select one particular sub-policy for each agent to execute. Suppose that policy $\mu_i$ is an ensemble of $K$ different sub-policies, with sub-policy $k$ denoted by $\mu_{\theta_i^{(k)}}$ (abbreviated as $\mu_i^{(k)}$). For agent $i$, we then maximize the ensemble objective:

\[
J_e(\mu_i) = \mathbb{E}_{k \sim \mathrm{unif}(1, K),\, s \sim p^{\boldsymbol{\mu}},\, a \sim \mu_i^{(k)}}\big[R_i(s, a)\big]. \tag{5.9}
\]

Since different sub-policies will be executed in different episodes, we maintain a replay buffer $\mathcal{D}_i^{(k)}$ for each sub-policy $\mu_i^{(k)}$ of agent $i$. Accordingly, we can derive the gradient of the ensemble objective with respect to $\theta_i^{(k)}$ as follows:

\[
\nabla_{\theta_i^{(k)}} J_e(\mu_i) = \frac{1}{K}\, \mathbb{E}_{\mathbf{x}, a \sim \mathcal{D}_i^{(k)}}\Big[\nabla_{\theta_i^{(k)}} \mu_i^{(k)}(a_i \mid o_i)\, \nabla_{a_i} Q^{\boldsymbol{\mu}_i}(\mathbf{x}, a_1, \ldots, a_N)\big|_{a_i = \mu_i^{(k)}(o_i)}\Big]. \tag{5.10}
\]
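The episode-level bookkeeping implied by Eqs. 5.9 and 5.10 can be sketched as follows; the environment, sub-policy, and replay-buffer objects are hypothetical placeholders used only for illustration.

```python
# At every episode each agent draws one of its K sub-policies uniformly at random, and the
# resulting transitions are stored in that sub-policy's own replay buffer D_i^(k).
import random

def run_ensemble_episode(env, sub_policies, buffers, K):
    """sub_policies[i][k] maps an observation to an action; buffers[i][k] is agent i's k-th buffer."""
    N = len(sub_policies)
    ks = [random.randrange(K) for _ in range(N)]      # k ~ unif(1, K) for each agent
    obs = env.reset()
    done = False
    while not done:
        acts = [sub_policies[i][ks[i]](obs[i]) for i in range(N)]
        next_obs, rews, done = env.step(acts)
        for i in range(N):                            # per-sub-policy replay buffer D_i^(k)
            buffers[i][ks[i]].add(obs[i], acts[i], rews[i], next_obs[i])
        obs = next_obs
    return ks                                         # which sub-policies were active this episode
```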

5.3 Experiments

Videos of our experiments can be viewed at https://sites.google.com/site/multiagentac/.

Environments

To perform our experiments, we adopt the grounded communication environment proposed in [Mordatch and Abbeel, 2018] (code available at https://github.com/openai/multiagent-particle-envs), which consists of N agents and L landmarks inhabiting a two-dimensional world with continuous space and discrete time. Agents may take physical actions in the environment and communication actions that get broadcasted to other agents. Unlike [Mordatch and Abbeel, 2018], we do not assume that all agents have identical action and observation spaces, or act according to the same policy π. We also consider games that are both cooperative (all agents must maximize a shared return) and competitive (agents have conflicting goals). Some environments require explicit communication between agents in order to achieve the best reward, while in other environments agents can only perform physical actions. We provide details for each environment below.

Figure 5.2: Illustrations of the experimental environment and some tasks we consider, including a) Cooperative Communication, b) Predator-Prey, c) Cooperative Navigation, and d) Physical Deception. See the webpage for videos of all experimental results.

Cooperative communication. This task consists of two cooperative agents, a speaker and a listener, who are placed in an environment with three landmarks of differing colors. At each episode, the listener must navigate to a landmark of a particular color, and obtains reward based on its distance to the correct landmark. However, while the listener can observe the relative position and color of the landmarks, it does not know which landmark it must navigate to. Conversely, the speaker's observation consists of the correct landmark color, and it can produce a communication output at each time step which is observed by the listener. Thus, the speaker must learn to output the landmark color based on the motions of the listener. Although this problem is relatively simple, as we show in Section 5.3 it poses a significant challenge to traditional RL algorithms.

Cooperative navigation. In this environment, agents must cooperate through physical actions to reach a set of L landmarks. Agents observe the relative positions of other agents and landmarks, and are collectively rewarded based on the proximity of any agent to each landmark. In other words, the agents have to 'cover' all of the landmarks. Further, the agents occupy significant physical space and are penalized when colliding with each other. Our agents learn to infer the landmark they must cover, and move there while avoiding other agents.

Keep-away. This scenario consists of L landmarks including a target landmark, N cooperating agents who know the target landmark and are rewarded based on their distance to the target, and M adversarial agents who must prevent the cooperating agents from reaching the target. Adversaries accomplish this by physically pushing the agents away from the landmark, temporarily occupying it.

Figure 5.3: Comparison between MADDPG and DDPG (left), and between single policy MADDPG and ensemble MADDPG (right) on the competitive environments. Each bar cluster shows the 0-1 normalized score for a set of competing policies (agent v adversary), where a higher score is better for the agent. In all cases, MADDPG outperforms DDPG when directly pitted against it, and similarly for the ensemble against the single MADDPG policies. Full results are given in Section 5.4.

While the adversaries are also rewarded based on their distance to the target landmark, they do not know the correct target; this must be inferred from the movements of the agents.

Physical deception. Here, N agents cooperate to reach a single target landmark from a total of N landmarks. They are rewarded based on the minimum distance of any agent to the target (so only one agent needs to reach the target landmark). However, a lone adversary also desires to reach the target landmark; the catch is that the adversary does not know which of the landmarks is the correct one. Thus the cooperating agents, who are penalized based on the adversary's distance to the target, learn to spread out and cover all landmarks so as to deceive the adversary.

Predator-prey. In this variant of the classic predator-prey game, N slower cooperating agents must chase a faster adversary around a randomly generated environment with L large landmarks impeding the way. Each time the cooperative agents collide with an adversary, the agents are rewarded while the adversary is penalized. Agents observe the relative positions and velocities of the agents, and the positions of the landmarks.

Covert communication. This is an adversarial communication environment, in which a speaker agent ('Alice') must communicate a message to a listener agent ('Bob'), who must reconstruct the message at the other end. However, an adversarial agent ('Eve') is also observing the channel and wants to reconstruct the message; Alice and Bob are penalized based on Eve's reconstruction, and thus Alice must encode her message using a randomly generated key known only to Alice and Bob. This is similar to the cryptography environment considered in [Abadi and Andersen, 2016].
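For readers who wish to reproduce these tasks, a hedged usage sketch for the released particle-environment code (linked above) is given below. The make_env entry point and the scenario name are assumptions based on the public repository layout rather than part of this thesis; consult the repository's README for the exact interface.

```python
# Instantiating one of the particle-world scenarios and stepping it with random actions.
from make_env import make_env   # assumed helper in github.com/openai/multiagent-particle-envs

env = make_env("simple_speaker_listener")        # roughly the cooperative communication task
obs_n = env.reset()                              # one observation per agent
act_n = [space.sample() for space in env.action_space]
obs_n, rew_n, done_n, info_n = env.step(act_n)   # per-agent observations, rewards, done flags
```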

Figure 5.4: Evaluation on cooperative communication after 25000 episodes: (a) agent reward; (b) success rate.

Comparison to Decentralized Reinforcement Learning Methods

We implement our MADDPG algorithm and evaluate it on the environments presented in Section 5.3. Unless otherwise specified, our policies are parameterized by a two-layer ReLU MLP with 64 units per layer. To support discrete communication messages, we use the Gumbel-Softmax estimator [Jang et al., 2017]. To evaluate the quality of policies learned in competitive settings, we pit MADDPG agents against DDPG agents, and compare the resulting success of the agents and adversaries in the environment. We train our models until convergence, and then evaluate them by averaging various metrics for 1000 further iterations. We provide the tables and details of our results on all environments in Section 5.4, and summarize them here.

We first examine the cooperative communication scenario. Despite the simplicity of the task (the speaker only needs to learn to output its observation), traditional RL methods such as DQN, Actor-Critic, a first-order implementation of TRPO, and DDPG all fail to learn the correct behaviour (measured by whether the listener is within a short distance from the target landmark). In practice we observed that the listener learns to ignore the speaker and simply moves to the middle of all observed landmarks. We plot the learning curves over 25000 episodes for various approaches in Figure 5.4a.

We hypothesize that a primary reason for the failure of traditional RL methods in this (and other) multi-agent settings is the lack of a consistent gradient signal. For example, if the speaker utters the correct symbol while the listener moves in the wrong direction, the speaker is penalized. This problem is exacerbated as the number of time steps grows: we observed that traditional policy gradient methods can learn when the objective of the listener is simply to reconstruct the observation of the speaker in a single time step, or if the initial positions of agents and landmarks are fixed and evenly distributed. This indicates that many of the multi-agent methods previously proposed for scenarios with short time horizons (e.g. [Lazaridou et al., 2017]) may not generalize to more complex tasks. Conversely, MADDPG agents can learn coordinated behaviour more easily via the centralized critic.

Figure 5.5: Comparison between MADDPG (left) and DDPG (right) on the cooperative communication (CC) and physical deception (PD) environments at t = 0, 5, and 25. Small dark circles indicate landmarks. In CC, the grey agent is the speaker, and the color of the listener indicates the target landmark. In PD, the blue agents are trying to deceive the red adversary while covering the target landmark (in green). MADDPG learns the correct behavior in both cases: in CC the speaker learns to output the target landmark color to direct the listener, while in PD the agents learn to cover both landmarks to confuse the adversary. DDPG (and other RL algorithms) struggles in these settings: in CC the speaker always repeats the same utterance and the listener moves to the middle of the landmarks, and in PD one agent greedily pursues the green landmark (and is followed by the adversary) while the other agent scatters. See video for full trajectories.

In the cooperative communication environment, MADDPG is able to reliably learn the correct listener and speaker policies, and the listener is often (84.0% of the time) able to navigate to the target.

A similar situation arises for the physical deception task: when the cooperating agents are trained with MADDPG, they are able to successfully deceive the adversary by covering all of the landmarks around 94% of the time when L = 2 (Figure 5.5). Furthermore, the adversary success is quite low, especially when the adversary is trained with DDPG (16.4% when L = 2). This contrasts sharply with the behaviour learned by the cooperating DDPG agents, who are unable to deceive MADDPG adversaries in any scenario, and do not even deceive other DDPG agents when L = 4.

While the cooperative navigation and predator-prey tasks have a less stark divide between success and failure, in both cases the MADDPG agents outperform the DDPG agents. In cooperative navigation, MADDPG agents have a slightly smaller average distance to each landmark, but have almost half the average number of collisions per episode (when N = 2) compared to DDPG agents, due to the ease of coordination. Similarly, MADDPG predators are far more successful at chasing DDPG prey (16.1 collisions/episode) than the converse (10.3 collisions/episode).

In the covert communication environment, we found that Bob trained with both MADDPG and DDPG outperforms Eve in terms of reconstructing Alice's message. However, Bob trained with MADDPG achieves a larger relative success rate compared with DDPG (52.4% vs. 25.1%). Further, only Alice trained with MADDPG can encode her message such that Eve achieves near-random reconstruction accuracy. The learning curve (a sample plot is shown in Section 5.4) shows that the oscillation due to the competitive nature of the environment often cannot be overcome with common decentralized RL methods. We emphasize that we do not use any of the tricks required for the cryptography environment from [Abadi and Andersen, 2016], including modifying Eve's loss function, alternating agent and adversary training, and using a hybrid 'mix & transform' feed-forward and convolutional architecture.

Figure 5.6: Effectiveness of learning by approximating policies of other agents in the cooperative communication scenario. Left: plot of the reward over number of iterations; MADDPG agents quickly learn to solve the task when approximating the policies of others. Right: KL divergence between the approximate policies and the true policies.

Effect of Learning Policies of Other Agents

We evaluate the effectiveness of learning the policies of other agents in the cooperative communication environment, using the same hyperparameters as in the previous experiments and setting λ = 0.001 in Eq. 5.7. The results are shown in Figure 5.6. We observe that despite not fitting the policies of other agents perfectly (in particular, the approximate listener policy learned by the speaker has a fairly large KL divergence to the true policy), learning with approximated policies is able to achieve the same success rate as using the true policy, without a significant slowdown in convergence.

Effect of Training with Policy Ensembles

We focus on the effectiveness of policy ensembles in competitive environments, including keep-away, cooperative navigation, and predator-prey. We choose K = 3 sub-policies for the keep-away and cooperative navigation environments, and K = 2 for predator-prey. To improve convergence speed, we enforce that the cooperative agents have the same policies at each episode, and similarly for the adversaries. To evaluate the approach, we measure the performance of ensemble policies and single policies in the roles of both agent and adversary. The results are shown on the right side of Figure 5.3. We observe that agents with policy ensembles are stronger than those with a single policy. In particular, when pitting ensemble agents against single-policy adversaries (second bar cluster from the left), the ensemble agents outperform the adversaries by a large margin compared to when the roles are reversed (third bar cluster from the left).

5.4 Additional Details

Multi-Agent Deep Deterministic Policy Gradient Algorithm

For completeness, we provide the MADDPG algorithm below.

Algorithm 1: Multi-Agent Deep Deterministic Policy Gradient for N agents
  for episode = 1 to M do
    Initialize a random process $\mathcal{N}$ for action exploration
    Receive initial state $\mathbf{x}$
    for t = 1 to max-episode-length do
      For each agent $i$, select action $a_i = \mu_{\theta_i}(o_i) + \mathcal{N}_t$ w.r.t. the current policy and exploration
      Execute actions $a = (a_1, \ldots, a_N)$ and observe reward $r$ and new state $\mathbf{x}'$
      Store $(\mathbf{x}, a, r, \mathbf{x}')$ in replay buffer $\mathcal{D}$
      $\mathbf{x} \leftarrow \mathbf{x}'$
      for agent i = 1 to N do
        Sample a random minibatch of $S$ samples $(\mathbf{x}^j, a^j, r^j, \mathbf{x}'^j)$ from $\mathcal{D}$
        Set $y^j = r_i^j + \gamma\, Q_i^{\boldsymbol{\mu}'}(\mathbf{x}'^j, a_1', \ldots, a_N')\big|_{a_k' = \mu_k'(o_k^j)}$
        Update the critic by minimizing the loss $L(\theta_i) = \frac{1}{S} \sum_j \big(y^j - Q_i^{\boldsymbol{\mu}}(\mathbf{x}^j, a_1^j, \ldots, a_N^j)\big)^2$
        Update the actor using the sampled policy gradient:
          $\nabla_{\theta_i} J \approx \frac{1}{S} \sum_j \nabla_{\theta_i} \mu_i(o_i^j)\, \nabla_{a_i} Q_i^{\boldsymbol{\mu}}(\mathbf{x}^j, a_1^j, \ldots, a_i, \ldots, a_N^j)\big|_{a_i = \mu_i(o_i^j)}$
      end for
      Update target network parameters for each agent $i$: $\theta_i' \leftarrow \tau \theta_i + (1 - \tau)\theta_i'$
    end for
  end for

Additional Experimental Details

In all of our experiments, we use the Adam optimizer with a learning rate of 0.01 and τ = 0.01 for updating the target networks. γ is set to 0.95. The size of the replay buffer is 10^6, and we update the network parameters after every 100 samples added to the replay buffer. We use a batch size of 1024 episodes before making an update, except for TRPO, where we found a batch size of 50 leads to better performance (allowing it more updates relative to MADDPG). We train with 10 random seeds for environments with stark success/fail conditions (cooperative communication, physical deception, and covert communication) and 3 random seeds for the other environments. The details of the experimental results are shown in the following tables.

Agent π        Target reach %    Average distance
MADDPG         84.0%             0.133
DDPG           32.0%             0.456
DQN            24.8%             0.754
Actor-Critic   17.2%             2.071
TRPO           20.6%             1.573
REINFORCE      13.6%             3.333

Table 5.1: Percentage of episodes where the agent reached the target landmark and average distance from the target in the cooperative communication environment, after 25000 episodes. Note that the percentage of targets reached is different than the policy learning success rate in Figure 5.4b, which indicates the percentage of runs in which the correct policy was learned (consistently reaching the target landmark). Even when the correct behavior is learned, agents occasionally hover slightly outside the target landmark on some episodes, and conversely agents who learn to go to the middle of the landmarks occasionally stumble upon the correct landmark.

                       N = 3                           N = 6
Agent π      Average dist.   # collisions    Average dist.   # collisions
MADDPG       1.767           0.209           3.345           1.366
DDPG         1.858           0.375           3.350           1.585

Table 5.2: Average # of collisions per episode and average agent distance from a landmark in the cooperative navigation task, using 2-layer 128 unit MLP policies.

Variance of Policy Gradient Algorithms in a Simple Multi-Agent Setting

To analyze the variance of policy gradient methods in multi-agent settings, we consider a simple cooperative scenario with $N$ agents and binary actions: $P(a_i = 1) = \theta_i$.

Agent π    Adversary π    # touches (pp1)    # touches (pp2)
MADDPG     MADDPG         11.0               0.202
MADDPG     DDPG           16.1               0.405
DDPG       MADDPG         10.3               0.298
DDPG       DDPG           9.4                0.321

Table 5.3: Average number of prey touches by predator per episode on two predator-prey environments with N = L = 3, one where the prey (adversaries) are slightly (30%) faster (pp1), and one where they are significantly (100%) faster (pp2). All policies in this experiment are 2-layer 128 unit MLPs.

                                       N = 2                                N = 4
Agent π   Adversary π   AG succ %  ADV succ %  ∆ succ %    AG succ %  ADV succ %  ∆ succ %
MADDPG    MADDPG        94.4%      39.2%       55.2%       81.5%      28.3%       53.2%
MADDPG    DDPG          92.2%      16.4%       75.8%       69.6%      19.8%       49.4%
DDPG      MADDPG        68.9%      59.0%       9.9%        35.7%      32.1%       3.6%
DDPG      DDPG          74.7%      38.6%       36.1%       18.4%      35.8%       -17.4%

Table 5.4: Results on the physical deception task, with N = 2 and 4 cooperative agents/landmarks. Success (succ %) for agents (AG) and adversaries (ADV) is measured as being within a small distance from the target landmark.

Alice, Bob π    Eve π      Bob succ %    Eve succ %    ∆ succ %
MADDPG          MADDPG     96.5%         52.1%         44.4%
MADDPG          DDPG       96.8%         44.4%         52.4%
DDPG            MADDPG     65.3%         64.3%         1.0%
DDPG            DDPG       92.7%         67.6%         25.1%

Table 5.5: Agent (Bob) and adversary (Eve) success rate (succ %, i.e. correctly reconstructing the speaker’s message) in the covert communication environment. The input message is drawn from a set of two 4-dimensional one-hot vectors.

(a) Keep-away (KA)             (b) Physical deception (PD)     (c) Predator-prey (PP)
           S. Ag.   E. Ag.                S. Ag.   E. Ag.                  S. Ag.   E. Ag.
S. Adv.    7.94     7.74       S. Adv.    4.25     4.10         S. Adv.    0.201    0.211
E. Adv.    8.35     8.11       E. Adv.    5.55     4.44         E. Adv.    0.125    0.17

(a) KA: average frames that the adversary reaches the goal; for the adversary, the larger the better. (b) PD: average frames that the adversary stays at the goal; for the adversary, the larger the better. (c) PP: average number of collisions; for the adversary, the smaller the better.

Table 5.6: Evaluations of the adversary agent with/without policy ensembles over 1000 trials on different scenarios, including (a) keep-away (KA) with N = M = 1, (b) physical deception (PD) with N = 2, and (c) predator-prey (PP) with N = 4 and L = 1. S. denotes agents with a single policy; E. denotes agents with policy ensembles.

Figure 5.7: In competitive environments such as ‘covert communication’, the reward can oscillate significantly as agents adapt to each other. DDPG is often unable to overcome this, whereas our MADDPG algorithm has much greater success.

We define the reward to be 1 if all actions are the same ($a_1 = a_2 = \cdots = a_N$), and 0 otherwise. This is a simple scenario with no temporal component: agents must simply learn to either always output 1 or always output 0 at each time step. Despite this, we can show that the probability of taking a gradient step in the correct direction decreases exponentially with the number of agents $N$.

Proposition 1. Consider $N$ agents with binary actions: $P(a_i = 1) = \theta_i$, where $R(a_1, \ldots, a_N) = \mathbb{1}_{a_1 = \cdots = a_N}$. We assume an uninformed scenario, in which the agents are initialized to $\theta_i = 0.5\ \forall i$. Then, if we are estimating the gradient of the cost $J$ with policy gradient, we have:

\[
P(\langle \hat{\nabla} J, \nabla J \rangle > 0) \propto (0.5)^N,
\]

where $\hat{\nabla} J$ is the policy gradient estimator from a single sample, and $\nabla J$ is the true gradient.

Proof. We can write $P(a_i) = \theta_i^{a_i}(1 - \theta_i)^{1 - a_i}$, so that $\log P(a_i) = a_i \log \theta_i + (1 - a_i) \log(1 - \theta_i)$. The policy gradient estimator (from a single sample) is:

\[
\begin{aligned}
\widehat{\frac{\partial J}{\partial \theta_i}} &= R(a_1, \ldots, a_N)\, \frac{\partial}{\partial \theta_i} \log P(a_1, \ldots, a_N) \\
&= R(a_1, \ldots, a_N)\, \frac{\partial}{\partial \theta_i} \sum_i \big( a_i \log \theta_i + (1 - a_i) \log(1 - \theta_i) \big) \\
&= R(a_1, \ldots, a_N)\, \frac{\partial}{\partial \theta_i} \big( a_i \log \theta_i + (1 - a_i) \log(1 - \theta_i) \big) \\
&= R(a_1, \ldots, a_N) \left( \frac{a_i}{\theta_i} - \frac{1 - a_i}{1 - \theta_i} \right). \qquad (5.11)
\end{aligned}
\]

For $\theta_i = 0.5$ we have

\[
\widehat{\frac{\partial J}{\partial \theta_i}} = R(a_1, \ldots, a_N)\,(2 a_i - 1),
\]

and the expected reward can be calculated as

\[
\mathbb{E}(R) = \sum_{a_1, \ldots, a_N} R(a_1, \ldots, a_N)\,(0.5)^N .
\]

Consider the case where $R(a_1, \ldots, a_N) = \mathbb{1}_{a_1 = \cdots = a_N = 1}$. Then

\[
\mathbb{E}(R) = (0.5)^N \qquad \text{and} \qquad \mathbb{E}\Big(\widehat{\frac{\partial J}{\partial \theta_i}}\Big) = \frac{\partial J}{\partial \theta_i} = (0.5)^N .
\]

The variance of a single sample of the gradient is then

\[
\mathbb{V}\Big(\widehat{\frac{\partial J}{\partial \theta_i}}\Big) = \mathbb{E}\Big(\widehat{\frac{\partial J}{\partial \theta_i}}^{\,2}\Big) - \mathbb{E}\Big(\widehat{\frac{\partial J}{\partial \theta_i}}\Big)^2 = (0.5)^N - (0.5)^{2N} .
\]

What is the probability of taking a step in the right direction? We can look at $P(\langle \hat{\nabla} J, \nabla J \rangle > 0)$. We have

\[
\langle \hat{\nabla} J, \nabla J \rangle = \sum_i \widehat{\frac{\partial J}{\partial \theta_i}} \times (0.5)^N = (0.5)^N \sum_i \widehat{\frac{\partial J}{\partial \theta_i}},
\]

so $P(\langle \hat{\nabla} J, \nabla J \rangle > 0) = (0.5)^N$. Thus, as the number of agents increases, the probability of taking a gradient step in the right direction decreases exponentially.

While this is a somewhat artificial example, it serves to illustrate that there are simple environments that become progressively more difficult (in terms of the probability of taking a gradient step in a direction that increases reward) for policy gradient methods as the number of agents grows. This is particularly true in environments with sparse rewards, such as the one described above. Note that in this example, the policy gradient variance $\mathbb{V}(\widehat{\frac{\partial J}{\partial \theta_i}})$ actually decreases as $N$ grows. However, the expectation of the policy gradient decreases as well, and the signal-to-noise ratio $\mathbb{E}(\widehat{\frac{\partial J}{\partial \theta_i}}) / \mathbb{V}(\widehat{\frac{\partial J}{\partial \theta_i}})^{1/2}$ decreases with $N$, corresponding to the decreasing probability of a correct gradient direction.

The intuitive reason a centralized critic helps reduce the variance of the gradients is that we remove a source of uncertainty; conditioned only on the agent's own actions, there is significant variability associated with the actions of other agents, which is largely removed when using these actions as input to the critic.
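A quick Monte Carlo check of this result is sketched below, using the all-ones reward considered in the proof; it is an illustration rather than part of the original experiments.

```python
# Verifies numerically that, with theta_i = 0.5, the single-sample policy gradient points in the
# correct direction with probability (0.5)^N.
import numpy as np

def prob_correct_direction(n_agents, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    a = (rng.random((n_samples, n_agents)) < 0.5).astype(float)   # a_i ~ Bernoulli(0.5)
    reward = np.all(a == 1.0, axis=1).astype(float)               # R = 1 iff a_1 = ... = a_N = 1
    grad_hat = reward[:, None] * (2.0 * a - 1.0)                  # per-coordinate estimator R * (2 a_i - 1)
    # The true gradient is (0.5)^N in every coordinate, so <grad_hat, grad J> > 0 exactly when R = 1.
    return float(np.mean(grad_hat.sum(axis=1) > 0))

for n in (1, 2, 3, 5):
    print(n, prob_correct_direction(n), 0.5 ** n)
```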

5.5 Summary

We have proposed a multi-agent policy gradient algorithm where agents learn a centralized critic based on the observations and actions of all agents. Empirically, our method outperforms traditional RL algorithms on a variety of cooperative and competitive multi-agent environments. We can further improve the performance of our method by training agents with an ensemble of policies, an approach we believe to be generally applicable to any multi-agent algorithm. One downside to our approach is that the input space of Q grows linearly (depending on what information is contained in x) with the number of agents N. This could be remedied in practice by, for example, having a modular Q function that only considers agents in a certain neighborhood of a given agent. We leave this investigation to future work.

Chapter 6

Robust Multi-Agent Reinforcement Learning via Minimax Optimization

Agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal to the other agents' current policies. In this chapter, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.

6.1 Motivation

Most real-world problems involve interactions between multiple agents, and the complexity of the problem increases significantly when the agents co-evolve together. Thanks to the recent advances of deep reinforcement learning (DRL) in single-agent scenarios, which led to successes in playing Atari games [Mnih et al., 2015], playing Go [Silver et al., 2016] and robotics control [Levine et al., 2016], it has been a rising trend to adapt single-agent DRL algorithms to multi-agent learning scenarios, and many works have shown great successes on a variety of problems, including the automatic discovery of communication and language [Sukhbaatar et al., 2016, Mordatch and Abbeel, 2018], multiplayer games [Peng et al., 2017, OpenAI et al., 2019b], traffic control [Wu et al., 2017a] and the analysis of social dilemmas [Leibo et al., 2017].

The critical challenge when adapting classical single-agent DRL algorithms to the multi-agent setting is the training instability issue: as training progresses, each agent's policy is changing, and therefore the environment becomes non-stationary from the perspective of any individual agent (in a way that is not explainable by changes in the agent's own policy). This non-stationarity can cause significant problems when directly applying single-agent DRL algorithms; for example, the variance of the policy gradient can be exponentially large as the number of agents increases [Lowe et al., 2017]. To handle this instability issue, recent works, such as counterfactual multi-agent policy gradients [Foerster et al., 2018b] and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [Lowe et al., 2017], proposed to utilize a centralized critic within the actor-critic learning framework to reduce the variance of the policy gradient.

Despite the fact that using a centralized critic stabilizes training, the learned policies can still be brittle and sensitive to their training partners and converge to a poor local mode. This is particularly severe in competitive environments: when the opponents alter their policies during testing, the performance of the learned policies can be drastically worse [Lazaridou et al., 2017]. Hence, a robust policy becomes desirable in the multi-agent setting: a well-trained agent should be able to behave well in testing when competing against opponents with strategies different from those of its training partners.

In this work, we focus on robust multi-agent reinforcement learning with continuous action spaces and propose a novel algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG). M3DDPG is a minimax extension¹ of the classical MADDPG algorithm [Lowe et al., 2017]. Its core idea is that during training, we force each agent to behave well even when its training opponents respond in the worst way. Our major contributions are summarized as follows:

• We introduce the minimax approach to robust multi-agent DRL and propose a novel minimax learning objective based on the MADDPG algorithm;

• In order to efficiently optimize the minimax learning objective, we propose an end-to-end learning approach, Multi-agent Adversarial Learning (MAAL), which is inspired by the adversarial training technique [Goodfellow et al., 2014b].²

• We empirically evaluate our proposed M3DDPG algorithm on four mixed cooperative and competitive environments and the agents trained by M3DDPG outperform baseline policies on all these environments.

In the rest of the chapter, we first present related work, then describe the notation and standard algorithms (Section 6.2), introduce our main algorithm, M3DDPG (Section 6.3), and finally report experimental results.

¹ In fact, we are dealing with gains, i.e., maximizing each agent's accumulative reward, so the "minimax" here is essentially "maximin". We keep the term "minimax" to be consistent with the literature.
² The connection between MAAL and adversarial training is discussed in detail at the end of Section 6.3.

Related Work

Multi-agent reinforcement learning [Littman, 1994] has been a long-standing field in AI [Hu and Wellman, 1998a, Busoniu et al., 2008]. Recent works in DRL use deep neural networks to approximately represent policy and value functions. Inspired by the success of DRL in single-agent settings, many DRL-based multi-agent learning algorithms have been proposed. Foerster et al. [Foerster et al., 2016] and He et al. [He et al., 2016a] extended deep Q-learning to the multi-agent setting; Peng et al. [Peng et al., 2017] proposed a centralized policy learning algorithm based on actor-critic policy gradient; Foerster et al. [Foerster et al., 2018b] developed a decentralized multi-agent policy gradient algorithm with a centralized baseline; Lowe et al. [Lowe et al., 2017] extended DDPG to the multi-agent setting with a centralized Q function; Wei et al. [Wei et al., 2018] and Grau-Moya et al. [Grau-Moya et al., 2018] proposed multi-agent variants of the soft Q-learning algorithm [Haarnoja et al., 2017]; Yang et al. [Yang et al., 2018] focused on multi-agent reinforcement learning with a very large population of agents. Our M3DDPG algorithm is built on top of MADDPG and inherits the decentralized policy and centralized critic framework.

Minimax is a fundamental concept in game theory and can be applied to general decision-making under uncertainty, prescribing a strategy that minimizes the possible loss in a worst-case scenario [Osborne, 2004]. Minimax was first introduced to multi-agent reinforcement learning as minimax Q-learning by Littman [Littman, 1994]. More recently, some works combine the minimax framework and DRL techniques to find Nash equilibria in two-player zero-sum games [Foerster et al., 2018a, Perolat et al., 2017, Grau-Moya et al., 2018]. In our work, we utilize the minimax idea for the purpose of robust policy learning.

Robust reinforcement learning was originally introduced by Morimoto and Doya [Morimoto and Doya, 2005], considering the generalization ability of the learned policy in the single-agent setting. This problem has also been studied recently with deep neural networks, for example by adding random noise to the input [Tobin et al., 2017] or to the dynamics [Peng et al., 2018b] during training. Besides adding random noise, some other works implicitly adopt the minimax idea by utilizing the "worst noise" [Pinto et al., 2017, Mandlekar et al., 2017]. These works force the learned policy to work well even under worst-case perturbations and typically go under the name of "adversarial reinforcement learning", despite the fact that the original adversarial reinforcement learning problem was introduced in the setting of multi-agent learning [Uther and Veloso, 1997]. In our M3DDPG algorithm, we focus on the problem of learning policies that are robust to opponents with different strategies.

Within the minimax framework, finding the worst-case scenario is a critical component. Lanctot et al. [Lanctot et al., 2017] proposed an iterative approach that alternately computes the best response policy while fixing the others. Gao et al. [Gao et al., 2018] replace the "mean" in the temporal difference learning rule with a "minimum".
In our work, we propose MAAL, which is a general, efficient and fully end-to-end learning approach. MAAL is motivated by adversarial training [Goodfellow et al., 2014b] and is suitable for an arbitrary number of agents. The core idea of MAAL is to approximate the minimization in our min-max objective by a single gradient descent step. The idea of one-step-gradient approximation was also explored in meta-learning [Finn et al., 2017a].

6.2 Preliminary

In this section, we describe our problem setting and the standard algorithms. Most of the definitions and notations follow the original MADDPG work [Lowe et al., 2017], as described in Chapter 5.

Markov Games

We consider a multi-agent extension of Markov decision processes (MDPs) called partially observable Markov games [Littman, 1994]. A Markov game for $N$ agents is defined by a set of states $\mathcal{S}$ describing the possible configurations of all agents, a set of actions $\mathcal{A}_1, \ldots, \mathcal{A}_N$, and a set of observations $\mathcal{O}_1, \ldots, \mathcal{O}_N$ for each agent. To choose actions, each agent $i$ uses a stochastic policy $\pi_{\theta_i} : \mathcal{O}_i \times \mathcal{A}_i \mapsto [0, 1]$ parameterized by $\theta_i$, which produces the next state according to the state transition function $\mathcal{T} : \mathcal{S} \times \mathcal{A}_1 \times \ldots \times \mathcal{A}_N \mapsto \mathcal{S}$. Each agent $i$ obtains rewards as a function of the state and the agent's action, $r_i : \mathcal{S} \times \mathcal{A}_i \mapsto \mathbb{R}$, and receives a private observation correlated with the state, $\mathbf{o}_i : \mathcal{S} \mapsto \mathcal{O}_i$. The initial states are determined by a distribution $\rho : \mathcal{S} \mapsto [0, 1]$. Each agent $i$ aims to maximize its own total expected return $R_i = \sum_{t=0}^{T} \gamma^t r_i^t$, where $\gamma$ is a discount factor and $T$ is the time horizon. To minimize notation, in the following discussion we will often omit $\theta$ from the subscript of $\pi$.

Q-Learning and Deep Q-Networks (DQN)

Q-Learning and DQN [Mnih et al., 2015] are popular methods in reinforcement learning and have previously been applied to multi-agent settings [Foerster et al., 2016, Tesauro, 2004]. Q-Learning makes use of an action-value function for policy $\pi$, defined as $Q^\pi(s, a) = \mathbb{E}[R \mid s^t = s, a^t = a]$. This Q function can be recursively rewritten as $Q^\pi(s, a) = \mathbb{E}_{s'}[r(s, a) + \gamma \mathbb{E}_{a' \sim \pi}[Q^\pi(s', a')]]$. DQN learns the action-value function $Q^*$ corresponding to the optimal policy by minimizing the loss:

\[
L(\theta) = \mathbb{E}_{s,a,r,s'}\big[(Q^*(s, a \mid \theta) - y)^2\big], \qquad \text{where } y = r + \gamma \max_{a'} \bar{Q}^*(s', a'). \tag{6.1}
\]

Here $\bar{Q}$ is a target Q function whose parameters are periodically updated with the most recent $\theta$, which helps stabilize learning. Another crucial component of stabilizing DQN is the use of an experience replay buffer $\mathcal{D}$ containing tuples $(s, a, r, s')$. The Q-learning algorithm is most suitable for DRL agents with discrete action spaces.

Policy Gradient (PG) Algorithms

Policy gradient methods are another popular choice for a variety of RL tasks. Let $\rho^\pi$ denote the discounted state visitation distribution for a policy $\pi$. The main idea of PG is to directly adjust the parameters $\theta$ of the policy in order to maximize the objective $J(\theta) = \mathbb{E}_{s \sim \rho^\pi, a \sim \pi_\theta}[R]$ by taking steps in the direction of $\nabla_\theta J(\theta)$. Using the Q function defined previously, the gradient of the policy can be written as [Sutton et al., 2000]:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \rho^\pi, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^\pi(s, a)\big]. \tag{6.2}
\]

The policy gradient theorem has given rise to several practical algorithms, which often differ in how they estimate $Q^\pi$. For example, one can simply use a sample return $R^t = \sum_{i=t}^{T} \gamma^{i-t} r^i$, which leads to the REINFORCE algorithm [Williams, 1992]. Alternatively, one could learn an approximation of the true action-value function $Q^\pi(s, a)$, called the critic, which leads to a variety of actor-critic algorithms [Sutton and Barto, 1998].

Deterministic Policy Gradient (DPG) Algorithms

DPG algorithms extend the policy gradient framework to deterministic policies $\mu_\theta : \mathcal{S} \mapsto \mathcal{A}$ [Silver et al., 2014]. In particular, under certain conditions we can write the gradient of the objective $J(\theta) = \mathbb{E}_{s \sim \rho^\mu}[R(s, a)]$ as:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \mathcal{D}}\big[\nabla_\theta \mu_\theta(s)\, \nabla_a Q^\mu(s, a)\big|_{a = \mu_\theta(s)}\big], \tag{6.3}
\]

where $\mathcal{D}$ is the replay buffer. Since this theorem relies on $\nabla_a Q^\mu(s, a)$, it requires that the action space $\mathcal{A}$ (and thus the policy $\mu$) be continuous.

Deep deterministic policy gradient (DDPG) [Lillicrap et al., 2015] is a variant of DPG where the policy $\mu$ and critic $Q^\mu$ are approximated with deep neural networks. DDPG is an off-policy algorithm, and samples trajectories from a replay buffer of experiences that are stored throughout training. DDPG also makes use of a target network, as in DQN [Mnih et al., 2015].

Multi-Agent Deep Deterministic Policy Gradient

Directly applying single-agent RL algorithms to the multi-agent setting by treating other agents as part of the environment is problematic, as the environment appears non-stationary from the view of any one agent, violating the Markov assumptions required for convergence. This non-stationarity issue is even more severe in the case of DRL with neural networks as function approximators. The core idea of the MADDPG algorithm [Lowe et al., 2017] is to learn a centralized Q function for each agent, which conditions on global information to alleviate the non-stationarity problem and stabilize training.

More concretely, consider a game with $N$ agents with policies parameterized by $\boldsymbol{\theta} = \{\theta_1, \ldots, \theta_N\}$, and let $\boldsymbol{\mu} = \{\mu_1, \ldots, \mu_N\}$ be the set of all agents' policies. Then we can write the gradient of the expected return for agent $i$ with policy $\mu_i$, $J(\theta_i) = \mathbb{E}[R_i]$, as:

\[
\nabla_{\theta_i} J(\theta_i) = \mathbb{E}_{\mathbf{x}, a \sim \mathcal{D}}\big[\nabla_{\theta_i} \mu_i(o_i)\, \nabla_{a_i} Q_i^{\boldsymbol{\mu}}(\mathbf{x}, a_1, \ldots, a_N)\big|_{a_i = \mu_i(o_i)}\big]. \tag{6.4}
\]

Here $Q_i^{\boldsymbol{\mu}}(\mathbf{x}, a_1, \ldots, a_N)$ is a centralized action-value function that takes as input the actions of all agents, $a_1, \ldots, a_N$, in addition to some state information $\mathbf{x}$ (i.e., $\mathbf{x} = (o_1, \ldots, o_N)$), and outputs the Q-value for agent $i$. Let $\mathbf{x}'$ denote the next state from $\mathbf{x}$ after taking actions $a_1, \ldots, a_N$. The experience replay buffer $\mathcal{D}$ contains the tuples $(\mathbf{x}, \mathbf{x}', a_1, \ldots, a_N, r_1, \ldots, r_N)$, recording the experiences of all agents. The centralized action-value function $Q_i^{\boldsymbol{\mu}}$ is updated as:

\[
L(\theta_i) = \mathbb{E}_{\mathbf{x}, a, r, \mathbf{x}'}\big[(Q_i^{\boldsymbol{\mu}}(\mathbf{x}, a_1, \ldots, a_N) - y)^2\big], \qquad y = r_i + \gamma\, Q_i^{\boldsymbol{\mu}'}(\mathbf{x}', a_1', \ldots, a_N')\big|_{a_j' = \mu_j'(o_j)}, \tag{6.5}
\]

where $\boldsymbol{\mu}' = \{\mu_{\theta_1'}, \ldots, \mu_{\theta_N'}\}$ is the set of target policies with delayed parameters $\theta_i'$.

Note that the centralized Q function is only used during training. During decentralized execution, each policy $\mu_{\theta_i}$ only takes local information $o_i$ to produce an action.

6.3 Methods

In this section, we introduce our proposed new algorithm, Minimax Multi-Agent Deep Deterministic Policy Gradient (M3DDPG), which is built on top of the MADDPG algorithm and specifically designed to improve the robustness of the learned policies. Our M3DDPG algorithm contains two major novel components:

Minimax Optimization Motivated by the minimax concept in game theory, we introduce minimax optimization into the learning objective;

Multi-Agent Adversarial Learning The continuous action space makes our proposed minimax objective computationally intractable to optimize directly. Hence, we propose Multi-Agent Adversarial Learning (MAAL) to solve this optimization problem.

Minimax Optimization

In multi-agent RL, an agent's policy can be very sensitive to its learning partners' policies. Particularly in competitive environments, the learned policies can be brittle when the opponents alter their strategies. For the purpose of learning robust policies, we propose to update policies considering the worst situation: during training, we optimize the accumulated reward for each agent i under the assumption that all other agents act adversarially. This yields the minimax learning objective max_{θ_i} J_M(θ_i), where

J_M(θ_i) = E_{s∼ρ^µ}[R_i]
         = min_{a^t_{j≠i}} E_{s∼ρ^µ}[ Σ_{t=0}^T γ^t r_i(s^t, a^t_1, ..., a^t_N) |_{a^t_i=µ_i(o^t_i)} ]   (6.6)
         = E_{s^0∼ρ^µ}[ min_{a^0_{j≠i}} Q^µ_{M,i}(s^0, a^0_1, ..., a^0_N) |_{a^0_i=µ_i(o^0_i)} ].   (6.7)

Critically, in Eq. 6.6, the state s^{t+1} at time t + 1 depends not only on the dynamics ρ^µ and the action µ_i(o^t_i), but also on all the previous adversarial actions a^{t′}_{j≠i} with t′ ≤ t. In Eq. 6.7, we derive the modified Q function Q^µ_{M,i}(s, a_1, ..., a_N), which is naturally centralized and can be rewritten in a recursive form:

Q^µ_{M,i}(s, a_1, ..., a_N) = r_i(s, a_1, ..., a_N) + γ E_{s′}[ min_{a′_{j≠i}} Q^µ_{M,i}(s′, a′_1, ..., a′_N) |_{a′_i=µ_i(o′_i)} ].   (6.8)

Importantly, Q^µ_{M,i}(s, a_1, ..., a_N) conditions on the current state s as well as the current actions a_1, ..., a_N, and represents the current reward plus the discounted worst-case future return starting from the next state s′. This definition brings the benefit that we can naturally apply off-policy temporal difference learning to derive the update rule for Q^µ_{M,i}. Note that for each agent i, none of the adversarial actions depend on its parameters θ_i, so we can directly apply the deterministic policy gradient theorem to compute ∇_{θ_i} J_M(θ_i) and use off-policy temporal difference learning to update the Q function. Thanks to the centralized Q function in MADDPG (Eq. 6.4), which takes in the actions from all the agents, our derivation naturally aligns with the MADDPG formulation by injecting a minimization over the other agents' actions as follows:

∇_{θ_i} J_M(θ_i) = E_{x∼D}[ ∇_{θ_i} µ_i(o_i) ∇_{a_i} Q^µ_{M,i}(x, a^⋆_1, ..., a_i, ..., a^⋆_N) ],   (6.9)
    a_i = µ_i(o_i),
    a^⋆_{j≠i} = argmin_{a_{j≠i}} Q^µ_{M,i}(x, a_1, ..., a_N),

where D denotes the replay buffer and x denotes the state information. Correspondingly, we obtain the new Q function update rule by adding another minimization to Eq. 6.5 when computing the target Q value:

L(θ_i) = E_{x,a,r,x′∼D}[(Q^µ_{M,i}(x, a_1, ..., a_N) − y)^2],   (6.10)
    y = r_i + γ Q^{µ′}_{M,i}(x′, a′^⋆_1, ..., a′_i, ..., a′^⋆_N),
    a′_i = µ′_i(o′_i),
    a′^⋆_{j≠i} = argmin_{a′_{j≠i}} Q^{µ′}_{M,i}(x′, a′_1, ..., a′_N),

where µ′_i denotes the target policy of agent i with delayed parameters θ′_i, and Q^{µ′}_{M,i} denotes the target Q network for agent i. Combining Eq. 6.9 and Eq. 6.10 yields our proposed minimax learning framework.

Algorithm 2: Minimax Multi-Agent Deep Deterministic Policy Gradient (M3DDPG) for N agents
for episode = 1 to M do
    Initialize a random process N for action exploration, and receive initial state information x
    for t = 1 to max-episode-length do
        For each agent i, select action a_i = µ_{θ_i}(o_i) + N_t w.r.t. the current policy and exploration
        Execute actions a = (a_1, ..., a_N) and observe rewards r and new state information x′
        Store (x, a, r, x′) in replay buffer D, and set x ← x′
        for agent i = 1 to N do
            Sample a random minibatch of S samples (x^k, a^k, r^k, x′^k) from D
            Set y^k = r^k_i + γ Q^{µ′}_{M,i}(x′^k, a′_1, ..., a′_N)|_{a′_i=µ′_i(o′^k_i), a′_{j≠i}=µ′_j(o′^k_j)+ε̂′_j}, with ε̂′_j defined in Eq. 6.14
            Update the critic by minimizing the loss L(θ_i) = (1/S) Σ_k (y^k − Q^µ_{M,i}(x^k, a^k_1, ..., a^k_N))^2
            Update the actor using the sampled policy gradient, with ε̂_j defined in Eq. 6.13:
                ∇_{θ_i} J ≈ (1/S) Σ_k ∇_{θ_i} µ_i(o^k_i) ∇_{a_i} Q^µ_{M,i}(x^k, a^⋆_1, ..., a_i, ..., a^⋆_N)|_{a_i=µ_i(o^k_i), a^⋆_{j≠i}=a^k_j+ε̂_j}
        end for
        Update target network parameters for each agent i: θ′_i ← τθ_i + (1 − τ)θ′_i
    end for
end for

Multi-Agent Adversarial Learning

The critical challenge in our proposed minimax learning framework is how to handle the embedded minimization in Eq. 6.9 and Eq. 6.10. Due to the continuous action space as well as the non-linearity of the Q function, directly solving this minimization is computationally intractable. A naive approximate solution would be to run an inner-loop gradient descent whenever an update step of Eq. 6.9 or Eq. 6.10 is performed, but this is too computationally expensive for practical use. Here we introduce an efficient, end-to-end solution, multi-agent adversarial learning (MAAL). The main ideas of MAAL can be summarized in two steps: (1) approximate the non-linear Q function by a locally linear function; (2) replace the inner-loop minimization with a single gradient descent step. Note that the core idea of MAAL, locally linearizing the Q function, is adapted from the adversarial training technique originally developed for supervised learning. We discuss the connection between adversarial training and MAAL at the end of this section.

For conciseness, we first consider Eq. 6.10 and rewrite it in the following form with auxiliary variables ε:

y = r_i + γ Q^{µ′}_{M,i}(x′, a′^⋆_1, ..., a′_i, ..., a′^⋆_N),   (6.11)
    a′_k = µ′_k(o′_k), ∀ 1 ≤ k ≤ N,
    a′^⋆_j = a′_j + ε_j, ∀ j ≠ i,
    ε_{j≠i} = argmin_{ε_{j≠i}} Q^{µ′}_{M,i}(x′, a′_1 + ε_1, ..., a′_i, ..., a′_N + ε_N).

Eq. 6.11 can be interpreted as seeking a set of perturbations ε such that the perturbed actions a′^⋆ decrease the Q value the most. By linearizing the Q function at Q^{µ′}_{M,i}(x′, a′_1, ..., a′_N), the desired perturbation ε_j can be locally approximated by the gradient direction of Q^{µ′}_{M,i}(x′, a′_1, ..., a′_N) w.r.t. a′_j. We then use this local approximation to derive an approximation ε̂_j to the worst-case perturbation by taking a small gradient step:

∀ j ≠ i,  ε̂_j = −α ∇_{a′_j} Q^{µ′}_{M,i}(x′, a′_1, ..., a′_j, ..., a′_N),   (6.12)

where α is a tunable coefficient representing the perturbation rate. It can also be interpreted as the step size of the gradient descent step: when α is too small, the local approximation error is small, but due to the small perturbation the learned policy can be far from the optimal solution of our minimax objective; when α is too large, the approximation error may disturb the overall learning process and the agents may fail to learn good policies.

We can apply the same technique to Eq. 6.9 and eventually derive the following formulation:

∇_{θ_i} J(θ_i) = E_{x,a∼D}[ ∇_{θ_i} µ_i(o_i) ∇_{a_i} Q^µ_{M,i}(x, a^⋆_1, ..., a_i, ..., a^⋆_N) ],   (6.13)
    a_i = µ_i(o_i),
    a^⋆_j = a_j + ε̂_j, ∀ j ≠ i,
    ε̂_j = −α_j ∇_{a_j} Q^µ_{M,i}(x, a_1, ..., a_N),

and

L(θ_i) = E_{x,a,r,x′}[(Q^µ_{M,i}(x, a_1, ..., a_N) − y)^2],   (6.14)
    y = r_i + γ Q^{µ′}_{M,i}(x′, a′^⋆_1, ..., a′_i, ..., a′^⋆_N),
    a′_k = µ′_k(o′_k), ∀ 1 ≤ k ≤ N,
    a′^⋆_j = a′_j + ε̂′_j, ∀ j ≠ i,
    ε̂′_j = −α_j ∇_{a′_j} Q^{µ′}_{M,i}(x′, a′_1, ..., a′_N),

where α_1, ..., α_N are additional parameters. MAAL only requires one additional gradient computation and can be executed in a fully end-to-end fashion. Combining Eq. 6.13 and Eq. 6.14 completes MAAL. The overall algorithm, M3DDPG, is summarized as Algorithm 2.

Discussion

Connection to Adversarial Training Adversarial training is a robust training approach for deep neural networks in supervised learning [Goodfellow et al., 2014a].
The core idea is to force the classifier to predict correctly even when given adversarial examples, which are obtained by adding a small adversarial perturbation to the original input data such that the classification loss is increased the most. Formally, suppose the classification loss function is L(θ) = E_{x,y}[f_θ(x; y)] with input data x and label y. Adversarial training aims to optimize the following adversarial loss instead:

L_adv(θ) = E_{x,y}[f_θ(x + ε^⋆; y)],   (6.15)
    ε^⋆ = argmax_{‖ε‖≤α} f_θ(x + ε; y).

The core technique for efficiently optimizing L_adv(θ) is to locally linearize the loss function at f_θ(x; y) and approximate ε^⋆ by the scaled gradient. Thanks to the centralized Q function, which takes the actions from all the agents as part of its input, we are able to easily inject the minimax optimization (Eq. 6.11) and represent it in a form similar to adversarial training (Eq. 6.15), so that we can adopt a similar technique to efficiently solve our minimax optimization in a fully end-to-end fashion.

Connection to Single-Agent Robust RL M3DDPG with MAAL can also be viewed as a special case of robust reinforcement learning (RRL) [Morimoto and Doya, 2005] in the single-agent setting, which aims to bridge the gap between training in simulation and testing in the real world by adding adversarial perturbations to the transition dynamics during training. Here, we consider the multi-agent setting and add worst-case perturbations to the actions of the opponent agents during training. Note that from the perspective of a single agent, perturbations of the opponents' actions can also be viewed as a special kind of adversarial noise on the dynamics.

Choice of α In the extreme case of α = 0, M3DDPG degenerates to the original MADDPG algorithm, while as α increases, policy learning tends to be more robust but the optimization becomes harder. In practice, using a fixed α throughout training can lead to very unstable learning behavior due to the changing scale of the gradients. The original adversarial training paper [Goodfellow et al., 2014b] suggests computing ε with a fixed norm, namely g = ∇_x f_θ(x; y), ε̂ = α g/‖g‖, where x denotes the input data to the classifier and y denotes the label. Accordingly, in our M3DDPG algorithm, we can adaptively compute the perturbation ε̂_j by

g = ∇_{a_j} Q^µ_{M,i}(x, a_1, ..., a_N),   ε̂_j = −α_j g/‖g‖.   (6.16)

Eq. 6.16 generally works well in practice, but in some hard multi-agent learning environments unstable training behavior can still be observed. We suspect this is because of the changing norm of the actions in these situations. Different from the supervised learning setting, where the norm of the input data x is typically stable, in reinforcement learning the norm of the actions can change drastically even within a single episode. Therefore, it is possible that even a perturbation with a small fixed norm overwhelms the action a_j, which may lead to numerical stability issues. Hence, we also introduce the following alternative for adaptive perturbation computation:

g = ∇_{a_j} Q^µ_{M,i}(x, a_1, ..., a_N),   ε̂_j = −α_j ‖a_j‖ g/‖g‖.   (6.17)

Lastly, note that in a mixed cooperative and competitive environment, ideally we only need to add adversarial perturbations to competitors. Empirically, however, we observe that also adding (smaller) perturbations to collaborators can further improve the quality of the learned policies.
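A minimal sketch of the adaptive MAAL perturbation of Eq. 6.17: a single gradient step on the centralized critic with respect to the other agents' actions, rescaled by the action norm. The `critic_i` callable and the tensor layout are assumptions for illustration, not the thesis' implementation.

```python
import torch

def maal_perturb(critic_i, x, actions, i, alpha):
    """actions: list of per-agent action tensors (one joint action); perturb all j != i."""
    actions = [a.detach().clone().requires_grad_(j != i) for j, a in enumerate(actions)]
    q = critic_i(x, torch.cat(actions, dim=-1)).sum()
    grads = torch.autograd.grad(q, [a for j, a in enumerate(actions) if j != i])
    perturbed, g_iter = [], iter(grads)
    for j, a in enumerate(actions):
        if j == i:
            perturbed.append(a.detach())                        # agent i's own action is untouched
        else:
            g = next(g_iter)
            eps = -alpha * a.norm() * g / (g.norm() + 1e-8)     # Eq. 6.17: scale by the action norm
            perturbed.append((a + eps).detach())
    return perturbed
```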

6.4 Experiments

We adopt the same particle-world environments as in the MADDPG work [Lowe et al., 2017] (Chapter 5), as well as the same training configurations. α is selected from a grid search over {0.1, 0.01, 0.001}. For testing, we generate a fixed set of 2500 environment configurations (i.e., landmarks and birthplaces) and evaluate on this fixed set for a fair comparison. The source code is publicly available at: https://github.com/dadadidodi/m3ddpg

Figure 6.1: Illustrations of some environments we consider, including Physical Deception (left) and Predator-Prey (right).

Environments

The particle-world environment consists of N cooperative agents, M adversarial agents and L landmarks in a two-dimensional world with continuous space. We focus on four mixed cooperative and competitive scenarios to best examine the effectiveness of our minimax formulation.

Covert communication This is an adversarial communication environment, where a speaker agent ('Alice') must communicate a message to a listener agent ('Bob') (N = 2), who must reconstruct the message at the other end. However, an adversarial agent ('Eve') (M = 1) is also observing the channel and wants to reconstruct the message; Alice and Bob are penalized based on Eve's reconstruction, and thus Alice must encode her message using a randomly generated key known only to Alice and Bob.

Keep-away This scenario consists of L = 1 target landmark, N = 2 cooperative agents and M = 1 adversarial agent. The cooperating agents need to reach the landmark and keep the adversarial agent away from it by pushing it away, while the adversarial agent must stay at the landmark to occupy it.

Physical deception Here, N = 2 agents cooperate to reach a single target landmark out of a total of L = 2 landmarks. They are rewarded based on the minimum distance of any agent to the target (so only one agent needs to reach the target landmark). However, a lone adversary (M = 1) also desires to reach the target landmark; the catch is that the adversary does not know which of the landmarks is the correct one. Thus the cooperating agents, who are penalized based on the adversary's distance to the target, learn to spread out and cover all landmarks so as to deceive the adversary.

Predator-prey In this variant of the classic predator-prey game, N = 3 slower cooperating agents must chase the faster adversary (M = 1) around a randomly generated environment with L = 2 large landmarks impeding the way. Each time the cooperative agents collide with an adversary, the agents are rewarded while the adversary is penalized.

Comparison to MADDPG

To evaluate the quality of the policies trained by the different algorithms in competitive scenarios, we measure the performance of agents trained by our M3DDPG against agents trained by classical MADDPG, in the roles of both normal agent and adversary in each environment. The results are shown in Figure 6.2, where we measure the rewards of the normal agents in the different scenarios and normalize them to the range 0-1. We notice that in all the environments, the highest score is achieved when the M3DDPG agents play as the normal agents against the MADDPG adversary (Minimax vs MA), while the lowest score occurs when the MADDPG agents

Figure 6.2: Comparison between M3DDPG (Minimax) and classical MADDPG (MA) on the four mixed competitive environments. Each bar cluster shows the 0-1 normalized score for a set of competing policies in different roles (agent vs adversary), where a higher score is better for the agent. In all cases, M3DDPG outperforms MADDPG when directly pitted against it.

Figure 6.3: Performance of M3DDPG (Minimax, red) and MADDPG (MA, blue) under the worst situation, i.e., against the disruptive adversaries, on covert communication, keep-away, physical deception and predator-prey (from left to right). The y-axis denotes the reward of the (fixed) normal agents and the x-axis denotes the number of training episodes performed by the disruptive adversaries. A higher reward implies a more robust policy. Agents trained by M3DDPG (Minimax) perform better in all the scenarios.

act as the normal agents against the M3DDPG adversary (MA vs Minimax). This indicates that the policies trained by M3DDPG are of much higher quality than those trained by the original MADDPG.

Evaluation with Disruptive Adversaries

Instead of having M3DDPG and MADDPG directly compete against each other, we now consider their performance in the worst possible situations through their individual competitions against disruptive adversaries. We construct disruptive adversaries by (1) fixing the trained normal agents (M3DDPG or MADDPG); (2) setting the reward of the disruptive adversary to the negative of the normal agents' reward (so that the problem becomes zero-sum); and then (3) using DDPG to train the disruptive adversary from scratch under this zero-sum reward setting. In the particle-world environment, the competitive scenarios are generally not zero-sum, which implies that, according to the default reward function, the adversaries may have purposes other than directly disrupting the normal agents. So, in order to evaluate the effectiveness of our minimax optimization in the worst situation, we convert every problem into a zero-sum form and compare the performance of our M3DDPG agents as well as the MADDPG agents against these artificially constructed adversaries. Moreover, since each of our four environments has only one adversary, after fixing the normal agents the learning problem degenerates to the single-agent setting and classical DDPG is sufficient to stably train a disruptive adversary. The results are shown in Figure 6.3, where we plot the reward of the fixed normal agents trained by the different algorithms as the training of the disruptive adversaries progresses until convergence. Note that due to the different environment designs, the difficulty for the disruptive agents to break the strategy of the normal agents varies: for example, in covert communication, since the private key is not accessible to the adversary agent, breaking the encrypted communication is very hard; while in physical deception, since we do not allow communication and fix the normal agents, a smart enough adversary may easily infer the target landmark by observing the initial behavior of the two cooperative agents. Nevertheless, despite these intrinsic properties, the M3DDPG agents (Minimax) achieve higher reward in all the scenarios, which implies better robustness even in the worst situation.

6.5 Summary

In this chapter, we propose a novel algorithm, minimax multi-agent deep deterministic policy gradient (M3DDPG), for robust multi-agent reinforcement learning, which leverages the minimax concept and introduces a minimax learning objective. To efficiently optimize the minimax objective, we propose MAAL, which approximates the inner-loop minimization by a single gradient descent step. Empirically, M3DDPG outperforms the benchmark methods on four mixed cooperative and competitive scenarios. Nevertheless, due to the single-step gradient approximation in MAAL, which is computationally efficient, an M3DDPG agent can only explore the locally worst situation as training evolves, which can still lead to unsatisfying behavior when testing opponents have drastically different strategies. It will be an interesting direction to re-examine the robustness-efficiency trade-off in MAAL and further improve policy learning by placing more computation on the minimax optimization. We leave this as future work.

Part IV

Applications in Natural Language Processing

Chapter 7

Robust Neural Relation Extraction

Relation extraction is one of the core tasks in natural language processing, which aims to extract '(entity, entity, relation)' triplets from texts1. In this chapter, we introduce the adversarial optimization technique from Chapter 6, which we previously applied in the multi-agent setting to improve policy generalization, to the relation extraction problem. We empirically show that by applying adversarial training to a neural relation extractor in the supervised learning setting, the generalization capability, i.e., test performance, can be significantly boosted.

7.1 Motivation

Despite the recent successes of deep neural networks in various applications, neural network models tend to be overconfident about noise in input signals. Adversarial examples [Szegedy et al., 2013] are examples generated by adding noise in the form of small perturbations to the original data, which are often indistinguishable for humans but drastically increase the loss incurred by a deep model. Adversarial training [Goodfellow et al., 2014b] is a technique for regularizing deep models by encouraging the neural network to correctly classify both unmodified examples and perturbed ones, which in practice not only enhances the robustness of the neural network but also improves its generalizability. Previous work has largely applied adversarial training to straightforward classification tasks, including image classification [Goodfellow et al., 2014b] and text classification [Miyato et al., 2017], where the goal is simply to predict a single label for every example and the training examples provide strong supervision. It remains unclear whether adversarial training could still be effective for tasks with much weaker supervision, e.g., distant supervision [Mintz et al., 2009], or with an evaluation metric other than prediction accuracy (e.g., F1 score). This chapter focuses on the task of relation extraction, where the goal is to predict the relation that exists between a particular entity pair given several text mentions. One popular way to handle this problem is the multi-instance multi-label learning framework

1For further explanations, readers can refer to [Russell and Norvig, 2016].

(MIML) [Hoffmann et al., 2011, Surdeanu et al., 2012] with distant supervision [Mintz et al., 2009], where the mentions for an entity pair are aligned with the relations in Freebase [Bollacker et al., 2008]. In this setting, relation extraction is much harder than the canonical classification problem in two respects: (1) although distant supervision can provide a large amount of data, the training labels are very noisy and, due to the multi-instance framework, the supervision is much weaker; (2) the evaluation metric of relation extraction is often the precision-recall curve or F1 score, which cannot be represented (and thereby optimized) directly in the loss function. In order to evaluate the effectiveness of adversarial training for relation extraction, we apply it to two different architectures (a convolutional neural network and a recurrent neural network) on two different datasets. Experimental results show that even on this harder task with much weaker supervision, adversarial training can still improve the performance in all of the cases we studied.

Related Work

Neural Relation Extraction: In recent years, neural network models have shown superior performance over approaches using hand-crafted features in various tasks. Convolutional neural networks (CNN) are among the first deep models that have been applied to relation extraction [dos Santos et al., 2015, Nguyen and Grishman, 2015]. Variants of convolutional networks include piecewise-CNN (PCNN) [Zeng et al., 2014], split CNN [Adel et al., 2016], CNN with sentence-wise pooling [Jiang et al., 2016] and attention CNN [Wang et al., 2016]. Recurrent neural networks (RNN) are another popular choice, and have been used in recent work in the form of recurrent CNNs [Cai et al., 2016] and attention RNNs [Zhou et al., 2016]. An instance-level selective attention mechanism was introduced for MIML by [Lin et al., 2016], and has significantly improved the prediction accuracy for several of these base deep models.

Adversarial Training: Adversarial training (AT) [Goodfellow et al., 2014b] was originally introduced in the context of image classification tasks, where the input data is continuous. [Miyato et al., 2015, Miyato et al., 2017] adapt AT to text classification by adding perturbations on word embeddings and also extend AT to a semi-supervised setting by minimizing the entropy of the predicted label distributions on unlabeled data. AT introduces an end-to-end and deterministic way of perturbing data by utilizing gradient information. There are also other works that regularize classifiers by adding random noise to the data, such as dropout [Srivastava et al., 2014] and its variant for NLP tasks, word dropout [Iyyer et al., 2015]. [Xie et al., 2017] discusses various data noising techniques for language models. [Søgaard, 2013] and [Li et al., 2017b] focus on linguistic adversaries.

Figure 7.1: The computation graph for encoding a sentence x_i with adversarial training. e_i denotes the adversarial perturbation w.r.t. x_i. Dropout is placed on the output of the variables in the double-lined rectangles.

7.2 Methodology

We first introduce MIML and then describe the base neural network models we consider2: a piecewise CNN [Zeng et al., 2015] (PCNN) and a bidirectional GRU [Cho et al., 2014] (RNN). We also utilize the selective attention mechanism of [Lin et al., 2016] for both the PCNN and RNN models. Adversarial training is presented at the end of this section.

Preliminaries

In MIML, we consider the set of text sentences X = {x_1, x_2, ..., x_n} for each entity pair. Supposing we have R predefined relations (including NA) to extract, we want to predict the probability of each of the R relations given the mentions. Formally, for each relation r, we want to predict P(r | x_1, ..., x_n). Since an entity pair may have no relation, we add a special relation NA to the label set; hence, we can simply assume that at least one relation exists for every entity pair. During evaluation, we ignore the probability predicted for the NA relation.

Neural Architectures

Input Representation: For each sentence x_i, we use pretrained word embeddings to project each word token into a d_w-dimensional space. Note that we also need to include the entity position information in x_i. Here we introduce an extra feature vector p_i^(w) for each word w to encode the entities' positions. One choice is the position embedding [Zeng et al., 2014]: for each word w, we compute the relative distances to the two entities and embed the distances in two d_p-dimensional vectors, which are then concatenated as p_i^(w). Position embedding introduces extra variables into the model and slows down training. We also investigate a simpler choice, indicator encoding: when a word w is exactly an entity, we generate a

2We primarily focus on effectiveness of AT. Other techniques in Sec. 7.1 are complementary to our focus.

d_p-dimensional all-ones vector, and an all-zeros vector otherwise. In our experiments, position embedding is crucial for PCNN due to the spatial invariance of CNNs. For RNN, position embedding helps little (likely because an RNN has the capacity to exploit temporal dependencies), so we adopt indicator encoding instead.

Sentence Encoder: For a sentence x_i, we apply a non-linear transformation to the vector representation of x_i to derive a feature vector s_i = f(x_i; θ) given a set of parameters θ. We consider both PCNN and RNN as f(x_i; θ). For PCNN, inheriting the settings from [Zeng et al., 2014], we adopt a convolution kernel with window size 3 and d_s output channels, and then apply piecewise pooling and ReLU [Nair and Hinton, 2010] as the activation function to eventually obtain a 3·d_s-dimensional feature vector s_i. For RNN, we adopt a bidirectional GRU with d_s hidden units and concatenate the hidden states of the last timesteps from both the forward and the backward RNN into a 2·d_s-dimensional feature vector s_i.

Selective Attention: Following [Lin et al., 2016], for each relation r, we aim to softly select an attended sentence representation s_r by taking a weighted average of s_1, s_2, ..., s_n, namely s_r = Σ_i α_i^r s_i. Here α^r denotes the attention weights w.r.t. relation r. For computing the weights, we define a query vector q_r for each relation r and compute α^r = softmax(u^r), where u_i^r = tanh(s_i)^⊤ q_r. The query vector q_r can be considered the embedding vector for relation r, which is jointly learned with the other model parameters.

Loss Function: For an entity pair, we compute the probability of relation r by P(r | X; θ) = softmax(A s_r + b), where A is the projection matrix and b is the bias. For the multi-label setting, suppose K relations r_1, ..., r_K exist for X. Simply summing the log probabilities of all those labels yields the final loss function

L(X; θ) = − Σ_{i=1}^K log P(r_i | X; θ).   (7.1)

Dropout: To regularize the parameters, we apply dropout [Srivastava et al., 2014] to both the word embeddings and the sentence feature vector s_i. Note that we do not perform dropout on the position embedding p_i.
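As an illustrative sketch of the selective attention and the loss of Eq. 7.1 (tensor names and shapes are assumptions; this is not the released TensorFlow code):

```python
import torch
import torch.nn.functional as F

def relation_probs(sentence_feats, query, A, b):
    """sentence_feats: [n, d] encoded sentences s_i; query: [R, d] relation queries q_r;
    A: [R, d] projection matrix, b: [R] bias.  Returns P(r | X) for each relation r."""
    u = torch.tanh(sentence_feats) @ query.t()      # u[i, r] = tanh(s_i)^T q_r
    alpha = F.softmax(u, dim=0)                     # attention over sentences, one column per relation
    s_r = alpha.t() @ sentence_feats                # [R, d]: attended representation s_r
    logits = s_r @ A.t() + b                        # [R, R]: row r is softmax(A s_r + b)
    return F.softmax(logits, dim=-1).diagonal()     # read off P(r | X) from row r

def miml_loss(sentence_feats, query, A, b, gold_relations):
    """gold_relations: long tensor of the K relation indices present for this entity pair."""
    p = relation_probs(sentence_feats, query, A, b)
    return -torch.log(p[gold_relations] + 1e-12).sum()   # Eq. 7.1
```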

Adversarial Training

Adversarial training (AT) is a way of regularizing the classifier to improve its robustness to small worst-case perturbations, which are computed from the gradient direction of the loss function w.r.t. the data. AT generates continuous perturbations, so we add the adversarial noise at the level of the word embeddings, similar to [Miyato et al., 2017]. Formally, consider the input data X and let V be the word embeddings of all the words in X. AT adds a small adversarial perturbation e_adv to V and optimizes the following objective instead of Eq. (7.1):

L_adv(X; θ) = L(X + e_adv; θ), where   (7.2)
    e_adv = argmax_{‖e‖≤ε} L(X + e; θ̂).   (7.3)

Dataset      #Rel   #Ent-Pair   #Mention   Sent-Len
NYT-Train    58     290429      577434     145
UW-Train     5      132419      546731     120

Table 7.1: Dataset statistics (#Rel includes NA).

Here θ̂ denotes a fixed copy of the current value of θ. Since Eq. (7.3) is computationally intractable for neural nets, [Goodfellow et al., 2014b] proposes to approximate Eq. (7.3) by linearizing L(X; θ̂) near X:

e_adv = ε g/‖g‖, where g = ∇_V L(X; θ̂).   (7.4)

Here V denotes the word embeddings of all the words in X. Accordingly, in Eq. 7.4, ‖g‖ denotes the norm of the gradients over all the words from all the sentences in X. In addition, we do not perturb the feature vector p for entity positions. A visualization of the process is shown in Fig. 7.1.
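A minimal sketch of the embedding-level perturbation of Eqs. 7.2-7.4; the `loss_fn` closure (standing for the model's loss of Eq. 7.1 on the current bag) and the shapes are assumptions.

```python
import torch

def adversarial_loss(loss_fn, V, epsilon):
    """V: word embeddings of all words in the bag X (any shape); loss_fn(V) -> scalar loss."""
    V_adv = V.detach().clone().requires_grad_(True)
    g = torch.autograd.grad(loss_fn(V_adv), V_adv)[0]   # g = grad_V L(X; theta_hat)
    e_adv = epsilon * g / (g.norm() + 1e-12)            # one global norm over all words, Eq. 7.4
    return loss_fn(V.detach() + e_adv.detach())         # L_adv = L(X + e_adv; theta), Eq. 7.2
```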

7.3 Experiments

To measure the effectiveness of adversarial training on relation extraction, we evaluate both the CNN (PCNN) and RNN (bi-GRU) models on two different datasets, the NYT dataset (NYT) developed by [Riedel et al., 2010] and the UW dataset (UW) by [Liu et al., 2016]. All code is implemented in TensorFlow [Abadi et al., 2016] and available at https://github.com/jxwuyi/AtNRE. We adopt the Adam optimizer [Kingma and Ba, 2014] with learning rate 0.001, batch size 50 and dropout rate 0.5. For adversarial training, the only parameter is ε. In each of the following experiments, we fixed all the hyper-parameters of the base model, performed a binary search solely on ε, and report the most effective value of ε.

Datasets The statistics of the two datasets are summarized in Table 7.1. We exclude sentences longer than Sent-Len during training and randomly split data for entity pairs with more than 500 mentions. Note that the number of target relations in these two datasets are significantly different, which helps demonstrate the applicability of adversarial training on various evaluation settings. Since the test set of the UW dataset only contains 200 sentences, we adopt a subset of the test set from the NYT dataset: all the entity pairs with the corresponding 4 relations in UW and another 1500 randomly selected NA pairs. CHAPTER 7. ROBUST NEURAL RELATION EXTRACTION 102

Recall       0.1     0.2     0.3     0.4     AUC
PCNN         0.667   0.572   0.476   0.392   0.329
PCNN-Adv     0.717   0.589   0.511   0.407   0.356
RNN          0.668   0.586   0.524   0.442   0.351
RNN-Adv      0.728   0.646   0.553   0.481   0.382

Table 7.2: Precisions of various models for different recalls on the NYT dataset, with best values in bold.

Figure 7.2: PR curves for PCNN (left) and RNN (right) on the NYT dataset with (blue) and without (green) adversarial training.

Practical Performance

The NYT dataset:

We utilize the word embeddings released by [Lin et al., 2016], which have d_w = 50 dimensions. For model parameters, we set d_e = 5 (the dimension of the entity position feature vector) and d_s = 230 (the dimension of the sentence feature vector) for PCNN, and d_e = 3 and d_s = 150 for RNN. For adversarial training, we choose ε = 0.01 for PCNN and ε = 0.02 for RNN. We empirically observed that when adding dropout to the word embeddings, PCNN performs significantly worse, so we only apply dropout to s_i for PCNN. However, even with a dropout rate of 0.5, RNN still performs well. We conjecture that this is because PCNN is more sensitive to the input signals and the dimensionality of the word embeddings (d_w = 50) is very small. The precision-recall curves for the different models on the test set are shown in Fig. 7.2. Since the precision drops significantly at large recalls on the NYT dataset, we emphasize the part of the curve with recall smaller than 0.5 in the figure. Adversarial training significantly improves the precision for both the PCNN and RNN models. We also show the precision numbers for some particular recalls as well as the AUC (for the whole PR curve) in Table 7.2, where RNN generally leads to better precision.

The UW dataset:

We train word embeddings of d_w = 200 dimensions using GloVe [Pennington et al., 2014] on the New York Times corpus for this experiment. For model parameters, we set the entity feature dimension d_e = 5 and the sentence feature dimension d_s = 250 for PCNN, and d_e = 3 and

Recall       0.1     0.2     0.3     0.4     AUC
PCNN         0.765   0.717   0.713   0.677   0.576
PCNN-Adv     0.844   0.750   0.738   0.707   0.619
RNN          0.823   0.822   0.791   0.752   0.631
RNN-Adv      0.929   0.878   0.850   0.779   0.671

Table 7.3: Precisions of various models for different recalls on the UW dataset, with best values in bold.

Figure 7.3: PR curves for PCNN (left) and RNN (right) on the UW dataset with (blue) and without (green) adversarial training.

d_s = 200 for RNN. For adversarial training, we choose ε = 0.05 for PCNN and ε = 0.5 for RNN. Since the word embedding dimension d_w here is larger than that used for the NYT dataset, the word embeddings have larger norms, and accordingly the optimal value of ε increases. The precision-recall curves on the test data are shown in Fig. 7.3, where adversarial training again significantly improves the precision for both models. The precision numbers for some particular recall values as well as the AUC numbers are given in Table 7.3. Similarly, RNN yields superior performance on the UW dataset.

Discussion

CNN vs RNN: In our experiments, RNN generally produces more precise predictions than CNN due to its richer model capacity, and it is also more robust to perturbed input embeddings. The CNN, in contrast, has far fewer parameters, which leads to much faster training and testing; this suggests a practical trade-off. Notably, although the improvements in AUC from adversarial training are roughly the same for both RNN and CNN, the optimal ε value for RNN is always much larger than for CNN. This implies that empirically RNN is more robust under adversarial attacks than CNN, which also helps RNN maintain higher precision as recall increases.

Choice of ε: When ε = 0, the AT loss (Eq. (7.2)) degenerates to the original loss (Eq. (7.1)); when ε becomes too large, the noise can change the semantics of a sentence3 and make it extremely hard for the

3When ε is large enough, e.g., comparable to the norm of the input embeddings, and we add the perturbation to a word w and look up the nearest unperturbed word embedding in the embedding space, that nearest word will be different from the original word w. This implicitly changes the content of the sentence.

model to correctly classify the adversarial examples. Notably, the optimal value of ε is much smaller than the norm of the word embeddings, which implies that adversarial training works most effectively when producing only tiny perturbations on the word features while keeping the semantics of sentences unchanged4.

Connection to other approaches: [Li et al., 2017b, Xie et al., 2017] propose linguistic adversary techniques to enhance the robustness of the model by randomly changing the word tokens in a sentence. This explicitly modifies the semantics of a sentence. By contrast, adversarial training focuses on smaller, continuous perturbations in the embedding space while preserving the semantics of sentences. Hence, adversarial training is complementary to linguistic adversaries.

7.4 Summary

In this chapter, we apply the adversarial training technique, a regularization technique for training robust agents, to neural relation extraction. We experiment with a variety of architectures and datasets and demonstrate that our proposed framework generally improves the test performance of the trained neural extractor. This suggests a promising direction for improving test performance on an even wider range of NLP tasks almost for free, by simply leveraging techniques for training robust agents.

4The nearest neighbor of a word w remains unchanged after w is perturbed with the optimal ε.

Chapter 8

Meta-Learning MCMC Proposals for Named Entity Recognition

In natural language processing (NLP), there is an important family of methods which first learn a probabilistic model for test sequences from training data and then perform sampling-based probabilistic inference on test texts to accomplish a desired NLP task. For high-quality probabilistic inference, manually constructed, model-specific proposals are often required. Inspired by recent progress in meta-learning for training learning agents that can generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We first evaluate our approach on several small-scale models, in which our learned proposals outperform a hand-tuned sampler. Finally, we apply our method to a real-world NLP task, named entity recognition, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling.

8.1 Motivation

Model-based probabilistic inference is a highly successful paradigm for machine learning, with applications to tasks as diverse as movie recommendation [Stern et al., 2009], visual scene perception [Kulkarni et al., 2015], and music transcription [Berg-Kirkpatrick et al., 2014]. People learn and plan using mental models, and indeed the entire enterprise of modern science can be viewed as constructing a sophisticated hierarchy of models of physical, mental, and social phenomena. Probabilistic programming provides a formal representation of models as sample-generating programs, promising the ability to explore an even richer range of models. Approaches based on probabilistic programming languages have been successfully applied to complex real-world tasks such as seismic monitoring [Moore and Russell, 2017], concept learning [Lake et al., 2015] and design generation [Ritchie et al., 2015]. However, most of these applications require manually designed proposal distributions for efficient MCMC inference. Commonly used "black-box" MCMC algorithms are often far from satisfactory when handling complex models. Hamiltonian Monte Carlo [Neal, 2011] takes global steps but is only applicable to continuous latent variables with differentiable likelihoods. Single-site Gibbs sampling [Spiegelhalter et al., 1996, Andrieu et al., 2003] can be applied to many models but suffers from slow mixing when variables are coupled in the posterior. Effective real-world inference often requires block proposals that update multiple variables together to overcome near-deterministic and long-range dependence structures. However, computing exact Gibbs proposals for large blocks quickly becomes intractable (approaching the difficulty of posterior inference), and in practice it is common to invest significant effort in hand-engineering computational tricks for a particular model.

Can we build tractable MCMC proposals that are (1) effective for fast mixing and (2) ready to be reused across different models? Recent advances in meta-learning demonstrate promising results in building reinforcement learning agents that can generalize to unseen environments [Duan et al., 2016b, Tobin et al., 2017, Finn et al., 2017a, Wu et al., 2018]. The core idea of meta-learning is to generate a large number of related training environments under the same objective and then train a learning agent to succeed in all of them. Inspired by these meta-learning works, we adopt a similar approach to build generalizable MCMC proposals. We propose to learn approximate block-Gibbs proposals that can be reused within a given model, and even across models containing similar structural motifs (i.e., common structural patterns). Recent work has recognized that a wide range of models can be represented as compositions of simple components [Grosse et al., 2012], and that domain-specific models may still reuse general structural motifs such as chains, grids, rings, or trees [Kemp and Tenenbaum, 2008]. We exploit this by training a meta-proposal to approximate block-Gibbs conditionals for models containing a given motif, with the model parameters provided as an additional input. At a high level, our approach first (1) generates different instantiations of a particular motif by randomizing its model parameters, and then (2) meta-trains a neural proposal to be "close to" the true Gibbs conditionals for all the instantiations (see Fig. 8.1).
By learning such flexible samplers, we can improve inference not only within a specific model but even on unseen models containing similar structures, with no additional training required. In contrast to techniques that compile inference procedures specific to a given model [Stuhlmüller et al., 2013, Le et al., 2017, Song et al., 2017a], learning inference artifacts that generalize to novel models is valuable in allowing model builders to quickly explore a wide range of possible models. We explore the application of our approach to a wide range of models. On grid-structured models from a UAI inference competition, our learned proposal significantly outperforms Gibbs sampling. For open-universe Gaussian mixture models, we show that a simple learned block proposal yields performance comparable to a model-specific hand-tuned sampler, and generalizes to models larger than those it was trained on. We additionally apply our method to a named entity recognition (NER) task, showing that not only do our learned block proposals

Figure 8.1: Toy example. (a) Two models with the same structure but different parameters and thus different near-deterministic relations; naive MCMC algorithms like single-site Gibbs fail on both models due to these dependencies. (b) To design a single proposal that works on both models, we consider a general model with variable parameters α and β. (c) Our neural proposal takes the model parameters α and β as input and is trained to output good proposal distributions on randomly generated parameters, so it performs well for any given α and β. (d) The neural proposal can be applied anywhere this structural pattern is present (instantiated). Naive MCMC algorithms fail when variables are tightly coupled, requiring custom proposals even for models with similar structure but different dependency relations; our goal is to design a single proposal that works on any model with similar local structure. We represent the dependency relations among nodes by variable model parameters and train proposals parametrized by neural networks on models with randomly generated parameters, so the trained proposal works anywhere the structure is found. With proposals trained for many common motifs, we can automatically speed up inference on unseen models.

mix effectively, but the ability to escape local modes also yields higher-quality solutions than the standard Gibbs sampling approach.

Related Work

There has been great interest in using learned, feedforward inference networks to generate approximate posteriors. Variational autoencoders (VAE) train an inference network jointly with the parameters of the forward model to maximize a variational lower bound [Kingma and Welling, 2013, Burda et al., 2015, Gu et al., 2015]. However, the use of a parametric variational distribution means they typically have limited capacity to represent complex, potentially multimodal posteriors, such as those incorporating discrete variables or structural uncertainty. A related line of work has developed data-driven proposals for importance samplers [Paige and Wood, 2016, Le et al., 2017, Ritchie et al., 2016], training an inference network from prior samples which is then used as a proposal given observed evidence. In particular, [Le et al., 2017] generalize the framework to probabilistic programming, and are able to automatically generate and train a neural proposal network given an arbitrary model described as a probabilistic program. Our approach differs in that we focus on MCMC inference, allowing modular proposals for subsets of model variables that may depend on latent quantities, and exploit recurring structural motifs to generalize to new models with no additional training. Several approaches have been proposed for adaptive block sampling, in which sets of variables exhibiting strong correlations are identified dynamically during inference, so that costly joint sampling is used only for blocks where it is likely to be beneficial [Venugopal and Gogate, 2013, Turek et al., 2016]. This is largely complementary to our approach, which assumes the set of blocks (structural motifs) is given and attempts to learn fast approximate proposals. Perhaps most related to our approach is recent work that trains model-specific MCMC proposals with machine learning techniques. In [Song et al., 2017a], adversarial training directly optimizes the similarity between posterior values and proposed values from a symmetric MCMC proposal. Stochastic inverses of graphical models [Stuhlmüller et al., 2013] train density estimators to speed up inference. However, both approaches have limitations on the applicable models and require model-specific training using global information (samples containing all variables). Our approach is simpler and more scalable, requiring only local information and generating local proposals that can be reused both within and across different models. At a high level, our approach of learning an approximate local update scheme can be seen as related to approximate message passing [Ross et al., 2011b, Heess et al., 2013] and learning to optimize continuous objectives [Andrychowicz et al., 2016, Li and Malik, 2017].

8.2 Meta-Learning MCMC Proposals

We propose a meta-learning approach that uses a neural network to approximate the Gibbs proposal for a recurring structural motif in graphical models and to speed up inference on unseen models without extra tuning. Crucially, our proposals do not fix the model parameters, which are instead provided as network input. After training with random model parametrizations, the same trained proposal can be reused to perform inference on novel models with parametrizations not previously observed. Our inference networks are parametrized as mixture density networks [Bishop, 1994] and trained to minimize the Kullback-Leibler (KL) divergence between the true posterior conditional and the proposal by sampling instantiations of the motif. The proposals are then accepted or rejected following the Metropolis-Hastings (MH) rule [Andrieu et al., 2003], so we maintain the correct stationary distribution even though the proposals are approximate. The following sections describe our work in greater depth.

Background

Although our approach applies to arbitrary probabilistic programs, for simplicity we focus on models represented as factor graphs. A model consists of a set of variables V as the nodes of a graph G = (V, E), along with a set of factors specifying a joint probability distribution p_Ψ(V) described by parameters Ψ. In particular, this chapter focuses primarily on directed models, in which the factors Ψ specify the conditional probability distributions of each variable given its parents. In undirected models, such as the Conditional Random Fields (CRFs) in Sec. 8.3, the factors are arbitrary functions associated with cliques in the graph [Koller and Friedman, 2009]. Given a set of observed evidence variables, inference attempts to sample from the conditional distribution of the remaining variables. In order to construct good MCMC proposals that generalize well across a variety of inference tasks, we take advantage of recurring structural motifs in graphical models, such as grids, rings, and chains [Kemp and Tenenbaum, 2008]. In this work, our goal is to train a neural network as an efficient expert proposal for a structural motif, with the local parameters included in its inputs, so that the trained proposal can be applied to different models. Within a motif, the variables are divided into a proposed set of variables that will be resampled, and a conditioning set corresponding to an approximate Markov blanket. The proposal network essentially maps the values of the conditioning variables and the local parameters to a distribution over the proposed variables.

MCMC Proposals on Structural Motifs in Graphical Models

We associate each learned proposal with a structural motif that determines the shape of the network inputs and outputs. In general, structural motifs can be arbitrary subgraphs, but we are most interested in motifs that represent interesting conditional structure between two sets of variables, the block proposed variables B and the conditioning variables C. A given motif can have multiple instantiations within a model, or even across models. As a concrete example, Fig. 8.2 shows two instantiations of a structural motif of six consecutive variables in a chain model. In each instantiation, we want to approximate the conditional distribution of the two middle variables given the neighboring four.

Figure 8.2: Two instantiations of a structural motif in a directed chain of length seven. The motif consists of two consecutive variables and their Markov blanket of four neighboring variables. Each instantiation is separated into block proposed variables B_i (white) and conditioning variables C_i (shaded).

Definition 1. A structural motif (B, C) (or motif for short) is an (abstract) graph with nodes partitioned into two sets, B and C, and a parametrized joint distribution p(B, C) whose factorization is consistent with the graph structure. This specifies the functional form of the conditional p(B|C), but not the specific parameters. A motif usually has many instantiations across many different graphical models.

Definition 2. For a graphical model (G = (V, E), Ψ), an instantiation (B_i, C_i, Ψ_i) of a motif (B, C) includes

1. a subset of the model variables (B_i, C_i) ⊆ V such that the induced subgraph on (B_i, C_i) is isomorphic to the motif (B, C), with the partition preserved by the isomorphism (so nodes in B are mapped to B_i, and nodes in C to C_i), and

2. a subset of the model parameters Ψ_i ⊆ Ψ required to specify the conditional distribution p_{Ψ_i}(B_i|C_i).

We would typically define a structural motif by first picking out a block of variables B to jointly sample, and then selecting a conditioning set C. Intuitively, the natural choice for the conditioning set is the Markov blanket, C = MB(B). However, this is not a fixed requirement, and C could be either a subset or a superset of it (or neither). We might deliberately choose an alternate conditioning set C, e.g., a subset of the Markov blanket to gain a more computationally efficient proposal (with a smaller proposal network), or a superset with the idea of learning longer-range structure. More fundamentally, however, Markov blankets depend on the larger graph structure and might not be consistent across instantiations of a given motif (e.g., if one instantiation has additional edges connecting B_i to other model variables not in C_i). Allowing C to represent a generic conditioning set gives us greater flexibility in instantiating motifs. Formally, our goal is to learn a Gibbs-like block proposal q(B_i|C_i; Ψ_i) for all possible instantiations (B_i, C_i, Ψ_i) of a structural motif that is close to the true conditional, in the sense that

∀(B_i, C_i, Ψ_i), ∀c_i ∈ supp(C_i),   q(B_i; c_i, Ψ_i) ≈ p_{Ψ_i}(B_i | C_i = c_i).   (8.1)

This provides another view of the approximation problem. If we choose the motif to have

complex structures in each instantiation, the conditionals p_{Ψ_i}(B_i | C_i = c_i) can often be quite different for different instantiations, and thus difficult to approximate. Therefore, the choice of structural motif represents a trade-off between the generality of the proposal and the ease of approximating it. While our approach works for any structural motif complying with the above definition, we suggest using common structures as motifs, such as a chain of a certain length as in Fig. 8.2. In principle, recurring motifs could be detected automatically, but in this work we focus on hand-identified common structures.

Parametrizing Neural Block Proposals

We choose mixture density networks (MDN) [Bishop, 1994] as our proposal network parametrization. An MDN is a form of neural network whose outputs parametrize a mixture distribution, where within each mixture component the variables are uncorrelated. In our case, a neural block proposal is a function q_θ parametrized by an MDN with weights θ. The function q_θ represents proposals for a structural motif (B, C) by taking in the current values of C_i and the local parameters Ψ_i, and outputting a distribution over B_i. The goal is to optimize θ so that q_θ is close to the true conditional. In the network output, mixture weights are represented explicitly. Within each mixture component, distributions of bounded discrete variables are directly represented as independent categorical probabilities, and distributions of continuous variables are represented as isotropic Gaussians with a mean and a variance. To avoid degenerate proposals, we threshold the variance of each Gaussian component to be at least 10^{-5}.
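A minimal sketch of an MDN proposal head for continuous proposed variables (layer sizes, the number of mixture components, and the softplus parametrization of the variance are assumptions; discrete proposed variables would instead use categorical outputs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNProposal(nn.Module):
    def __init__(self, in_dim, out_dim, n_components=5, hidden=64):
        super().__init__()
        # Input: current values of C_i concatenated with local parameters Psi_i
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(),
                                  nn.Linear(hidden, hidden), nn.ELU())
        self.logits = nn.Linear(hidden, n_components)               # mixture weights
        self.means = nn.Linear(hidden, n_components * out_dim)      # per-component means
        self.log_vars = nn.Linear(hidden, n_components * out_dim)   # per-component variances
        self.out_dim = out_dim

    def forward(self, cond_and_params):
        h = self.body(cond_and_params)
        weights = F.softmax(self.logits(h), dim=-1)
        means = self.means(h).view(-1, weights.shape[-1], self.out_dim)
        # variance floored at 1e-5 to avoid degenerate proposals
        var = F.softplus(self.log_vars(h)).view_as(means) + 1e-5
        return weights, means, var
```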

Training Neural Block Proposals Loss function for a specific instantiation: Given a particular motif instantiation, we

use the KL divergence D(pΨi (Bi|Ci) k qθ(Bi; Ci, Ψi)) as the measure of closeness between our proposal and the true conditional in Eq. 8.1. Taking into account all possible values ci ∈ supp(Ci), we consider the expected divergence over Ci’s prior:

E_{C_i}[ D( p_{Ψ_i}(B_i | C_i) ‖ q_θ(B_i; C_i, Ψ_i) ) ] = −E_{B_i, C_i}[ log q_θ(B_i; C_i, Ψ_i) ] + constant.   (8.2)

The second term is independent of θ, so we define the loss function on (B_i, C_i, Ψ_i) as

L(θ; B_i, C_i, Ψ_i) = −E_{B_i, C_i}[ log q_θ(B_i; C_i, Ψ_i) ].

Meta-training over many instantiations: To train a generalizable neural block proposal, we generate a set of random instantiations and optimize the loss function over all of them. Assuming a distribution P over instantiations, our goal is to minimize the overall loss

L̃(θ) = E_{(B_i,C_i,Ψ_i)∼P}[ L(θ; B_i, C_i, Ψ_i) ] = −E_{(B_i,C_i,Ψ_i)∼P}[ E_{B_i,C_i}[ log q_θ(B_i; C_i, Ψ_i) ] ],   (8.3)

which is optimized with minibatch SGD in our experiments.
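As an illustration of how Eq. 8.3 can be optimized, the following is a minimal PyTorch-style sketch of the meta-training loop; sample_instantiation (assumed to return an encoded input for one random instantiation together with an exact joint sample of (B_i, C_i) from the model) and the MDNProposal class from the previous sketch are assumptions introduced for illustration.

```python
import torch

def meta_train(proposal, sample_instantiation, steps=20000, batch_size=64, lr=1e-2):
    """Minimize Eq. 8.3: the expected negative log-likelihood of exact block
    samples drawn from randomly generated motif instantiations."""
    opt = torch.optim.SGD(proposal.parameters(), lr=lr)
    for _ in range(steps):
        xs, bs = [], []
        for _ in range(batch_size):
            # x encodes (current values of C_i, local parameters Psi_i) as a tensor;
            # b is a joint sample of the block B_i drawn together with C_i from the model.
            x, b = sample_instantiation()
            xs.append(x)
            bs.append(b)
        loss = -proposal.log_prob(torch.stack(xs), torch.stack(bs)).mean()
        opt.zero_grad()
        loss.backward()      # Monte Carlo gradient of Eq. 8.3
        opt.step()
    return proposal
```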

Algorithm 3: Neural Block Sampling
Input: Graphical model (G, Ψ), observations y, motifs {(B^(m), C^(m))}_m, and their instantiations {(B_i^(m), C_i^(m), Ψ_i^(m))}_{i,m} detected in (G, Ψ).
1: for each motif (B^(m), C^(m)) do
2:     if a proposal trained for this motif exists then
3:         q_θ^(m) ← trained neural block proposal
4:     else
5:         Train neural block proposal q_θ^(m) using SGD by Eq. 8.3 on its instantiations {(B_i^(m), C_i^(m), Ψ_i^(m))}_i
6:     end if
7: end for
8: x ← initialize state
9: for timestep in 1 ... T do
10:     Propose x' ← proposal q_θ^(m) on some instantiation (B_i^(m), C_i^(m), Ψ_i^(m))
11:     Accept or reject according to the MH rule
12: end for
13: return MCMC samples

There are different ways to design the motif instantiation distribution P. One approach is to find a distribution over the model parameter space and attach the random parametrizations Ψ_i to (B_i, C_i). Practically, it is also viable to use a training dataset of models that contains a large number of instantiations. Both approaches are discussed in detail and evaluated in the experiments section.

Neural block sampling: The overall MCMC sampling procedure with meta-proposals is outlined in Algorithm 3, which supports building a library of neural block proposals trained on common motifs to speed up inference on previously unseen models.
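For concreteness, the accept/reject step in line 11 of Algorithm 3 can be sketched as follows; log_joint and the logq_* arguments are hypothetical names, and this is an illustrative sketch of a standard MH correction for a block proposal rather than the exact implementation.

```python
import math
import random

def mh_block_step(log_joint, b, b_new, logq_old, logq_new):
    """One MH accept/reject step for a block proposal q_theta(B_i; C_i, Psi_i).

    log_joint(block): log of the unnormalized joint with B_i set to `block`
                      and all other variables (including C_i) held fixed.
    logq_new = log q_theta(b_new; c_i, Psi_i)   (probability of the forward move)
    logq_old = log q_theta(b;     c_i, Psi_i)   (probability of the reverse move)
    """
    log_alpha = (log_joint(b_new) - log_joint(b)) + (logq_old - logq_new)
    accept = random.random() < math.exp(min(0.0, log_alpha))
    return b_new if accept else b
```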

8.3 Experiments

In this section, we evaluate our method of learning neural block proposals against the single-site Gibbs sampler as well as several model-specific MCMC methods. We focus on three of the most common structural motifs: grids, mixtures, and chains. In all experiments, we use the following guidelines to design the proposals: (1) use small underlying MDNs (we pick networks with two hidden layers and ELU activations [Clevert et al., 2015]), and (2) choose an appropriate distribution for generating parameters of the motif such that the generated parameters cover the parameter space as much as possible. More experiment details and an additional experiment are available in Sec. 8.4.

Grid Models

We start with a common structural motif in graphical models: grids. In this section, we focus on binary-valued grid models of all sorts because their posteriors are relatively easy to compute exactly. To evaluate MCMC algorithms, we compare the estimated posterior marginals P̂ against the true posterior marginals P computed using IJGP [Mateescu et al., 2010]. For each inference task with N variables, we calculate the error (1/N) Σ_{i=1}^{N} |P̂(X_i = 1) − P(X_i = 1)|, i.e., the mean absolute deviation of the marginal probabilities.
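A minimal sketch of this error metric, assuming marg_est and marg_true are arrays holding the estimated and true marginals P̂(X_i = 1) and P(X_i = 1):

```python
import numpy as np

def marginal_error(marg_est, marg_true):
    """Mean absolute deviation between estimated and true marginals P(X_i = 1)."""
    return np.mean(np.abs(np.asarray(marg_est) - np.asarray(marg_true)))
```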

General Binary-Valued Grid Models

We consider the motif in Fig. 8.3, which is instantiated in every binary-valued grid Bayesian network (BN). Our proposal takes in the conditional probability tables (CPTs) of all 23 motif variables as well as the current values of the 14 conditioning variables, and outputs a distribution over the 9 proposed variables. To sample over all possible binary-valued grid instantiations, we generate random grids by sampling each CPT entry i.i.d. from a mixed distribution of the following form:

    [0, 1]           w.p. p_determ / 2
    [1, 0]           w.p. p_determ / 2        (8.4)
    Dirichlet(α)     w.p. 1 − p_determ,

where p_determ ∈ [0, 1] is the probability of the CPT entry being deterministic. Our proposal is trained with p_determ = 0.05 and α = (0.5, 0.5).

To test the generalizability of our trained proposal, we generate random binary grid instantiations using distributions with various p_determ and α values, and compute the KL divergences between the true conditionals and our proposal outputs on 1000 sampled instantiations from each distribution. Fig. 8.5 shows the histograms of divergence values from 4 very different distributions, including the one used for training (top left). The resulting histograms show mostly small divergence values and are nearly indistinguishable, even though one distribution has p_determ = 0.8 while the proposal is trained only with p_determ = 0.05. This shows that our approach is able to approximate the true conditionals accurately in general, despite being trained on a single, arbitrarily chosen distribution.

We evaluate the performance of the trained neural block proposal on all 180 grid BNs with up to 500 nodes from the UAI 2008 inference competition. In each epoch, for each latent variable, we try to identify and propose the block as in Fig. 8.3 with the variable located at its center. If this is not possible, e.g., because the variable is at the boundary or close to evidence, single-site Gibbs resampling is used instead. Fig. 8.6 shows the performance of both our method and single-site Gibbs in terms of error integrated over time for all 180 models. The models are divided into three classes, grid-50, grid-75, and grid-90, according to the percentage of deterministic relations. Our neural block sampler significantly outperforms the Gibbs sampler on nearly every model. We notice that the improvement becomes less significant as the percentage of deterministic relations increases. This is largely because the proposal structure in Fig. 8.3 can only easily capture dependencies among the 9 proposed nodes. We expect an increased block size to yield stronger performance on models with many deterministic relations.
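A minimal numpy sketch of the CPT-entry sampler in Eq. 8.4; the function name and the rng argument are illustrative, while the defaults mirror the training setting p_determ = 0.05 and α = (0.5, 0.5) described above.

```python
import numpy as np

def sample_cpt_entry(p_determ=0.05, alpha=(0.5, 0.5), rng=np.random):
    """Sample one binary CPT row from the mixture in Eq. 8.4."""
    u = rng.uniform()
    if u < p_determ / 2:
        return np.array([0.0, 1.0])      # deterministic row [0, 1]
    if u < p_determ:
        return np.array([1.0, 0.0])      # deterministic row [1, 0]
    return rng.dirichlet(alpha)          # random row ~ Dirichlet(alpha)
```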

Figure 8.3: Motif for general grid models. Conditioning variables (shaded) form the Markov blanket of the proposed variables (white). Dashed gray arrows show possible but irrelevant dependencies.

Figure 8.4: Sample runs comparing single-site Gibbs, Neural Block Sampling, and block Gibbs with true conditionals. For each model, we compute 10 random initializations and run the three algorithms for 1500s on each. Epochs plots are cut off at 500 epochs to better show the comparison because true block Gibbs completes far fewer epochs within the given time. 50-20-5 and 90-21-10 are the identifiers of these two models in the competition.

Figure 8.5: KL divergences between the true conditionals and our proposal outputs on 1000 sampled instantiations from 4 distributions with different p_determ and α. Top left is the distribution used in training. Our trained proposal is able to generalize to arbitrary binary grid models.

Figure 8.6: Performance comparison on 180 grid models from the UAI 2008 inference competition. Each mark represents the error integrals for both single-site Gibbs and our method in a single run over 1200s of inference.

Furthermore, in Fig. 8.4 we compare our proposal against single-site Gibbs and against exact block Gibbs with an identical proposal block on grid models with different percentages of deterministic relations. Single-site Gibbs performs worst on both models because it quickly gets stuck in local modes. Between the two block proposals, neural block sampling performs better in error w.r.t. time due to its shorter computation time per step. However, because the neural block proposal is only an approximation of the true block Gibbs proposal, it is worse in terms of error w.r.t. epochs, as expected. Detailed comparisons on more models are available in Sec. 8.4.

Figure 8.7: All except bottom right: average log likelihoods of MCMC runs over 200 tasks for a total of 600s on various GMMs. Bottom right: trace plots of M over 12 runs from initializations with different M values on a GMM with m = 12, n = 90. Our approach explores the sample space much faster than Gibbs with SDDS.

Additionally, our approach can be used model-specifically by training only on instantiations within a particular model. In Sec. 8.4, we demonstrate that our method achieves performance comparable to a more advanced task-specific MCMC method, Inverse MCMC [Stuhlmüller et al., 2013].

Gaussian Mixture Model with Unknown Number of Components

We next consider open-universe Gaussian mixture models (GMMs), in which the number of mixture components is unknown and subject to a prior. Similarly to Dirichlet process GMMs, these are typically treated with hand-designed, model-specific split-merge MCMC algorithms.

Consider the following GMM: n points x = {x_i}_{i=1,...,n} are observed and come uniformly at random from one of M (unknown) active mixtures, with M ∼ Unif{1, 2, . . . , m}. Our task is to infer the posterior of the mixture means µ = {µ_j}_{j=1,...,M}, their activity indicators v = {v_j}_{j=1,...,M}, and the labels z = {z_i}_{i=1,...,n}, where z_i is the index of the mixture that x_i comes from. Since M is determined by v, in this experiment we always calculate M = Σ_j v_j instead of sampling M. Such GMMs have many nearly-deterministic relations, e.g., p(v_j = 0, z_i = j) = 0, causing vanilla single-site Gibbs to fail to jump across different M values. Split-merge MCMC algorithms, e.g., Restricted Gibbs split-merge (RGSM) [Jain and Neal, 2004] and Smart-Dumb/Dumb-Smart (SDDS) [Wang and Russell, 2015], use hand-designed MCMC moves to solve such issues.

Figure 8.8: Average F1 scores and average log likelihoods over the entire test dataset. In each epoch, every variable in every test MRF is proposed roughly once for all algorithms. F1 scores are measured using the states with the highest likelihood seen over the Markov chain traces. To better show the comparison, epoch plots are cut off at 500 epochs and time plots at 12850s. The log likelihoods shown do not include the normalization constant.

In our framework, it is possible to deal with such relations using a proposal block that includes all of z, µ, and v. However, doing so requires significant training and inference time (due to the larger proposal network and larger proposal block), and the resulting proposal cannot generalize to GMMs of different sizes. In order to apply the trained proposal to differently sized GMMs, we choose as the motif a proposal q_θ over two arbitrary mixtures (µ_i, v_i) and (µ_j, v_j), conditioned on all other variables excluding z, and consider the model with the z variables collapsed. The inference task is then equivalent to first sampling µ, v from the collapsed model p(µ, v | x), and then z from p(z | µ, v, x). We modify the algorithm so that the proposal from q_θ is accepted or rejected by the MH rule on the collapsed model, and z is then resampled from p(z | µ, v, x). This approach is less sensitive to different n values and performs well in GMMs of various sizes. More details are available in Sec. 8.4.

We train with a small GMM with m = 8 and n = 60 as the motif, and apply the trained proposal to GMMs with larger m and n by randomly selecting 8 mixtures and 60 points for each proposal. Fig. 8.7 shows how our sampler performs on GMMs of various sizes, compared against split-merge Gibbs with SDDS. We notice that as the model gets larger, Gibbs with SDDS mixes more slowly, while neural block sampling still mixes fairly fast and outperforms Gibbs with SDDS. The bottom right of Fig. 8.7 shows the trace plots of M for both algorithms over multiple runs on the same observations. Gibbs with SDDS takes a long time to find a high-likelihood explanation and fails to explore other possible ones efficiently. Our proposal, on the other hand, mixes quickly among the possible explanations.
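To make the collapsed-model update concrete, here is a minimal sketch of one sampling step; log_collapsed_joint, propose_pair, and sample_z_posterior are hypothetical helpers standing in for model-specific code, so this is a sketch of the scheme rather than the authors' implementation.

```python
import math
import random

def collapsed_gmm_step(log_collapsed_joint, propose_pair, sample_z_posterior, mu, v, x):
    """One MH step on the collapsed model p(mu, v | x), followed by resampling z.

    log_collapsed_joint(mu, v): log p(mu, v, x) with the labels z summed out.
    propose_pair(mu, v, x): returns (mu_new, v_new, logq_fwd, logq_rev) for two
        randomly selected mixture components, using the trained proposal q_theta.
    sample_z_posterior(mu, v, x): draws z ~ p(z | mu, v, x) exactly.
    """
    mu_new, v_new, logq_fwd, logq_rev = propose_pair(mu, v, x)
    log_alpha = (log_collapsed_joint(mu_new, v_new)
                 - log_collapsed_joint(mu, v)) + (logq_rev - logq_fwd)
    if random.random() < math.exp(min(0.0, log_alpha)):
        mu, v = mu_new, v_new              # accept the block move on (mu, v)
    z = sample_z_posterior(mu, v, x)       # exact conditional resampling of z
    return mu, v, z
```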

Named Entity Recognition (NER) Tagging

Named entity recognition (NER) is the task of inferring named entity tags for words in natural language sentences. One way to tackle NER is to train a conditional random field (CRF) model representing the joint distribution of tags and word features [Liang et al., 2008]. At test time, we use the CRF to build a chain Markov random field (MRF) containing only the tag variables, and apply MCMC methods to sample the NER tags. We use a dataset of 17494 sentences from the CoNLL-2003 Shared Task¹. The CRF model is trained with AdaGrad [Duchi et al., 2011] through 10 sweeps over the training dataset.

Our goal is to train good neural block proposals for the chain MRFs built for test sentences. Experimenting with different chain lengths, we train three proposals, each for a motif of two, three, or four consecutive proposed tag variables and their Markov blanket. These proposals are trained on instantiations within MRFs built from the training dataset for the CRF model. We then evaluate the learned neural block proposals on the previously unseen test dataset of 3453 sentences. Fig. 8.8 plots the performance of neural block sampling and single-site Gibbs w.r.t. both time and epochs on the entire test dataset. As the block size grows, the learned proposal takes more time to mix, but eventually block proposals generally achieve better performance than single-site Gibbs in terms of both F1 scores and log likelihoods. Therefore, as shown in the figure, a mixed proposal of single-site Gibbs and neural block proposals can achieve better mixing without slowing down much. As an interesting observation, neural block sampling sometimes achieves higher F1 scores even before surpassing single-site Gibbs in log likelihood, implying that log likelihood is at best an imperfect proxy for performance on this task.
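A minimal sketch of such a mixed kernel, which at each step randomly chooses between a single-site Gibbs update and a neural block proposal; gibbs_step, neural_block_step, and the mixing probability p_block are illustrative assumptions.

```python
import random

def mixed_kernel_step(state, gibbs_step, neural_block_step, p_block=0.5):
    """One step of a mixture kernel: with probability p_block apply the neural
    block proposal (with MH correction), otherwise a single-site Gibbs update.
    Mixing two valid MCMC kernels for the same target leaves it invariant."""
    if random.random() < p_block:
        return neural_block_step(state)
    return gibbs_step(state)
```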

8.4 Additional Details

Experiment Details

As mentioned in Sec. 8.3, the parametrizing MDNs in all experiments use ELU activations and two hidden layers, each of size λ · max{input size, output size} with 4 ≤ λ ≤ 5 depending on the task, and output the proposal distribution as a mixture of 4 ≤ m ≤ 16 components.

General Binary-Valued Grid Models

For directed binary-valued grid models, we use a larger number of MDN mixture components than in the other experiments because more variables are proposed and general discrete BNs can have highly multi-modal conditionals.

Architecture

The underlying MDN has a 106-480-480-120 network structure, mapping the CPTs of all 23 motif variables and the values of the 14 conditioning variables to the proposal distribution over the 9 proposed variables as a mixture of 12 components.

Additional Sample Runs

We provide additional sample runs on four grid models with various percentages of deterministic relations in Fig. 8.9.

¹ https://www.clips.uantwerpen.be/conll2003/ner/

Figure 8.9: Additional sample runs with single-site Gibbs, neural block MCMC, and block Gibbs with true conditionals on UAI 2008 grid models. These results are obtained in the same setting as Fig. 8.4 in Sec. 8.3. For each model, we compute 10 random initializations and run the three algorithms for 1500s on each one. Plots show the average error for each algorithm. Epochs plots are cut off at 500 epochs to better show the comparison. “grid-k” indicates that the model has k% deterministic relations.

Gaussian Mixture Model with Unknown Number of Components

Formal Model

Formally, the model can be written as

    M ∼ Unif{1, 2, . . . , m}
    µ_j ∼ N(0, σ_µ² I),                          j = 1, . . . , m
    v | M ∼ Unif{ a ∈ {0, 1}^m : Σ_j a_j = M }
    z_i | v ∼ Unif{ j : v_j = 1 },               i = 1, . . . , n
    x_i | z_i, µ ∼ N(µ_{z_i}, σ² I),             i = 1, . . . , n,

where m and n are model parameters, and σ_µ² = 4 and σ² = 0.1 are fixed constants.
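As a sanity check on the generative model above, the following is a minimal numpy sketch that draws one synthetic dataset from it; the function name, the data dimension dim, and the rng argument are illustrative assumptions (the text does not specify the dimensionality of x), while sigma_mu² = 4 and sigma² = 0.1 match the fixed constants.

```python
import numpy as np

def sample_gmm(m, n, sigma_mu=2.0, sigma=np.sqrt(0.1), dim=2, rng=np.random):
    """Sample (M, mu, v, z, x) from the open-universe GMM defined above."""
    M = rng.randint(1, m + 1)                          # M ~ Unif{1, ..., m}
    mu = rng.normal(0.0, sigma_mu, size=(m, dim))      # mu_j ~ N(0, sigma_mu^2 I)
    v = np.zeros(m, dtype=int)
    v[rng.choice(m, size=M, replace=False)] = 1        # uniformly random active set of size M
    active = np.flatnonzero(v)
    z = active[rng.randint(0, M, size=n)]              # z_i ~ Unif{j : v_j = 1}
    x = mu[z] + rng.normal(0.0, sigma, size=(n, dim))  # x_i ~ N(mu_{z_i}, sigma^2 I)
    return M, mu, v, z, x
```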

Architecture

Our neural block proposal is trained using the GMM with at most m = 8 mixtures and n = 60 data points. It proposes two mixture components (µ_j, v_j) through an underlying MDN with a 156-624-624-36 network structure. The MDN's inputs include the 60 observed points x = {x_i}_i, the 8 component means µ = {µ_j}_j, and the component activity indicators v = {v_j}_j, with the values for the two proposed mixtures replaced by zeros. The orderings of x, µ, and v are such that they are sorted along the first principal component of x to break symmetry. In addition, the inputs also contain the principal component itself and indicators of which components are being proposed. The MDN outputs a proposal distribution over the two (µ_j, v_j) pairs as a mixture of 4 MDN components.

Breaking the Symmetry

In training, such mixture models have symmetries that must be broken before being used as input to the neural network [Nishihara et al., 2013]. In particular, the mixtures {(v_j, µ_j)}_j can be permuted in m! ways and the points {(z_i, x_i)}_i in n! ways. Following a procedure similar to [Le et al., 2017], we sort these values according to the first principal component of x, and also feed the first principal component vector into the MDN.
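A minimal numpy sketch of this symmetry-breaking step, assuming x is an (n × d) array of observations and mu, v are the component means and activity indicators from the sketch above; names are illustrative.

```python
import numpy as np

def break_symmetry(x, mu, v):
    """Sort points and mixture components along the first principal component of x."""
    xc = x - x.mean(axis=0)
    # First principal component = top right-singular vector of the centered data.
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    pc = vt[0]
    x_sorted = x[np.argsort(x @ pc)]       # points ordered by projection onto pc
    order = np.argsort(mu @ pc)            # components ordered by projection of their means
    return x_sorted, mu[order], v[order], pc
```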

Proposal and Inference with z Collapsed

In order to avoid nearly-deterministic relations, e.g., p(v_j = 0, z_i = j) = 0, and still train a general proposal unconstrained by n, we choose to consider the collapsed model without z. We first experimented with the intuitive approach that adds a resampling step for z to the proposal. At each proposal step, the trained proposal q_θ is first used to propose new mixtures µ' and v', and then z is proposed from p(z | µ', v', x). The MH rule is applied last to either accept or reject all proposed values. While this method gives good performance in small models, it suffers greatly from a low acceptance ratio as n, the number of observed points, grows large. Therefore, we eventually chose the approach described in Sec. 8.3, i.e., applying the MH rule on the collapsed model, with z resampled from p(z | µ, v, x) afterwards. Since the acceptance ratio no longer depends on n, this approach scales much better than the first one in our experiments. It outperforms SDDS split-merge MCMC in GMMs of various sizes, as shown in Fig. 8.7.

Figure 8.10: Small version of the triangle grid model used in the experiment in Sec. 8.4. Evidence nodes (shaded) are at the bottom layer. The actual network has 15 layers and 120 nodes.

Figure 8.11: Motif for the model in Fig. 8.10. Conditioning variables (shaded) form the Markov blanket of the proposed variables (white). Dashed gray arrows are possible irrelevant dependencies.

Figure 8.12: Error w.r.t. epochs on the triangle model in Fig. 8.10. Semi-transparent lines show individual MCMC runs. Opaque lines show the average error over 10 MCMC runs for each algorithm. Numbers in parentheses are the amounts of training data.

NER Tagging

Architecture

The underlying MDN has two hidden layers, each of size 4 × max{input size, output size}, with the output size varying according to the number of proposed variables. It maps the local CRF parameters of all motif variables and the conditioning variable values to the NER tag proposal for the consecutive proposed variables as a mixture of 4 components.

Additional Experiment

Comparison with Inverse MCMC

Neural block proposals can also be used model-specifically by training only on instantiations within a particular model. In this subsection, we demonstrate that our method achieves performance comparable to a more complex task-specific MCMC method, Inverse MCMC [Stuhlmüller et al., 2013]. Figure 8.10 illustrates the triangle grid network used in this experiment, which is identical to the one [Stuhlmüller et al., 2013] used to evaluate Inverse MCMC. For our method, we chose the motif shown in Fig. 8.11. The proposal is trained on all instantiations in this triangle model.

Inverse MCMC is an algorithm that builds auxiliary data structures offline to speed up inference. Given an inference task, it trains an inverse graph for each latent variable, in which the latent variable is at the bottom and the evidence variables are at the top. These graphs are then used as MCMC proposals.

It is difficult to compare the two methods w.r.t. time. While both methods require offline training, Inverse MCMC needs to train from scratch if the set of evidence nodes changes, whereas neural block sampling only needs one-time training for different inference tasks on this model. In this experiment, for each inference epoch, both methods propose about 10.5 values on average per latent variable. Figure 8.12 shows a more meaningful comparison of error w.r.t. epochs. Our learned neural block proposal, trained using 10^4 samples, achieves performance comparable to Inverse MCMC, which is trained using 10^5 samples and builds model-specific data structures (inverse graphs).

Architecture

The underlying MDN has a 161-1120-1120-224 network structure, mapping the CPTs of all 29 motif variables and the values of the 16 conditioning variables to the proposal distribution over the 13 proposed variables as a mixture of 16 components.

Inverse MCMC Setting

In this experiment, we run Inverse MCMC with a frequency density estimator trained on posterior samples, a proposal block size of up to 20, and precomputed Gibbs proposals, following the original approach of [Stuhlmüller et al., 2013].

8.5 Summary

This chapter explores the idea of meta-learning generalizable approximate block Gibbs proposals and applies it to a challenging NLP task, named entity recognition. Our meta-proposals are trained offline and can be applied directly to novel models given only a common set of structural motifs. Experiments show that the neural block sampling approach outperforms standard single-site Gibbs in both convergence speed and sample quality, and achieves performance comparable to model-specialized methods.

It will be an interesting system-design problem to investigate how, given a library of trained block proposals, an inference system in a probabilistic programming language could automatically detect common structural motifs and (adaptively) apply appropriate samplers to help convergence in more general real-world applications. Additionally, from the meta-learning perspective, our method is based on meta-training, i.e., training over a variety of motif instantiations; at test time, the learned proposal does not adapt to new scenarios. In contrast, in many meta-learning works in reinforcement learning [Finn et al., 2017a, Duan et al., 2016b], a meta-trained agent can further adapt the learned policy to unseen environments via a few learning steps, under the assumption that a reward signal is accessible at test time. In our setting, we could similarly adopt such a fast-adaptation scheme at test time to further improve the quality of proposed samples by treating the acceptance rate as a test-time reward signal. We leave this as future work.

Chapter 9

Conclusion

This thesis studies a fundamental challenge in Artificial Intelligence, i.e., how to build learning agents that can generalize from training scenarios to unseen test cases, which is the key prerequisite for bringing any AI application into our daily lives. Chapter 1 introduces a mathematical formulation of this generalization problem (Eq. 1.1, 1.2) and further proposes 4 solution-design principles, each focusing on a different aspect of the generalization challenge (Sec. 1.3), namely (1) policy representation, (2) diverse training, (3) effective learning objective, and (4) improved training paradigm. The remaining chapters examine these 4 principles comprehensively by considering 4 concrete domains, i.e., discrete graphs (Part I), 3D visual navigation (Part II), complex multi-agent games (Part III), and natural language processing applications (Part IV).

In Part I, Chapter 2 follows principle (1) by proposing a novel policy representation, the value iteration network, which allows an agent to learn a generic planning computation for path finding on graph structures, including simple gridworlds and real-world webgraphs. In Part II, Chapter 3 follows principle (2) by developing a new large-scale environment, House3D, and analyzing the effectiveness of a variety of data augmentation techniques. Chapter 4 follows principle (1) by designing a memory architecture, Bayesian Relational Memory, for visual semantic planning. In Part III, Chapter 5 first studies the intrinsic challenge in multi-agent games due to simultaneously learning agents and then introduces an effective and general learning algorithm for multi-agent games, MADDPG, according to principle (4). Chapter 6 follows principle (3) by introducing the minimax concept into the learning objective of the MADDPG algorithm. In Part IV, Chapter 7 considers the relation extraction task and improves the robustness of learned models via an adversarial training objective under principle (3). Chapter 8 considers the named entity recognition task and proposes a novel meta-learning paradigm to effectively improve the inference performance of a learned model on test texts under principle (4).

The works presented in this thesis empirically demonstrate that, by designing algorithmic techniques under the proposed principles, we can improve the generalization ability of learning agents in a wide range of challenging domains. In addition, this thesis also suggests some open questions for future research. Some of the most critical questions from my perspective are described below.

(1) What is the necessary inductive bias in the learning paradigm for generalization?

In classical AI, which typically utilizes model-based approaches over symbolic abstractions, generalization is naturally guaranteed. For example, any standard search algorithm can successfully solve any gridworld navigation task, while it takes considerable effort for deep learning approaches to perform reasonably well (see Chapter 2). It is a recent trend in the community to utilize concepts from classical AI, such as objects [Kulkarni et al., 2016, Devin et al., 2018], relations [Zambaldi et al., 2019], and planning [Kansky et al., 2017, Finn and Levine, 2017], when building deep learning agents. In this thesis, Chapter 2 and Chapter 4 also inherit similar ideas. However, since most real-world problems are high-dimensional, it remains non-trivial and often requires strong domain priors to extract meaningful low-dimensional abstractions in general. Moreover, some recent evidence, such as MuZero from DeepMind [Schrittwieser et al., 2019], shows that with a large amount of training data, neural agents can automatically learn specialized latent representations without any explicit task prior introduced at training time and outperform existing methods with strong domain knowledge. So, which direction should we prefer: explicitly introducing more human insights, or simply forcing agents to discover their own representations from data and tasks? It is not yet clear which specific inductive bias will turn out to be the most necessary and critical ingredient for building intelligent agents.

(2) What is the limit of success through diverse training?

This question follows from the previous one. Some recent works, including Chapter 3, show that by simply combining more training data and massive compute power, we can significantly improve the generalization performance of learning agents without big algorithmic changes. For example, visual models trained with more images have substantially higher recognition accuracy [Mahajan et al., 2018]; language models pretrained on massive corpora can generate extremely fluent English paragraphs [Devlin et al., 2019, Radford et al., 2019]; deep RL agents trained via self-play and massive compute can defeat professional players in Dota 2 [OpenAI et al., 2019b] and StarCraft II [Vinyals et al., 2019]; and by performing domain randomization in the simulator, a trained dexterous hand can solve a Rubik's cube in the real world [OpenAI et al., 2019a]. To some extent, these empirical results seemingly indicate that AGI could potentially be achieved by collecting more data and training larger models, although most of us are not convinced by such a simple roadmap. So, what is the limit of intelligence we can achieve through massive data and compute? Which problems are fundamentally hard, and which can be solved simply by increasing the amount of compute? This question also suggests that compute should be another important factor of inductive bias to consider when building generalizable agents.

(3) How can we efficiently search for effective diverse policies?

This challenge is particularly severe in multi-agent scenarios (Part III): as the number of agents increases, the problem complexity grows exponentially, and it therefore becomes considerably harder to discover good policies. Notably, because self-play is the most widely used training paradigm for multi-agent games, how strong a learned agent is directly depends on the quality of its training partners. Hence, efficiently and effectively discovering diverse (counter-)strategies during self-play training is a necessity for building agents that can generalize in multi-agent scenarios.

At the moment, there are no definitive answers to these questions, but I remain optimistic and hope that this thesis can provide useful insights to the community and serve as a solid step towards our ultimate goal of AGI.

***

“Our Conquest is the Sea of Stars!”

Bibliography

[Abadi and Andersen, 2016] Abadi, M. and Andersen, D. G. (2016). Learning to protect communications with adversarial neural cryptography. CoRR, abs/1610.06918. [Abadi et al., 2016] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265–283. USENIX Association. [Adel et al., 2016] Adel, H., Roth, B., and Sch¨utze,H. (2016). Comparing convolutional neural networks to traditional models for slot filling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 828–838. [Al-Rfou et al., 2016] Al-Rfou, R., Alain, G., Almahairi, A., Angerm¨uller,C., Bahdanau, D., Ballas, N., Bastien, F., Bayer, J., Belikov, A., Belopolsky, A., Bengio, Y., Bergeron, A., Bergstra, J., Bisson, V., Snyder, J. B., Bouchard, N., Boulanger-Lewandowski, N., Bouthillier, X., de Br´ebisson,A., Breuleux, O., Carrier, P. L., Cho, K., Chorowski, J., Christiano, P. F., Cooijmans, T., Cˆot´e,M., Cˆot´e,M., Courville, A. C., Dauphin, Y. N., Delalleau, O., Demouth, J., Desjardins, G., Dieleman, S., Dinh, L., Ducoffe, M., Dumoulin, V., Kahou, S. E., Erhan, D., Fan, Z., Firat, O., Germain, M., Glorot, X., Goodfellow, I. J., Graham, M., G¨ul¸cehre, C¸ ., Hamel, P., Harlouchet, I., Heng, J., Hidasi, B., Honari, S., Jain, A., Jean, S., Jia, K., Korobov, M., Kulkarni, V., Lamb, A., Lamblin, P., Larsen, E., Laurent, C., Lee, S., Lefran¸cois,S., Lemieux, S., L´eonard, N., Lin, Z., Livezey, J. A., Lorenz, C., Lowin, J., Ma, Q., Manzagol, P., Mastropietro, O., McGibbon, R., Memisevic, R., van Merri¨enboer, B., Michalski, V., Mirza, M., Orlandi, A., Pal, C. J., Pascanu, R., Pezeshki, M., Raffel, C., Renshaw, D., Rocklin, M., Romero, A., Roth, M., Sadowski, P., Salvatier, J., Savard, F., Schl¨uter,J., Schulman, J., Schwartz, G., Serban, I. V., Serdyuk, D., Shabanian, S., Simon, E.,´ Spieckermann, S., Subramanyam, S. R., Sygnowski, J., Tanguay, J., van Tulder, G., Turian, J. P., Urban, S., Vincent, P., Visin, F., de Vries, H., Warde-Farley, D., Webb, D. J., Willson, M., Xu, K., Xue, L., Yao, L., Zhang, S., and Zhang, Y. (2016). Theano: A Python framework for fast computation of mathematical expressions. CoRR, abs/1605.02688. BIBLIOGRAPHY 126

[Al-Shedivat et al., 2018] Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I., and Abbeel, P. (2018). Continuous adaptation via meta-learning in nonstationary and competitive environments. In International Conference on Learning Representations. [Anand et al., 2018] Anand, A., Belilovsky, E., Kastner, K., Larochelle, H., and Courville, A. (2018). Blindfold baselines for embodied QA. CoRR, abs/1811.05013. [Anderson et al., 2018a] Anderson, P., Chang, A. X., Chaplot, D. S., Dosovitskiy, A., Gupta, S., Koltun, V., Kosecka, J., Malik, J., Mottaghi, R., Savva, M., and Zamir, A. R. (2018a). On evaluation of embodied navigation agents. CoRR, abs/1807.06757. [Anderson et al., 2018b] Anderson, P., Wu, Q., Teney, D., Bruce, J., Johnson, M., S¨underhauf, N., Reid, I., Gould, S., and van den Hengel, A. (2018b). Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In Proceedings of the IEEE Conference on Computer Vision and , pages 3674–3683. [Andreas et al., 2017] Andreas, J., Klein, D., and Levine, S. (2017). Modular multitask reinforcement learning with policy sketches. In Proceedings of the 34th International Conference on Machine Learning, pages 166–175. JMLR. org. [Andrieu et al., 2003] Andrieu, C., De Freitas, N., Doucet, A., and Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine learning, 50(1):5–43. [Andrychowicz et al., 2016] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B., and De Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pages 3981–3989. [Armeni et al., 2016] Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis, I., Fischer, M., and Savarese, S. (2016). 3D semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1534–1543. [Beattie et al., 2016] Beattie, C., Leibo, J. Z., Teplyashin, D., Ward, T., Wainwright, M., K¨uttler,H., Lefrancq, A., Green, S., Vald´es,V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., Legg, S., and Petersen, S. (2016). Deepmind lab. CoRR, abs/1612.03801. [Bellemare et al., 2013] Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. (2013). The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279. [Bellman, 1957] Bellman, R. (1957). Dynamic Programming. Princeton University Press. [Berg-Kirkpatrick et al., 2014] Berg-Kirkpatrick, T., Andreas, J., and Klein, D. (2014). Un- supervised transcription of piano music. In Advances in Neural Information Processing Systems, pages 1538–1546. BIBLIOGRAPHY 127

[Bertsekas, 2012] Bertsekas, D. (2012). Dynamic Programming and Optimal Control, Vol II. Athena Scientific, 4th edition.

[Bishop, 1994] Bishop, C. M. (1994). Mixture density networks.

[Bollacker et al., 2008] Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and Taylor, J. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM.

[Bonin-Font et al., 2008] Bonin-Font, F., Ortiz, A., and Oliver, G. (2008). Visual navigation for mobile robots: A survey. Journal of Intelligent and Robotic Systems, 53(3):263–296.

[Boutilier, 1996] Boutilier, C. (1996). Learning conventions in multiagent stochastic domains using likelihood estimates. In Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence, pages 106–114. Morgan Kaufmann Publishers Inc.

[Brock et al., 2019] Brock, A., Donahue, J., and Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations.

[Brockman et al., 2016] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI gym. CoRR, abs/1606.01540.

[Brodeur et al., 2017] Brodeur, S., Perez, E., Anand, A., Golemo, F., Celotti, L., Strub, F., Rouat, J., Larochelle, H., and Courville, A. (2017). HoME: A household multimodal environment. CoRR, abs/1711.11017.

[Brown and Sandholm, 2019] Brown, N. and Sandholm, T. (2019). Superhuman AI for multiplayer poker. Science, 365(6456):885–890.

[Burda et al., 2015] Burda, Y., Grosse, R., and Salakhutdinov, R. (2015). Importance weighted autoencoders. CoRR, abs/1509.00519.

[Busoniu et al., 2008] Busoniu, L., Babuska, R., and De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems Man and Cybernetics Part C Applications and Reviews, 38(2):156.

[Cai et al., 2016] Cai, R., Zhang, X., and Wang, H. (2016). Bidirectional recurrent convolu- tional neural network for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 756–765.

[Caruana, 1997] Caruana, R. (1997). Multitask learning. Machine Learning, 28(1):41–75.

[Chalkiadakis and Boutilier, 2003] Chalkiadakis, G. and Boutilier, C. (2003). Coordination in multiagent reinforcement learning: a Bayesian approach. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, pages 709–716. ACM.

[Chang et al., 2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., and Zhang, Y. (2017). Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV).

[Chaplot et al., 2018] Chaplot, D. S., Sathyendra, K. M., Pasumarthi, R. K., Rajagopal, D., and Salakhutdinov, R. (2018). Gated-attention architectures for task-oriented language grounding. In Thirty-Second AAAI Conference on Artificial Intelligence.

[Chen et al., 2015] Chen, C., Seff, A., Kornhauser, A., and Xiao, J. (2015). DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 2722–2730.

[Chen et al., 2019] Chen, K., de Vicente, J. P., Sepulveda, G., Xia, F., Soto, A., Vazquez, M., and Savarese, S. (2019). A behavioral approach to visual navigation with graph localization networks. CoRR, abs/1903.00445.

[Chen et al., 2018] Chen, X., Li, L.-J., Fei-Fei, L., and Gupta, A. (2018). Iterative visual reasoning beyond . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7239–7248.

[Cho et al., 2014] Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y. (2014). On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111.

[Christiano et al., 2016] Christiano, P., Shah, Z., Mordatch, I., Schneider, J., Blackwell, T., Tobin, J., Abbeel, P., and Zaremba, W. (2016). Transfer from simulation to real world through learning deep inverse dynamics model. CoRR, abs/1610.03518.

[Ciresan et al., 2012] Ciresan, D., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3642–3649.

[Clevert et al., 2015] Clevert, D.-A., Unterthiner, T., and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). CoRR, abs/1511.07289.

[Das et al., 2018a] Das, A., Datta, S., Gkioxari, G., Lee, S., Parikh, D., and Batra, D. (2018a). Embodied question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[Das et al., 2018b] Das, A., Gkioxari, G., Lee, S., Parikh, D., and Batra, D. (2018b). Neural modular control for embodied question answering. In Proceedings of the 2nd Annual Conference on Robot Learning, pages 53–62.

[Das et al., 2017] Das, A., Kottur, S., Moura, J. M., Lee, S., and Batra, D. (2017). Learning cooperative visual dialog agents with deep reinforcement learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2951–2960.

[Dayan and Hinton, 1993a] Dayan, P. and Hinton, G. E. (1993a). Feudal reinforcement learning. In Advances in Neural Information Processing Systems, pages 271–278.

[Dayan and Hinton, 1993b] Dayan, P. and Hinton, G. E. (1993b). Feudal reinforcement learning. In Advances in Neural Information Processing Systems, pages 271–271. Morgan Kaufmann Publishers.

[DeepMind, 2016] DeepMind (2016). DeepMind AI reduces google data centre cooling bill by 40. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/. Accessed: 2018-09-03.

[Deisenroth and Rasmussen, 2011] Deisenroth, M. P. and Rasmussen, C. E. (2011). PILCO: a model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 465–472. Omnipress.

[Deng et al., 2009] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.

[Devin et al., 2018] Devin, C., Abbeel, P., Darrell, T., and Levine, S. (2018). Deep object- centric representations for generalizable robot learning. In IEEE International Conference on Robotics and Automation (ICRA), pages 7111–7118. IEEE.

[Devlin et al., 2019] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186.

[Dhingra et al., 2017] Dhingra, B., Liu, H., Yang, Z., Cohen, W., and Salakhutdinov, R. (2017). Gated-attention readers for text comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1832–1846.

[dos Santos et al., 2015] dos Santos, C., Xiang, B., and Zhou, B. (2015). Classifying relations by ranking with convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 626–634.

[Dosovitskiy et al., 2017] Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017). CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pages 1–16.

[Doya et al., 2002] Doya, K., Samejima, K., Katagiri, K.-i., and Kawato, M. (2002). Multiple model-based reinforcement learning. Neural Computation, 14(6):1347–1369.

[Duan et al., 2016a] Duan, Y., Chen, X., Houthooft, R., Schulman, J., and Abbeel, P. (2016a). Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pages 1329–1338.

[Duan et al., 2016b] Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I., and Abbeel, P. (2016b). RL2: Fast reinforcement learning via slow reinforcement learning. CoRR, abs/1611.02779.

[Duchi et al., 2011] Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159.

[Durrant-Whyte and Bailey, 2006] Durrant-Whyte, H. and Bailey, T. (2006). Simultaneous localization and mapping: part i. IEEE Robotics & Automation Magazine, 13(2):99–110.

[Elfes, 1987] Elfes, A. (1987). Sonar-based real-world mapping and navigation. IEEE Journal on Robotics and Automation, 3(3):249–265.

[Fang et al., 2019] Fang, K., Toshev, A., Fei-Fei, L., and Savarese, S. (2019). Scene memory transformer for embodied agents in long-horizon tasks. IEEE Conference on Computer Vision and Pattern Recognition.

[Farabet et al., 2013] Farabet, C., Couprie, C., Najman, L., and LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929.

[Finn, 2018] Finn, C. (2018). Learning to Learn with Gradients. PhD thesis, UC Berkeley.

[Finn et al., 2017a] Finn, C., Abbeel, P., and Levine, S. (2017a). Model-agnostic meta- learning for fast adaptation of deep networks. International Conference on Machine Learning.

[Finn et al., 2016] Finn, C., Goodfellow, I., and Levine, S. (2016). Unsupervised learning for physical interaction through video prediction. In Advances in Neural Information Processing Systems, pages 64–72.

[Finn and Levine, 2017] Finn, C. and Levine, S. (2017). Deep visual foresight for planning robot motion. In International Conference on Robotics and Automation, pages 2786–2793. IEEE.

[Finn et al., 2017b] Finn, C., Yu, T., Fu, J., Abbeel, P., and Levine, S. (2017b). Generalizing skills with semi-supervised reinforcement learning. International Conference on Learning Representations.

[Foerster et al., 2016] Foerster, J., Assael, I. A., de Freitas, N., and Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, pages 2137–2145.

[Foerster et al., 2018a] Foerster, J., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., and Mordatch, I. (2018a). Learning with opponent-learning awareness. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pages 122–130. International Foundation for Autonomous Agents and Multiagent Systems.

[Foerster et al., 2017] Foerster, J., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H., Kohli, P., and Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 1146–1155. JMLR. org.

[Foerster et al., 2018b] Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. (2018b). Counterfactual multi-agent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence.

[Foo et al., 2005] Foo, P., Warren, W. H., Duchon, A., and Tarr, M. J. (2005). Do hu- mans integrate routes into a cognitive map? map-versus landmark-based navigation of novel shortcuts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2):195.

[Fox et al., 2005] Fox, D., Thrun, S., and Burgard, W. (2005). Probabilistic Robotics. MIT press.

[Frank and Goodman, 2012] Frank, M. C. and Goodman, N. D. (2012). Predicting pragmatic reasoning in language games. Science, 336(6084):998–998.

[Fried et al., 2018] Fried, D., Hu, R., Cirik, V., Rohrbach, A., Andreas, J., Morency, L.-P., Berg-Kirkpatrick, T., Saenko, K., Klein, D., and Darrell, T. (2018). Speaker-follower models for vision-and-language navigation. In Advances in Neural Information Processing Systems, pages 3318–3329.

[Fukushima, 1979] Fukushima, K. (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position- neocognitron. Transactions of the IECE, J62-A(10):658–665.

[Gandhi et al., 2017] Gandhi, D., Pinto, L., and Gupta, A. (2017). Learning to fly by crashing. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3948–3955. IEEE.

[Gao et al., 2018] Gao, C., Mueller, M., and Hayward, R. (2018). Adversarial policy gradient for alternating markov games. International Conference on Learning Representations, Workshop Track.

[Giusti et al., 2016] Giusti, A., Guzzi, J., Cireşan, D. C., He, F., Rodríguez, J. P., Fontana, F., Faessler, M., Forster, C., Schmidhuber, J., Caro, G. D., Scaramuzza, D., and Gambardella, L. M. (2016). A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 1(2):661–667.

[Gleave et al., 2019] Gleave, A., Dennis, M., Kant, N., Wild, C., Levine, S., and Russell, S. (2019). Adversarial policies: Attacking deep reinforcement learning. CoRR, abs/1905.10615.

[Goodfellow et al., 2014a] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde- Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014a). Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

[Goodfellow et al., 2014b] Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014b). Explaining and harnessing adversarial examples. CoRR, abs/1412.6572.

[Grau-Moya et al., 2018] Grau-Moya, J., Leibfried, F., and Bou-Ammar, H. (2018). Balancing two-player stochastic games with soft q-learning. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 268–274. AAAI Press.

[Grosse et al., 2012] Grosse, R., Salakhutdinov, R. R., Freeman, W. T., and Tenenbaum, J. B. (2012). Exploiting compositionality to explore a large space of model structures. In 28th Conference on Uncertainly in Artificial Intelligence, pages 15–17. AUAI Press.

[Gu et al., 2015] Gu, S., Ghahramani, Z., and Turner, R. E. (2015). Neural adaptive se- quential Monte Carlo. In Advances in Neural Information Processing Systems, pages 2629–2637.

[Guo et al., 2014] Guo, X., Singh, S., Lee, H., Lewis, R. L., and Wang, X. (2014). Deep learning for real-time atari game play using offline monte-carlo tree search planning. In Advances in Neural Information Processing Systems, pages 3338–3346.

[Guo et al., 2016] Guo, X., Singh, S., Lewis, R., and Lee, H. (2016). Deep learning for reward design to improve monte carlo tree search in atari games. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 1519–1525. AAAI Press.

[Gupta et al., 2017a] Gupta, J. K., Egorov, M., and Kochenderfer, M. (2017a). Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems, pages 66–83. Springer.

[Gupta et al., 2017b] Gupta, S., Davidson, J., Levine, S., Sukthankar, R., and Malik, J. (2017b). Cognitive mapping and planning for visual navigation. IEEE Conference on Computer Vision and Pattern Recognition.

[Gupta et al., 2017c] Gupta, S., Fouhey, D., Levine, S., and Malik, J. (2017c). Unifying map and landmark based representations for visual navigation. CoRR, abs/1712.08125.

[Haarnoja et al., 2017] Haarnoja, T., Tang, H., Abbeel, P., and Levine, S. (2017). Reinforce- ment learning with deep energy-based policies. In Proceedings of the 34th International Conference on Machine Learning, pages 1352–1361. JMLR. org.

[He et al., 2016a] He, H., Boyd-Graber, J., Kwok, K., and Daumé III, H. (2016a). Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pages 1804–1813.

[He et al., 2017] He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988. IEEE.

[He et al., 2016b] He, K., Zhang, X., Ren, S., and Sun, J. (2016b). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

[Heess et al., 2013] Heess, N., Tarlow, D., and Winn, J. (2013). Learning to pass expectation propagation messages. In Advances in Neural Information Processing Systems, pages 3219–3227.

[Heess et al., 2015] Heess, N., Wayne, G., Silver, D., Lillicrap, T., Erez, T., and Tassa, Y. (2015). Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems, pages 2944–2952.

[Higgins et al., 2017] Higgins, I., Pal, A., Rusu, A., Matthey, L., Burgess, C., Pritzel, A., Botvinick, M., Blundell, C., and Lerchner, A. (2017). Darla: Improving zero-shot transfer in reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 1480–1490.

[Hochreiter and Schmidhuber, 1997] Hochreiter, S. and Schmidhuber, J. (1997). Long short- term memory. Neural Computation, 9(8):1735–1780.

[Hoffmann et al., 2011] Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., and Weld, D. S. (2011). Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 541–550. Association for Computational Linguistics.

[Hu and Wellman, 1998a] Hu, J. and Wellman, M. P. (1998a). Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth Inter- national Conference on Machine Learning, pages 242–250. Morgan Kaufmann Publishers Inc.

[Hu and Wellman, 1998b] Hu, J. and Wellman, M. P. (1998b). Online learning about other agents in a dynamic multiagent system. In Proceedings of the Second International Conference on Autonomous Agents, AGENTS ’98, pages 239–246, New York, NY, USA. ACM.

[Hu et al., 2017] Hu, R., Rohrbach, M., Andreas, J., Darrell, T., and Saenko, K. (2017). Modeling relationships in referential expressions with compositional modular networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1115–1124.

[Ilin et al., 2007] Ilin, R., Kozma, R., and Werbos, P. J. (2007). Efficient learning in cellular simultaneous recurrent neural networks-the case of maze navigation problem. In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages 324–329. IEEE.

[Iyyer et al., 2015] Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daumé III, H. (2015). Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 1, pages 1681–1691.

[Jaderberg et al., 2017] Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K. (2017). Reinforcement learning with unsupervised auxiliary tasks. International Conference on Learning Representations.

[Jain and Neal, 2004] Jain, S. and Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13(1):158–182.

[James et al., 2013] James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction to statistical learning, volume 112. Springer.

[Jang et al., 2017] Jang, E., Gu, S., and Poole, B. (2017). Categorical reparameterization with Gumbel-Softmax. International Conference on Learning Representations.

[Jiang et al., 2016] Jiang, X., Wang, Q., Li, P., and Wang, B. (2016). Relation extraction with multi-instance multi-label convolutional neural networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1471–1480.

[Johnson et al., 2017] Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., and Girshick, R. (2017). CLEVR: A diagnostic dataset for composi- tional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2901–2910.

[Johnson et al., 2015] Johnson, J., Krishna, R., Stark, M., Li, L.-J., Shamma, D., Bernstein, M., and Fei-Fei, L. (2015). Image retrieval using scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3668–3678.

[Johnson et al., 2016] Johnson, M., Hofmann, K., Hutton, T., and Bignell, D. (2016). The Malmo platform for artificial intelligence experimentation. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pages 4246–4247.

[Joseph et al., 2013] Joseph, J., Geramifard, A., Roberts, J. W., How, J. P., and Roy, N. (2013). Reinforcement learning with misspecified model classes. In IEEE International Conference on Robotics and Automation (ICRA), pages 939–946. IEEE.

[Kaelbling et al., 1996] Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Rein- forcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285.

[Kansky et al., 2017] Kansky, K., Silver, T., Mély, D. A., Eldawy, M., Lázaro-Gredilla, M., Lou, X., Dorfman, N., Sidor, S., Phoenix, S., and George, D. (2017). Schema networks: Zero-shot transfer with a generative causal model of intuitive physics. In Proceedings of the 34th International Conference on Machine Learning, pages 1809–1818. JMLR.org.

[Karl Moritz Hermann and PhilBlunsom, 2017] Hermann, K. M., Hill, F., Green, S., Wang, F., Faulkner, R., Soyer, H., Szepesvari, D., Czarnecki, W. M., Jaderberg, M., Teplyashin, D., Wainwright, M., Apps, C., Hassabis, D., and Blunsom, P. (2017). Grounded language learning in a simulated 3D world. CoRR, abs/1706.06551.

[Kemp and Tenenbaum, 2008] Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31):10687–10692.

[Kempka et al., 2016] Kempka, M., Wydmuch, M., Runc, G., Toczek, J., and Jaśkowski, W. (2016). ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In IEEE Conference on Computational Intelligence and Games (CIG), pages 1–8. IEEE.

[Kingma and Ba, 2014] Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. CoRR, abs/1312.6114.

[Koller and Friedman, 2009] Koller, D. and Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT press.

[Kolve et al., 2017] Kolve, E., Mottaghi, R., Gordon, D., Zhu, Y., Gupta, A., and Farhadi, A. (2017). AI2-THOR: An interactive 3D environment for visual AI. CoRR, abs/1712.05474.

[Krizhevsky et al., 2012] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.

[Kulkarni et al., 2015] Kulkarni, T. D., Kohli, P., Tenenbaum, J. B., and Mansinghka, V. (2015). Picture: A probabilistic programming language for scene perception. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4390–4399.

[Kulkarni et al., 2016] Kulkarni, T. D., Narasimhan, K., Saeedi, A., and Tenenbaum, J. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems, pages 3675–3683.

[Kurutach et al., 2018] Kurutach, T., Tamar, A., Yang, G., Russell, S. J., and Abbeel, P. (2018). Learning plannable representations with Causal InfoGAN. In Advances in Neural Information Processing Systems, pages 8747–8758.

[Lake et al., 2015] Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338.

[Lanctot et al., 2017] Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., Perolat, J., Silver, D., and Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems, pages 4190–4203.

[Larochelle et al., 2008] Larochelle, H., Erhan, D., and Bengio, Y. (2008). Zero-data learning of new tasks. In Proceedings of the 23rd National Conference on Artificial Intelligence, volume 2, pages 646–651. AAAI Press.

[Lauer and Riedmiller, 2000] Lauer, M. and Riedmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 535–542. Morgan Kaufmann.

[Lazaridou et al., 2017] Lazaridou, A., Peysakhovich, A., and Baroni, M. (2017). Multi-agent cooperation and the emergence of (natural) language. International Conference on Learning Representations.

[Le et al., 2017] Le, T. A., Baydin, A. G., and Wood, F. (2017). Inference compilation and universal probabilistic programming. In Artificial Intelligence and Statistics (AISTATS), pages 1338–1348.

[LeCun et al., 2015] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.

[LeCun et al., 1998] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

[Leibo et al., 2017] Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., and Graepel, T. (2017). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems, pages 464–473. International Foundation for Autonomous Agents and Multiagent Systems.

[Leonard and Durrant-Whyte, 1992] Leonard, J. J. and Durrant-Whyte, H. F. (1992). Directed Sonar Sensing for Mobile Robot Navigation. Kluwer Academic Publishers, Norwell, MA, USA.

[Levine et al., 2016] Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373.

[Li and Malik, 2017] Li, K. and Malik, J. (2017). Learning to optimize neural nets. CoRR, abs/1703.00441.

[Li et al., 2017a] Li, R., Tapaswi, M., Liao, R., Jia, J., Urtasun, R., and Fidler, S. (2017a). Situation recognition with graph neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 4173–4182.

[Li et al., 2019] Li, S., Wu, Y., Cui, X., Dong, H., Fang, F., and Russell, S. (2019). Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In AAAI Conference on Artificial Intelligence (AAAI).

[Li et al., 2017b] Li, Y., Cohn, T., and Baldwin, T. (2017b). Robust training under linguistic adversity. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics.

[Liang et al., 2008] Liang, P., Daumé III, H., and Klein, D. (2008). Structure compilation: trading structure for features. In Proceedings of the 25th International Conference on Machine Learning, pages 592–599. ACM.

[Lillicrap et al., 2015] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. CoRR, abs/1509.02971.

[Lin et al., 2016] Lin, Y., Shen, S., Liu, Z., Luan, H., and Sun, M. (2016). Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 2124–2133.

[Littman, 1994] Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, volume 157, pages 157–163.

[Liu et al., 2016] Liu, A., Soderland, S., Bragg, J., Lin, C. H., Ling, X., and Weld, D. S. (2016). Effective crowd annotation for relation extraction. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 897–906.

[Long et al., 2015] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.

[Lowe et al., 2017] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems.

[Mahajan et al., 2018] Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and van der Maaten, L. (2018). Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision, pages 181–196.

[Mandlekar et al., 2017] Mandlekar, A., Zhu, Y., Garg, A., Fei-Fei, L., and Savarese, S. (2017). Adversarially robust policy learning: Active construction of physically-plausible perturbations. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3932–3939. IEEE.

[Marino et al., 2017] Marino, K., Salakhutdinov, R., and Gupta, A. (2017). The more you know: Using knowledge graphs for image classification. IEEE Conference on Computer Vision and Pattern Recognition.

[Mateescu et al., 2010] Mateescu, R., Kask, K., Gogate, V., and Dechter, R. (2010). Join-graph propagation algorithms. Journal of Artificial Intelligence Research, 37(1):279–328.

[Matignon et al., 2012a] Matignon, L., Jeanpierre, L., and Mouaddib, A.-I. (2012a). Coordinated multi-robot exploration under communication constraints using decentralized Markov decision processes. In Twenty-Sixth AAAI Conference on Artificial Intelligence.

[Matignon et al., 2007] Matignon, L., Laurent, G. J., and Le Fort-Piat, N. (2007). Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 64–69. IEEE.

[Matignon et al., 2012b] Matignon, L., Laurent, G. J., and Le Fort-Piat, N. (2012b). Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 27(01):1–31.

[McCarthy et al., 2006] McCarthy, J., Minsky, M. L., Rochester, N., and Shannon, C. E. (2006). A proposal for the Dartmouth summer research project on Artificial Intelligence, August 31, 1955. AI Magazine, 27(4):12–12.

[McCormac et al., 2017] McCormac, J., Handa, A., Leutenegger, S., and Davison, A. J. (2017). SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2678–2687.

[Miller et al., 2017] Miller, A., Feng, W., Batra, D., Bordes, A., Fisch, A., Lu, J., Parikh, D., and Weston, J. (2017). ParlAI: A dialog research software platform. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 79–84.

[Mintz et al., 2009] Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011. Association for Computational Linguistics.

[Mirowski et al., 2018] Mirowski, P., Grimes, M., Malinowski, M., Hermann, K. M., Anderson, K., Teplyashin, D., Simonyan, K., Kavukcuoglu, K., Zisserman, A., and Hadsell, R. (2018). Learning to navigate in cities without a map. In Advances in Neural Information Processing Systems, pages 2419–2430.

[Mirowski et al., 2017] Mirowski, P., Pascanu, R., Viola, F., Soyer, H., Ballard, A., Banino, A., Denil, M., Goroshin, R., Sifre, L., Kavukcuoglu, K., Kumaran, D., and Hadsell, R. (2017). Learning to navigate in complex environments. International Conference on Learning Representations.

[Mishkin et al., 2019] Mishkin, D., Dosovitskiy, A., and Koltun, V. (2019). Benchmarking classic and learned navigation in complex 3D environments. CoRR, abs/1901.10915.

[Mishra et al., 2018] Mishra, N., Rohaninejad, M., Chen, X., and Abbeel, P. (2018). A simple neural attentive meta-learner. International Conference on Learning Representations.

[Misra et al., 2017] Misra, D., Langford, J., and Artzi, Y. (2017). Mapping instructions and visual observations to actions with reinforcement learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1004–1015.

[Miyato et al., 2017] Miyato, T., Dai, A. M., and Goodfellow, I. (2017). Adversarial training methods for semi-supervised text classification. International Conference on Learning Representations.

[Miyato et al., 2015] Miyato, T., Maeda, S.-i., Koyama, M., Nakae, K., and Ishii, S. (2015). Distributional smoothing with virtual adversarial training. CoRR, abs/1507.00677.

[Mnih et al., 2016] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.

[Mnih et al., 2015] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529– 533.

[Moore and Russell, 2017] Moore, D. A. and Russell, S. J. (2017). Signal-based Bayesian seismic monitoring. Artificial Intelligence and Statistics (AISTATS).

[Mordatch and Abbeel, 2018] Mordatch, I. and Abbeel, P. (2018). Emergence of grounded compositional language in multi-agent populations. In Thirty-Second AAAI Conference on Artificial Intelligence.

[Morimoto and Doya, 2005] Morimoto, J. and Doya, K. (2005). Robust reinforcement learning. Neural Computation, 17(2):335–359.

[Nair and Hinton, 2010] Nair, V. and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, pages 807–814.

[Narasimhan et al., 2018] Narasimhan, K., Barzilay, R., and Jaakkola, T. (2018). Grounding language for transfer in deep reinforcement learning. Journal of Artificial Intelligence Research, 63:849–874.

[Neal, 2011] Neal, R. M. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11).

[Neu and Szepesvári, 2007] Neu, G. and Szepesvári, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pages 295–302. AUAI Press.

[Nguyen and Grishman, 2015] Nguyen, T. H. and Grishman, R. (2015). Relation extraction: Perspective from convolutional neural networks. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pages 39–48.

[Nisan et al., 2007] Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V. V. (2007). Algorithmic game theory. Cambridge University Press.

[Nishihara et al., 2013] Nishihara, R., Minka, T., and Tarlow, D. (2013). Detecting parameter symmetries in probabilistic models. CoRR, abs/1312.5386.

[Nogueira and Cho, 2016] Nogueira, R. and Cho, K. (2016). End-to-end goal-driven web navigation. In Advances in Neural Information Processing Systems, pages 1903–1911.

[Oh et al., 2015] Oh, J., Guo, X., Lee, H., Lewis, R. L., and Singh, S. (2015). Action-conditional video prediction using deep networks in Atari games. In Advances in Neural Information Processing Systems, pages 2863–2871.

[Oh et al., 2017] Oh, J., Singh, S., Lee, H., and Kohli, P. (2017). Zero-shot task generalization with multi-task deep reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 2661–2670.

[Omidshafiei et al., 2017] Omidshafiei, S., Pazis, J., Amato, C., How, J. P., and Vian, J. (2017). Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th International Conference on Machine Learning, pages 2681–2690. JMLR.org.

[OpenAI et al., 2019a] OpenAI, Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., Paino, A., Plappert, M., Powell, G., Ribas, R., Schneider, J., Tezak, N., Tworek, J., Welinder, P., Weng, L., Yuan, Q., Zaremba, W., and Zhang, L. (2019a). Solving Rubik’s cube with a robot hand. CoRR, abs/1910.07113.

[OpenAI et al., 2019b] OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., de Oliveira Pinto, H. P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., and Zhang, S. (2019b). Dota 2 with large scale deep reinforcement learning. CoRR, abs/1912.06680.

[Osborne, 2004] Osborne, M. J. (2004). An introduction to game theory. Oxford University Press, New York.

[Paige and Wood, 2016] Paige, B. and Wood, F. (2016). Inference networks for sequential Monte Carlo in graphical models. In Proceedings of the 33rd International Conference on Machine Learning, volume 48 of JMLR.

[Palatucci et al., 2009] Palatucci, M., Pomerleau, D., Hinton, G. E., and Mitchell, T. M. (2009). Zero-shot learning with semantic output codes. In Advances in Neural Information Processing Systems, pages 1410–1418.

[Pan and Yang, 2009] Pan, S. J. and Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359.

[Panait and Luke, 2005] Panait, L. and Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 11(3):387–434.

[Parisotto and Salakhutdinov, 2018] Parisotto, E. and Salakhutdinov, R. (2018). Neural map: Structured memory for deep reinforcement learning. International Conference on Learning Representations.

[Paszke et al., 2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.

[Patel et al., 2015] Patel, V. M., Gopalan, R., Li, R., and Chellappa, R. (2015). Visual domain adaptation: A survey of recent advances. IEEE Signal Processing Magazine, 32(3):53–69.

[Pathak et al., 2017] Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning, pages 2778–2787.

[Pathak et al., 2018] Pathak, D., Mahmoudieh, P., Luo, G., Agrawal, P., Chen, D., Shentu, Y., Shelhamer, E., Malik, J., Efros, A. A., and Darrell, T. (2018). Zero-shot visual imitation. In International Conference on Learning Representations.

[Peng et al., 2017] Peng, P., Yuan, Q., Wen, Y., Yang, Y., Tang, Z., Long, H., and Wang, J. (2017). Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. CoRR, abs/1703.10069.

[Peng et al., 2018a] Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018a). Sim-to-real transfer of robotic control with dynamics randomization. In IEEE International Conference on Robotics and Automation (ICRA), pages 1–8. IEEE.

[Peng et al., 2018b] Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018b). Sim-to-real transfer of robotic control with dynamics randomization. In IEEE International Conference on Robotics and Automation (ICRA), pages 1–8. IEEE.

[Pennington et al., 2014] Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543.

[Perolat et al., 2017] Perolat, J., Strub, F., Piot, B., and Pietquin, O. (2017). Learning Nash equilibrium for general-sum Markov games from batch data. In Artificial Intelligence and Statistics (AISTATS), pages 232–241.

[Pinto et al., 2017] Pinto, L., Davidson, J., Sukthankar, R., and Gupta, A. (2017). Robust adversarial reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 2817–2826. JMLR.org.

[Pratt, 1993] Pratt, L. Y. (1993). Discriminability-based transfer between neural networks. In Advances in Neural Information Processing Systems, pages 204–211.

[Radford et al., 2019] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).

[Riedel et al., 2010] Riedel, S., Yao, L., and McCallum, A. (2010). Modeling relations and their mentions without labeled text. Machine Learning and Knowledge Discovery in Databases, pages 148–163.

[Riedmiller et al., 2018] Riedmiller, M., Hafner, R., Lampe, T., Neunert, M., Degrave, J., Van de Wiele, T., Mnih, V., Heess, N., and Springenberg, J. T. (2018). Learning by playing-solving sparse reward tasks from scratch. International Conference on Machine Learning.

[Ritchie et al., 2015] Ritchie, D., Lin, S., Goodman, N. D., and Hanrahan, P. (2015). Generating design suggestions under tight constraints with gradient-based probabilistic programming. In Computer Graphics Forum, volume 34. Wiley Online Library.

[Ritchie et al., 2016] Ritchie, D., Thomas, A., Hanrahan, P., and Goodman, N. (2016). Neurally-guided procedural models: Amortized inference for procedural graphics programs using neural networks. In Advances in Neural Information Processing Systems, pages 622–630.

[Ross et al., 2011a] Ross, S., Gordon, G., and Bagnell, A. (2011a). A reduction of imitation learning and structured prediction to no-regret online learning. In Artificial Intelligence and Statistics (AISTATS).

[Ross et al., 2011b] Ross, S., Munoz, D., Hebert, M., and Bagnell, J. A. (2011b). Learning message-passing inference machines for structured prediction. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2737–2744. IEEE.

[Ross and Pineau, 2008] Ross, S. and Pineau, J. (2008). Model-based Bayesian reinforcement learning in large structured domains. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 476–483. AUAI Press.

[Russell and Norvig, 2016] Russell, S. J. and Norvig, P. (2016). Artificial Intelligence: a modern approach. Pearson Education Limited.

[Rusu et al., 2017] Rusu, A. A., Večerík, M., Rothörl, T., Heess, N., Pascanu, R., and Hadsell, R. (2017). Sim-to-real robot learning from pixels with progressive nets. In Proceedings of the 1st Annual Conference on Robot Learning, pages 262–270.

[Sadeghi and Levine, 2017] Sadeghi, F. and Levine, S. (2017). CAD2RL: Real single-image flight without a single real image. In Proceedings of Robotics: Science and Systems (RSS).

[Savinov et al., 2018] Savinov, N., Dosovitskiy, A., and Koltun, V. (2018). Semi-parametric topological memory for navigation. International Conference on Learning Representations.

[Savinov et al., 2019] Savinov, N., Raichuk, A., Vincent, D., Marinier, R., Pollefeys, M., Lillicrap, T., and Gelly, S. (2019). Episodic curiosity through reachability. In International Conference on Learning Representations.

[Savva et al., 2017] Savva, M., Chang, A. X., Dosovitskiy, A., Funkhouser, T., and Koltun, V. (2017). MINOS: Multimodal indoor simulator for navigation in complex environments. CoRR, abs/1712.03931.

[Schaul and Schmidhuber, 2010] Schaul, T. and Schmidhuber, J. (2010). Metalearning. Scholarpedia, 5(6):4650.

[Schmidhuber, 1990] Schmidhuber, J. (1990). An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In International Joint Conference on Neural Networks. IEEE.

[Schmidhuber, 1995] Schmidhuber, J. (1995). On learning how to learn learning strategies.

[Schrittwieser et al., 2019] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T., and Silver, D. (2019). Mastering Atari, Go, Chess and Shogi by planning with a learned model. CoRR, abs/1911.08265.

[Schulman et al., 2015] Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015). Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897.

[Shi et al., 2017] Shi, T., Karpathy, A., Fan, L., Hernandez, J., and Liang, P. (2017). World of Bits: An open-domain platform for web-based agents. In International Conference on Machine Learning, pages 3135–3144.

[Silver et al., 2016] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.

[Silver et al., 2018] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144.

[Silver et al., 2014] Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning, pages 387–395.

[Søgaard, 2013] Søgaard, A. (2013). Part-of-speech tagging with antagonistic adversaries. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 640–644.

[Song et al., 2017a] Song, J., Zhao, S., and Ermon, S. (2017a). A-NICE-MC: Adversarial training for MCMC. In Advances in Neural Information Processing Systems, pages 5140– 5150.

[Song et al., 2017b] Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., and Funkhouser, T. (2017b). Semantic scene completion from a single depth image. IEEE Conference on Computer Vision and Pattern Recognition.

[Spiegelhalter et al., 1996] Spiegelhalter, D. J., Thomas, A., Best, N. G., Gilks, W., and Lunn, D. (1996). BUGS: Bayesian inference using Gibbs sampling. Version 0.5 (version ii), http://www.mrc-bsu.cam.ac.uk/bugs, 19.

[Srivastava et al., 2014] Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.

[Stern et al., 2009] Stern, D. H., Herbrich, R., and Graepel, T. (2009). Matchbox: large scale online Bayesian recommendations. In Proceedings of the 18th International Conference on World Wide Web, pages 111–120. ACM.

[Stuhlmüller et al., 2013] Stuhlmüller, A., Taylor, J., and Goodman, N. (2013). Learning stochastic inverses. In Neural Information Processing Systems.

[Sukhbaatar et al., 2018] Sukhbaatar, S., Kostrikov, I., Szlam, A., and Fergus, R. (2018). Intrinsic motivation and automatic curricula via asymmetric self-play. International Conference on Learning Representations.

[Sukhbaatar et al., 2016] Sukhbaatar, S., Szlam, A., and Fergus, R. (2016). Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pages 2244–2252.

[Surdeanu et al., 2012] Surdeanu, M., Tibshirani, J., Nallapati, R., and Manning, C. D. (2012). Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 455–465. Association for Computational Linguistics.

[Sutton and Barto, 1998] Sutton, R. S. and Barto, A. G. (1998). Reinforcement learning: An introduction. MIT press.

[Sutton et al., 2000] Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pages 1057–1063.

[Sutton et al., 1999] Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211.

[Synnaeve et al., 2016] Synnaeve, G., Nardelli, N., Auvolat, A., Chintala, S., Lacroix, T., Lin, Z., Richoux, F., and Usunier, N. (2016). TorchCraft: a library for machine learning research on real-time strategy games. CoRR, abs/1611.00625.

[Szegedy et al., 2013] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Good- fellow, I., and Fergus, R. (2013). Intriguing properties of neural networks. CoRR, abs/1312.6199.

[Tai and Liu, 2016] Tai, L. and Liu, M. (2016). Towards cognitive exploration through deep reinforcement learning for mobile robots. CoRR, abs/1610.01733.

[Tamar et al., 2016] Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. (2016). Value iteration networks. In Advances in Neural Information Processing Systems.

[Tampuu et al., 2017] Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., and Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4):e0172395.

[Tan, 1993] Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337.

[Taylor and Stone, 2009] Taylor, M. E. and Stone, P. (2009). Transfer learning for reinforce- ment learning domains: A survey. Journal of Machine Learning Research, 10(Jul):1633– 1685.

[Tesauro, 2004] Tesauro, G. (2004). Extending Q-learning to general adaptive multi-agent systems. In Advances in Neural Information Processing Systems, pages 871–878.

[Thomas and Barto, 2011] Thomas, P. S. and Barto, A. G. (2011). Conjugate Markov decision processes. In Proceedings of the 28th International Conference on Machine Learning, pages 137–144.

[Thrun et al., 2005] Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic robotics. MIT press.

[Thrun and Pratt, 2012] Thrun, S. and Pratt, L. (2012). Learning to learn. Springer Science & Business Media.

[Tian et al., 2017] Tian, Y., Gong, Q., Shang, W., Wu, Y., and Zitnick, C. L. (2017). ELF: An extensive, lightweight and flexible research platform for real-time strategy games. In Advances in Neural Information Processing Systems, pages 2659–2669.

[Tieleman and Hinton, 2012] Tieleman, T. and Hinton, G. (2012). Lecture 6.5. COURSERA: Neural Networks for Machine Learning.

[Tobin et al., 2017] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30. IEEE.

[Torralba et al., 2003] Torralba, A., Murphy, K. P., Freeman, W. T., and Rubin, M. A. (2003). Context-based vision system for place and object recognition. In Proceedings of the IEEE International Conference on Computer Vision, page 273. IEEE Computer Society.

[Turek et al., 2016] Turek, D., de Valpine, P., Paciorek, C. J., and Anderson-Bergman, C. (2016). Automated parameter blocking for efficient Markov chain Monte Carlo sampling. Bayesian Analysis.

[Uther and Veloso, 1997] Uther, W. T. and Veloso, M. M. (1997). Generalizing adversarial reinforcement learning. In Proceedings of the AAAI Fall Symposium on Model Directed Autonomous Systems, page 206. Citeseer.

[Vedantam et al., 2019] Vedantam, R., Desai, K., Lee, S., Rohrbach, M., Batra, D., and Parikh, D. (2019). Probabilistic neural-symbolic models for interpretable visual question answering. In International Conference on Machine Learning.

[Venugopal and Gogate, 2013] Venugopal, D. and Gogate, V. (2013). Dynamic blocking and collapsing for Gibbs sampling. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 664–673. AUAI Press.

[Vilalta and Drissi, 2002] Vilalta, R. and Drissi, Y. (2002). A perspective view and survey of meta-learning. Artificial Intelligence Review, 18(2):77–95.

[Vinyals et al., 2019] Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C., and Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354.

[Vinyals et al., 2017] Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., Makhzani, A., Küttler, H., Agapiou, J., Schrittwieser, J., Quan, J., Gaffney, S., Petersen, S., Simonyan, K., Schaul, T., van Hasselt, H., Silver, D., Lillicrap, T. P., Calderone, K., Keet, P., Brunasso, A., Lawrence, D., Ekermo, A., Repp, J., and Tsing, R. (2017). StarCraft II: A new challenge for reinforcement learning. CoRR, abs/1708.04782.

[Wang et al., 2016] Wang, L., Cao, Z., De Melo, G., and Liu, Z. (2016). Relation classification via multi-level attention CNNs. In Proceedings of the 54th annual meeting of the Association for Computational Linguistics, pages 1298–1307.

[Wang and Spelke, 2002] Wang, R. F. and Spelke, E. S. (2002). Human spatial representation: Insights from animals. Trends in Cognitive Sciences, 6(9):376–382.

[Wang et al., 2018a] Wang, T., Wu, Y., Moore, D., and Russell, S. J. (2018a). Meta-learning MCMC proposals. In Advances in Neural Information Processing Systems.

[Wang and Russell, 2015] Wang, W. and Russell, S. (2015). A smart-dumb/dumb-smart algorithm for efficient split-merge MCMC. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pages 902–911. AUAI Press.

[Wang et al., 2018b] Wang, X., Xiong, W., Wang, H., and Yang Wang, W. (2018b). Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation. In Proceedings of the European Conference on Computer Vision, pages 37–53.

[Wang et al., 2018c] Wang, X., Ye, Y., and Gupta, A. (2018c). Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6857–6866.

[Watter et al., 2015] Watter, M., Springenberg, J., Boedecker, J., and Riedmiller, M. (2015). Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems, pages 2746–2754.

[Wei et al., 2018] Wei, E., Wicke, D., Freelan, D., and Luke, S. (2018). Multiagent soft Q-learning. In 2018 AAAI Spring Symposium Series.

[Williams, 1992] Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256.

[Wu et al., 2017a] Wu, C., Kreidieh, A., Parvate, K., Vinitsky, E., and Bayen, A. M. (2017a). Flow: Architecture and benchmarking for reinforcement learning in traffic control. CoRR, abs/1710.05465.

[Wu et al., 2016a] Wu, Q., Wang, P., Shen, C., Dick, A., and van den Hengel, A. (2016a). Ask me anything: Free-form visual question answering based on knowledge from external sources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4622–4630.

[Wu et al., 2017b] Wu, Y., Bamman, D., and Russell, S. (2017b). Adversarial training for relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.

[Wu et al., 2016b] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. (2016b). Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.

[Wu et al., 2018] Wu, Y., Wu, Y., Gkioxari, G., and Tian, Y. (2018). Building generalizable agents with a realistic and rich 3D environment. International Conference on Learning Representations, Workshop Track.

[Wu et al., 2019] Wu, Y., Wu, Y., Tamar, A., Russell, S., Gkioxari, G., and Tian, Y. (2019). Bayesian relational memory for semantic visual navigation. In Proceedings of the IEEE International Conference on Computer Vision.

[Xie et al., 2017] Xie, Z., Wang, S. I., Li, J., Lévy, D., Nie, A., Jurafsky, D., and Ng, A. Y. (2017). Data noising as smoothing in neural network language models. International Conference on Learning Representations.

[Xu et al., 2017] Xu, H., Gao, Y., Yu, F., and Darrell, T. (2017). End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2174–2182.

[Xu et al., 2015] Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057.

[Yang et al., 2019] Yang, W., Wang, X., Farhadi, A., Gupta, A., and Mottaghi, R. (2019). Visual semantic navigation using scene priors. In International Conference on Learning Representations.

[Yang et al., 2018] Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., and Wang, J. (2018). Mean field multi-agent reinforcement learning. In International Conference on Machine Learning, pages 5567–5576.

[Zambaldi et al., 2019] Zambaldi, V., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D., Lillicrap, T., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M., Vinyals, O., and Battaglia, P. (2019). Deep reinforcement learning with relational inductive biases. In International Conference on Learning Representations.

[Zeng et al., 2015] Zeng, D., Liu, K., Chen, Y., and Zhao, J. (2015). Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1753–1762.

[Zeng et al., 2014] Zeng, D., Liu, K., Lai, S., Zhou, G., and Zhao, J. (2014). Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2335–2344.

[Zhang et al., 2018] Zhang, A., Sukhbaatar, S., Lerer, A., Szlam, A., and Fergus, R. (2018). Composable planning with attributes. In International Conference on Machine Learning, pages 5837–5846.

[Zhang et al., 2017] Zhang, H., Kyaw, Z., Chang, S.-F., and Chua, T.-S. (2017). Visual translation embedding network for visual relation detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5532–5540.

[Zhang et al., 2019] Zhang, H., Zhou, H., Miao, N., and Li, L. (2019). Generating fluent adversarial examples for natural languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5564–5569.

[Zhou et al., 2016] Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H., and Xu, B. (2016). Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, page 207.

[Zhu et al., 2017a] Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017a). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232.

[Zhu et al., 2017b] Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., and Farhadi, A. (2017b). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA), pages 3357–3364. IEEE.

[Zhu et al., 2015] Zhu, Y., Urtasun, R., Salakhutdinov, R., and Fidler, S. (2015). segDeepM: Exploiting segmentation and context in deep neural networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4703–4711.