Responsible AI

Total Page:16

File Type:pdf, Size:1020Kb

Responsible AI ETHICAL PURPOSE SOCIETAL BENEFIT ACCOUNTABILITY TRANSPARENCY EXPLAINABILITY FAIRNESS NON-DISCRIMINATION SAFETY RELIABILITY OPEN DATA FAIR COMPETITION PRIVACY INTELLECTUAL PROPERTY FIRST EDITION FIRST EDITION Responsible AI ETHICAL PURPOSE SOCIETAL BENEFIT ACCOUNTABILIT Y TRANSPARENCY EXPLAINABILIT Y FAIRNESS NON-DISCRIMINATION SAFET Y RELIABILIT Y OPEN DATA FAIFIRSTR COMPE EDITIONTITION PRIVACY INTELLECTUAL PROPERT Y FIRST EDITION Charles Morgan, Editor McLean, Virginia, USA This book does not provide legal advice. It is provided for informational purposes only. In the context of this book, significant efforts have been made to provide a range of views and opinions regarding the various topics discussed herein. The views and opinions in this book do not necessarily reflect the views and opinions of the individual authors. Moreover, each of the contributors to this book has participated in its drafting on a personal basis. Accordingly the views expressed in this book do not reflect the views of any of the law firms or other entities with which they may be affiliated. Firm names and logos, while used with permission, do not necessarily imply endorsement of any of the specific views and opinions set out herein. The authors have worked diligently to ensure that all information in this book is accurate as of the time of publication. The publisher will gladly receive information that will help, in subsequent editions, to rectify any inadvertent errors or omissions. International Technology Law Association 7918 Jones Branch Drive, Suite 300 McLean, Virginia 22102, United States Phone: (+1) 703-506-2895 Fax: (+1) 703-506-3266 Email: [email protected] itechlaw.org Cover and chapter title page designs by Stan Knight, MCI USA Text design by Troy Scott Parker, Cimarron Design This book is available at www.itechlaw.org. Copyright © 2019 by International Technology Law Association. All rights reserved. ISBN-13: 978-1-7339931-0-4 Permission to reproduce or transmit in any form or by any means, electronic or mechanical, including photocopying and recording, or by an information storage and retrieval system any portion of this work must be obtained in writing from the director of book publishing at the address or fax number above. Printed in the United States of America. Printing, last digit: 10 9 8 7 6 5 4 3 2 1 For Valérie, Chloé, Anaïs and Magalie Contents Acknowledgements 7 Foreword 9 Introduction 15 1 Ethical Purpose and Societal Benefit 31 2 Accountability 71 3 Transparency and Explainability 103 4 Fairness and Non-Discrimination 133 5 Safety and Reliability 159 6 Open Data and Fair Competition 195 7 Privacy 227 8 AI and Intellectual Property 257 Responsible AI Policy Framework 289 ITechLaw 5 Acknowledgements The publication of this Global Policy Framework for Responsible AI would not have been possible without the tireless efforts of a great number of participants. We would like to thank the following people who gen- erously contributed to the publication of this book: two of our core project team members, who chose to remain anonymous; Emily Reineke, CAE, Executive Director of ITechLaw and her team; Troy Scott Parker of Cimarron Design; Gabriela Kennedy of Mayer Brown; George Takach, Kristian Brabander, Gong Ming Zheng and Linda Modica of McCarthy Tétrault (as well as the contributing members of McCarthy’s docu- ment services group); Lindsay Morgan; and Dan Hunter, Swinburne University. ITechLaw 7 Foreword The inspiration for this book came to me a year ago, almost to the day. ITechLaw members were gathering in Seattle for ITechLaw’s annual World Technology Law Conference, held May 16-18, 2018. I was preparing for the association’s board and executive committee meetings held at the offices of a leading local law firm whose offices had majestic views overlooking the Puget Sound. Quite naturally, the agenda for the board and executive committee meetings included time devoted to thinking about the strategic objectives and priorities for the association. On the bedside table of my hotel room in Seattle were two publications that had captivated my attention. The first was a whitepaper entitled“The Future Computed: Artificial Intelligence and Its Role in Society” that had been published by Microsoft a couple of months earlier and that included a Foreword written by Brad Smith (President and Chief Legal Officer, Microsoft) and Harry Shum (Executive Vice President, Artificial Intelligence and Research, Microsoft). That whitepaper made a compelling argument: in the same way that the field of privacy law had grown dramatically over the previous 20 years, it was very likely that a new field of “AI law” would develop over the next 20 years. Moreover, just as many of the core principles that form the basis of privacy and data protection legislation around the world today can be traced back to the “Fair Information Practices” that were developed in the early 1970’s, the authors of Future Computed argued that the time had come to develop a set of core “responsible AI” principles that could provide the source of inspiration for policymakers considering the future regulation of artificial intel- ligence. In this context, Future Computed set out some preliminary thoughts regarding principles, policies and laws for the responsible use of AI. The second publication on my bedside table was a book written in 2016 by the New York Times colum- nist Thomas Friedman called Thank You for Being Late: An Optimist’s Guide to Thriving in the Age of Accelerations. In that book, Friedman argues that three disruptive forces—technological change, glo- balisation and climate change—are rapidly accelerating all at once and that they are transforming five fundamental spheres of our lives: the workplace, politics, geopolitics, ethics and community. His book offered stunning examples of the phenomenal rate of change in these areas, their impacts on society, as well as some thoughts on how best to embrace and shape these forces of change to positive effect. Although I’d seemingly placed a “tab” on every second page, there was one image that I found most arresting: ITechLaw 9 RESPONSIBLE AI: A GLOBAL POLICY FRAMEWORK We are here Human adaptability Rate of change Technology Time According to Friedman, the simple graph had been drawn for him on a notepad by Eric “Astro” Teller, the CEO of Google’s X research and development lab. As recounted in Thank You for Being Late, Teller explained that a thousand years ago, the curve representing scientific and technological progress rose so gradually that it could take about one hundred years for the world to look and feel dramatically differently. By contrast, “now, in 2016, that time window—having continued to shrink as each technology stood on the shoulders of past technologies—has become so short that it’s in the order of five to seven years from the time that something is introduced to being ubiquitous and the world being uncomfortably changed.” The fundamental point that Teller was eliciting was that, although humans are remarkably adaptable, we have now reached a point that the rate of technologi- cal change has progressed beyond human’s ability to keep up. As Friedman writes, citing Teller: “if the technology platform for society can now turn over in five to seven years, but it takes ten to fifteen years to adapt to it […] we will all feel out of control, because we can’t adapt to the world as fast as it is changing….” Teller concluded: “Without clear knowledge of the future potential or future unintended negative consequences of new technologies, it is nearly impossible to draft regulations that will pro- mote important advances—while still protecting ourselves from every bad side effect.” This latter thought resonated for me in particular because a third set of ideas were swirling in my head as I prepared for the ITechLaw board meeting: the ongoing revelations about Cambridge Analytica and its potential impact on the U.S. elections. As a technology lawyer who had explored privacy law issues in depth for the past twenty years, I thought I had a clear and nuanced understanding of the many ways in which personal information was collected and used in modern society. But, like many people, the Cambridge Analytica story sent me reeling. It impressed upon me just how unaware many of us are about how technology is being used and just how important the consequences can be to some of our most fundamental values and institutions. So … with these three ideas milling about, I attended the ITechLaw board meeting. And, when it came time to propose projects and priorities for the coming year, I suggested—with a good deal of trepidation, quite frankly—that we should consider preparing, as a group, a whitepaper to estab- lish a legal/policy framework for ethical AI. My trepidation stemmed from the fact that I was already deeply invested in the idea and I feared that others would not be. My fear turned out to be entirely unfounded. Within 15 minutes of making the initial suggestion at the Seattle ITechLaw board meet- ing, about fifteen members had joined the team. 10 ITechLaw FOREWORD Working with an initial core project team over the summer, we started to identify doctrinal sources and case studies and to sketch out the key principles that we would develop as part of our framework. In these initial efforts, I was greatly assisted by some remarkable students and young associates at my law firm who would ultimately join the project team and play a significant role in research and drafting. In the early stages of the project, our focus was on building out the team, ensuring we had appropriate geographic, gender, ethnic and industry diversity, and collecting key source mate- rials that would help us buttress our policy framework. Moreover, we wanted to expand the team beyond ourselves, to include members with industry experience and serious academic credentials.
Recommended publications
  • Divulgação Bibliográfica
    Divulgação bibliográfica Julho/Agosto 2019 Biblioteca da Faculdade de Direito da Universidade de Coimbra Sumário BASES DE DADOS NA FDUC ........................................................................................ 4 E-BOOKS .................................................................................................................. 6 MONOGRAFIAS ........................................................................................................ 52 Ciências Jurídico-Empresariais................................................................................................................. 53 Ciências Jurídico-Civilísticas ..................................................................................................................... 70 Ciências Jurídico-Criminais ...................................................................................................................... 79 Ciências Jurídico-Económicas .................................................................................................................. 82 Ciências Jurídico-Filosóficas ..................................................................................................................... 83 Ciências Jurídico-Históricas ..................................................................................................................... 88 Ciências Jurídico-Políticas ........................................................................................................................ 94 Vária ......................................................................................................................................................
    [Show full text]
  • Artificial Intelligence in Health Care: the Hope, the Hype, the Promise, the Peril
    Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril Michael Matheny, Sonoo Thadaney Israni, Mahnoor Ahmed, and Danielle Whicher, Editors WASHINGTON, DC NAM.EDU PREPUBLICATION COPY - Uncorrected Proofs NATIONAL ACADEMY OF MEDICINE • 500 Fifth Street, NW • WASHINGTON, DC 20001 NOTICE: This publication has undergone peer review according to procedures established by the National Academy of Medicine (NAM). Publication by the NAM worthy of public attention, but does not constitute endorsement of conclusions and recommendationssignifies that it is the by productthe NAM. of The a carefully views presented considered in processthis publication and is a contributionare those of individual contributors and do not represent formal consensus positions of the authors’ organizations; the NAM; or the National Academies of Sciences, Engineering, and Medicine. Library of Congress Cataloging-in-Publication Data to Come Copyright 2019 by the National Academy of Sciences. All rights reserved. Printed in the United States of America. Suggested citation: Matheny, M., S. Thadaney Israni, M. Ahmed, and D. Whicher, Editors. 2019. Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. NAM Special Publication. Washington, DC: National Academy of Medicine. PREPUBLICATION COPY - Uncorrected Proofs “Knowing is not enough; we must apply. Willing is not enough; we must do.” --GOETHE PREPUBLICATION COPY - Uncorrected Proofs ABOUT THE NATIONAL ACADEMY OF MEDICINE The National Academy of Medicine is one of three Academies constituting the Nation- al Academies of Sciences, Engineering, and Medicine (the National Academies). The Na- tional Academies provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions.
    [Show full text]
  • The Future of Robotics an Inside View on Innovation in Robotics
    The Future of Robotics An Inside View on Innovation in Robotics FEATURE Robots, Humans and Work Executive Summary Robotics in the Startup Ecosystem The automation of production through three industrial revolutions has increased global output exponentially. Now, with machines increasingly aware and interconnected, Industry 4.0 is upon us. Leading the charge are fleets of autonomous robots. Built by major multinationals and increasingly by innovative VC-backed companies, these robots have already become established participants in many areas of the economy, from assembly lines to farms to restaurants. Investors, founders and policymakers are all still working to conceptualize a framework for these companies and their transformative Austin Badger technology. In this report, we take a data-driven approach to emerging topics in the industry, including business models, performance metrics, Director, Frontier Tech Practice and capitalization trends. Finally, we review leading theories of how automation affects the labor market, and provide quantitative evidence for and against them. It is our view that the social implications of this industry will be massive and will require a continual examination by those driving this technology forward. The Future of Robotics 2 Table of Contents 4 14 21 The Landscape VC and Robots Robots, Humans and Work Industry 4.0 and the An Emerging Framework Robotics Ecosystem The Interplay of Automation and Labor The Future of Robotics 3 The Landscape Industry 4.0 and the Robotics Ecosystem The Future of Robotics 4 COVID-19 and US Manufacturing, Production and Nonsupervisory Workers the Next 12.8M Automation Wave 10.2M Recessions tend to reduce 9.0M employment, and some jobs don’t come back.
    [Show full text]
  • Transfer Learning Between RTS Combat Scenarios Using Component-Action Deep Reinforcement Learning
    Transfer Learning Between RTS Combat Scenarios Using Component-Action Deep Reinforcement Learning Richard Kelly and David Churchill Department of Computer Science Memorial University of Newfoundland St. John’s, NL, Canada [email protected], [email protected] Abstract an enormous financial investment in hardware for training, using over 80000 CPU cores to run simultaneous instances Real-time Strategy (RTS) games provide a challenging en- of StarCraft II, 1200 Tensor Processor Units (TPUs) to train vironment for AI research, due to their large state and ac- the networks, as well as a large amount of infrastructure and tion spaces, hidden information, and real-time gameplay. Star- Craft II has become a new test-bed for deep reinforcement electricity to drive this large-scale computation. While Al- learning systems using the StarCraft II Learning Environment phaStar is estimated to be the strongest existing RTS AI agent (SC2LE). Recently the full game of StarCraft II has been ap- and was capable of beating many players at the Grandmas- proached with a complex multi-agent reinforcement learning ter rank on the StarCraft II ladder, it does not yet play at the (RL) system, however this is currently only possible with ex- level of the world’s best human players (e.g. in a tournament tremely large financial investments out of the reach of most setting). The creation of AlphaStar demonstrated that using researchers. In this paper we show progress on using varia- deep learning to tackle RTS AI is a powerful solution, how- tions of easier to use RL techniques, modified to accommo- ever applying it to the entire game as a whole is not an eco- date actions with multiple components used in the SC2LE.
    [Show full text]
  • Regulating Regtech: the Benefits of a Globalized Approach
    111 Regulating Regtech: The Benefits of a Globalized Approach Anastasia KONINA1 Abstract: Regulatory technology, or RegTech, helps financial institutions around the world comply with a myriad of data-driven financial regulations. RegTech’s algorithms make thousands of decisions every day: they identify and block suspicious clients and transactions, monitor third party exposures, and maintain statutory capital thresholds. This article tackles one of the main problems of RegTech: the risks posed by inherently opaque algorithms that make the aforementioned decisions. It suggests that due to the fundamental 111 REGULATING REGTECH: THE BENEFITS OF A GLOBALIZED APPROACH APPROACH GLOBALIZED REGTECH: THEBENEFITS OFA REGULATING importance of RegTech for the robustness of global finance, a globalized model of regulating RegTech’s algorithms should be adopted. This model should be based on two facets of regulation: first, the soft law harmonizing instruments implemented by transgovernmental networks of cooperation; and, second, the domestic administrative instruments that adapt global requirements to particular jurisdictions. Anastasia KONINA Anastasia KONINA INTRODUCTION The financial system’s actors operate in the dynamic reporting and compliance environment prompted by the aftermath of the global financial crisis of 2008 (“the Global Financial Crisis of 2008”). In the post- crisis years, global and domestic regulators have created a regulatory regime that necessitates the collection, analysis, and reporting of granular data in real-time or near-to-real-time.2 The Dodd-Frank Act (US),3 the Bank Recovery and Resolution Directive (EU),4 and the Basel Committee’s “Principles for effective risk data aggregation and risk reporting” require5 financial institutions to collect, calculate, and report aggregated risk data from across the banking group.
    [Show full text]
  • Basic Income with High Open Innovation Dynamics: the Way to the Entrepreneurial State
    Journal of Open Innovation: Technology, Market, and Complexity Concept Paper Basic Income with High Open Innovation Dynamics: The Way to the Entrepreneurial State Jinhyo Joseph Yun 1,* , KyungBae Park 2 , Sung Duck Hahm 3 and Dongwook Kim 4 1 Department of Open Innovation, Open Innovation Academy of SOItmC, Convergence Research Center for Future Automotive Technology of DGIST, Daegu 42988, Korea 2 Department of Business Administration, Sangji University, 83 Sangjidae-gil, Wonju, Gangwon-do 26339, Korea 3 Korean Institute for Presidential Studies (KIPS), Seoul 06306, Korea 4 Graduate School of Public Administration, Seoul National University, Seoul 08826, Korea * Correspondence: [email protected] Received: 21 May 2019; Accepted: 25 June 2019; Published: 11 July 2019 Abstract: Currently, the world economy is approaching a near-zero growth rate. Governments should move from a market-failure-oriented to a system-failure-oriented approach to understanding this problem, and transform to an entrepreneurial state to motivate the Schumpeterian dynamics of open innovation. We want to answer the following research question in this study: “How can a government enact policies to conquer the growth limits imposed on the economy by inequality or the control of big businesses?” First, we conducted a literature review to establish the concept of building a causal loop model of basic income with open innovation dynamics. Second, we built a causal loop model which includes basic income and all factors of open innovation dynamics. Third, we proved our causal loop model through a meta-analysis of global cases of basic income. Our research indicates that reflective basic income with permissionless open innovation, capital fluidity, a sharing economy, and a platform tax can motivate open innovation dynamics and arrive at a method by which an entrepreneurial state can conquer the growth limits of capitalism.
    [Show full text]
  • Automation and Inequality: the Changing World of Work in The
    Automation and inequality The changing world of work in the global South Andrew Norton Issue Paper Green economy Keywords: August 2017 Technology, social policy, gender About the author Andrew Norton is director of the International Institute for Environment and Development (IIED), [email protected] (@andynortondev) Acknowledgements This paper covers a broad territory and many conversations over many months fed into it. I am grateful to the following colleagues for ideas, comments on drafts and conversations that have influenced the paper. From outside IIED: Simon Maxwell, Alice Evans (University of Cambridge), Mark Graham (Oxford University), Becky Faith (IDS Sussex), Kathy Peach (Bond), Joy Green (Forum for the Future), Stefan Raubenheimer (South South North), Arjan de Haan (IDRC), Rebeca Grynspan (SEGIB). From within IIED: Alejandro Guarin, Tom Bigg, Clare Shakya, Sam Greene, Paul Steele, Lorenzo Cotula, Sam Barrett. Much of the content of the paper was stimulated by a panel on the future of work in developing countries at the 2017 conference of Bond (UK) that included (in addition to Joy, Kathy and Becky) Elizabeth Stuart of ODI, and a webinar organised by Bond and Forum for the Future in June 2017. Responsibility for final content and errors is of course entirely mine. Produced by IIED The International Institute for Environment and Development (IIED) promotes sustainable development, linking local priorities to global challenges. We support some of the world’s most vulnerable people to strengthen their voice in decision making. Published by IIED, August 2017 Norton, A (2017) Automation and inequality. The changing world of work in the global South. Issue Paper.
    [Show full text]
  • Towards Incremental Agent Enhancement for Evolving Games
    Evaluating Reinforcement Learning Algorithms For Evolving Military Games James Chao*, Jonathan Sato*, Crisrael Lucero, Doug S. Lange Naval Information Warfare Center Pacific *Equal Contribution ffi[email protected] Abstract games in 2013 (Mnih et al. 2013), Google DeepMind devel- oped AlphaGo (Silver et al. 2016) that defeated world cham- In this paper, we evaluate reinforcement learning algorithms pion Lee Sedol in the game of Go using supervised learning for military board games. Currently, machine learning ap- and reinforcement learning. One year later, AlphaGo Zero proaches to most games assume certain aspects of the game (Silver et al. 2017b) was able to defeat AlphaGo with no remain static. This methodology results in a lack of algorithm robustness and a drastic drop in performance upon chang- human knowledge and pure reinforcement learning. Soon ing in-game mechanics. To this end, we will evaluate general after, AlphaZero (Silver et al. 2017a) generalized AlphaGo game playing (Diego Perez-Liebana 2018) AI algorithms on Zero to be able to play more games including Chess, Shogi, evolving military games. and Go, creating a more generalized AI to apply to differ- ent problems. In 2018, OpenAI Five used five Long Short- term Memory (Hochreiter and Schmidhuber 1997) neural Introduction networks and a Proximal Policy Optimization (Schulman et al. 2017) method to defeat a professional DotA team, each AlphaZero (Silver et al. 2017a) described an approach that LSTM acting as a player in a team to collaborate and achieve trained an AI agent through self-play to achieve super- a common goal. AlphaStar used a transformer (Vaswani et human performance.
    [Show full text]
  • Alphastar: an Evolutionary Computation Perspective GECCO ’19 Companion, July 13–17, 2019, Prague, Czech Republic
    AlphaStar: An Evolutionary Computation Perspective Kai Arulkumaran Antoine Cully Julian Togelius Imperial College London Imperial College London New York University London, United Kingdom London, United Kingdom New York City, NY, United States [email protected] [email protected] [email protected] ABSTRACT beat a grandmaster at StarCraft (SC), a real-time strategy game. In January 2019, DeepMind revealed AlphaStar to the world—the Both the original game, and its sequel SC II, have several prop- first artificial intelligence (AI) system to beat a professional player erties that make it considerably more challenging than even Go: at the game of StarCraft II—representing a milestone in the progress real-time play, partial observability, no single dominant strategy, of AI. AlphaStar draws on many areas of AI research, including complex rules that make it hard to build a fast forward model, and deep learning, reinforcement learning, game theory, and evolution- a particularly large and varied action space. ary computation (EC). In this paper we analyze AlphaStar primar- DeepMind recently took a considerable step towards this grand ily through the lens of EC, presenting a new look at the system and challenge with AlphaStar, a neural-network-based AI system that relating it to many concepts in the field. We highlight some of its was able to beat a professional SC II player in December 2018 [20]. most interesting aspects—the use of Lamarckian evolution, com- This system, like its predecessor AlphaGo, was initially trained us- petitive co-evolution, and quality diversity. In doing so, we hope ing imitation learning to mimic human play, and then improved to provide a bridge between the wider EC community and one of through a combination of reinforcement learning (RL) and self- the most significant AI systems developed in recent times.
    [Show full text]
  • Long-Term Planning and Situational Awareness in Openai Five
    Long-Term Planning and Situational Awareness in OpenAI Five Jonathan Raiman∗ Susan Zhang∗ Filip Wolski Dali OpenAI OpenAI [email protected] [email protected] [email protected] Abstract Understanding how knowledge about the world is represented within model-free deep reinforcement learning methods is a major challenge given the black box nature of its learning process within high-dimensional observation and action spaces. AlphaStar and OpenAI Five have shown that agents can be trained without any explicit hierarchical macro-actions to reach superhuman skill in games that require taking thousands of actions before reaching the final goal. Assessing the agent’s plans and game understanding becomes challenging given the lack of hierarchy or explicit representations of macro-actions in these models, coupled with the incomprehensible nature of the internal representations. In this paper, we study the distributed representations learned by OpenAI Five to investigate how game knowledge is gradually obtained over the course of training. We also introduce a general technique for learning a model from the agent’s hidden states to identify the formation of plans and subgoals. We show that the agent can learn situational similarity across actions, and find evidence of planning towards accomplishing subgoals minutes before they are executed. We perform a qualitative analysis of these predictions during the games against the DotA 2 world champions OG in April 2019. 1 Introduction The choice of action and plan representation has dramatic consequences on the ability for an agent to explore, learn, or generalize when trying to accomplish a task. Inspired by how humans methodically organize and plan for long-term goals, Hierarchical Reinforcement Learning (HRL) methods were developed in an effort to augment the set of actions available to the agent to include temporally extended multi-action subroutines.
    [Show full text]
  • Download This PDF File
    Vol. 10, No. 1 (2019) http://www.eludamos.org Alive: A Case Study of the Design of an AI Conversation Simulator Eric Walsh Eludamos. Journal for Computer Game Culture. 2019; 10 (1), pp. 161–181 Alive: A Case Study of the Design of an AI Conversation Simulator ERIC WALSH On December 19, 2018, DeepMind’s AlphaStar became the first artificial intelligence (AI) to defeat a top-level StarCraft II professional player (AlphaStar Team 2019). StarCraft II (Blizzard Entertainment 2010) is a real-time strategy game where players must balance developing their base and building up an economy with producing units and using those units to attack their opponent. Like chess, matches typically take place between two players and last until one player has been defeated. AlphaStar was trained to play StarCraft II using a combination of supervised learning (i. e., humans providing replays of past games for it to study) and reinforcement learning (i. e., the AI playing games against other versions of itself to hone its skills) (Dickson 2019). The StarCraft series has long been of interest to AI developers looking to test their AI’s mettle; in 2017, Blizzard Entertainment partnered with DeepMind to release a set of tools designed to “accelerate AI research in the real-time strategy game” (Vinyals, Gaffney, and Ewalds 2017, n.p.). Games like StarCraft II are often useful for AI development due to the challenges inherent in the complexity of their decision- making process. In this case, such challenges included the need to interpret imperfect information and the need to make numerous decisions simultaneously in real time (AlphaStar Team 2019).
    [Show full text]
  • V-MPO: On-Policy Maximum a Posteriori Policy Optimization For
    Preprint V-MPO: ON-POLICY MAXIMUM A POSTERIORI POLICY OPTIMIZATION FOR DISCRETE AND CONTINUOUS CONTROL H. Francis Song,∗ Abbas Abdolmaleki,∗ Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick DeepMind, London, UK fsongf,aabdolmaleki,springenberg,aidanclark, soyer,jwrae,snoury,arahuja,liusiqi,dhruvat, heess,danbelov,riedmiller,[email protected] ABSTRACT Some of the most successful applications of deep reinforcement learning to chal- lenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algo- rithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state- value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.
    [Show full text]