Improved Robustness of Reinforcement Learning Policies Upon Conversion

Total Pages: 16

File Type: PDF, Size: 1020 KB

Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games

Anonymous Authors

Abstract

Various implementations of Deep Reinforcement Learning (RL) demonstrated excellent performance on tasks that can be solved by a trained policy, but they are not without drawbacks. Deep RL suffers from high sensitivity to noisy and missing input and to adversarial attacks. To mitigate these deficiencies of deep RL solutions, we suggest involving spiking neural networks (SNNs). Previous work has shown that standard Neural Networks trained using supervised learning for image classification can be converted to SNNs with negligible deterioration in performance. In this paper, we convert Q-Learning ReLU-Networks (ReLU-N) trained using reinforcement learning into SNNs. We provide a proof of concept for the conversion of ReLU-N to SNN, demonstrating improved robustness to occlusion and better generalization than the original ReLU-N. Moreover, we show promising initial results with converting full-scale Deep Q-networks to SNNs, paving the way for future research.

1. Introduction

Recent advancements in deep reinforcement learning (RL) have achieved astonishing results, surpassing human performance on various ATARI games (Mnih et al., 2015; Hasselt et al., 2016; Wang et al., 2016). However, deep RL is susceptible to adversarial attacks, similarly to deep learning (Huang et al., 2017).
The vulnerability to adversarial attacks is due to the fact that deep RL uses gradient descent to train the agent. Another consequence of the gradient descent algorithm is that the trained agent learns to focus on a few sensitive areas, and when these areas are occluded or perturbed, the performance of the RL agent deteriorates. Moreover, there is evidence that the policies learned by the networks in deep RL algorithms do not generalize well, and the performance of the agent deteriorates when it encounters a state that it has not seen before, even if it is similar to other states (Witty et al., 2018).

Biological systems tend to be very noisy by nature (Richardson & Gerstner, 2006; Stein et al., 2005), but they can still operate well even under harsh conditions that affect their internal state and input. Spiking Neural Networks (SNNs) are considered to be closer to biological neurons due to their event-based nature; they are often termed the third generation of neural networks (Maass, 1996). A spike is the quantification of the internal and external process of the neuron and is always equal to other spikes. Therefore, the individual neuron can serve as a small bottleneck that gives the ability to sustain low intermittent noise and not propagate the noise further. Moreover, spiking neurons as a group in a network can damp the noise even further due to their collective effect and their architectural connectivity (Hazan & Manevitz, 2012). However, SNNs are typically harder to train using backpropagation due to the non-differentiable nature of the spikes (Pfeiffer & Pfeil, 2018).

Much of the recent work with SNNs has focused on implementing methods similar to backpropagation (Huh & Sejnowski, 2018; Wu et al., 2018) or using biologically inspired learning rules like spike-timing-dependent plasticity (STDP) to train the network (Bengio et al., 2015; Diehl & Cook, 2015; Gilra & Gerstner, 2018; Ferré et al., 2018). One of the benefits of using SNNs is their potential to be more energy efficient and faster than rectified linear unit networks (ReLU-N), particularly so on dedicated neuromorphic hardware (Martí et al., 2016).

Using SNNs in an RL environment seems almost natural, since many animals learn to perform certain tasks using a variation of semi-supervised and reinforcement learning. Moreover, there is evidence that biological neurons also learn using evaluative feedback from neurotransmitters such as dopamine (Wang et al., 2018), e.g., in the postulated dopamine reward prediction-error signal (Schultz, 2016). However, since spiking neurons are fundamentally different from artificial neurons, it is not clear if SNNs are as capable as ReLU-Ns in machine learning domains. This raises the questions: Do SNNs have the capability to represent the same functions as ReLU-N? To be more specific, can SNNs represent complex policies that can successfully play Atari games? If so, do they have any advantages in handling noisy inputs?

We answer these questions by demonstrating that ReLU-networks trained using existing reinforcement learning algorithms can be converted to SNNs with similar performance on the reinforcement learning task when playing the Atari Breakout game. Furthermore, we show that such converted SNNs are more robust than the original ReLU-Ns. Finally, we demonstrate that full-sized deep Q-networks (DQN) (Mnih et al., 2015) can also be converted to SNNs and maintain their better-than-human performance, paving the way for future research in robustness and RL with SNNs.

2. Background

2.1. Arcade learning environment

The Arcade learning environment (ALE) (Bellemare et al., 2013) is a platform that enables researchers to test their algorithms on over 50 Atari 2600 games. The agent sees the environment through image frames of the game, interacts with the environment with 18 possible actions, and receives feedback in the form of the change in the game score. The games were designed for humans and thus are free from experimenter bias. The games span many different genres that require the agent/algorithm to generalize well over various tasks, difficulty levels, and timescales. ALE has thus become a popular test-bed for reinforcement learning.

Breakout: We demonstrate our results on the game of Breakout. Breakout is a game similar to the popular game Pong. The player controls a paddle at the bottom of the screen. There are rows of colored bricks on the upper part of the screen. A ball bounces between the bricks and the player-controlled paddle. If the ball hits a brick, the brick breaks and the score of the game is increased. However, if the ball falls below the paddle, the player loses a life. The game starts with five lives, and the player/agent is supposed to break all the bricks before they run out of lives. Figure 1 shows a frame of the game.

Figure 1. Screenshot of Atari 2600 Breakout game.

2.2. Deep Q-Networks

Reinforcement learning algorithms train a policy π to maximize the expected cumulative reward received over time. Formally, this process is modelled as a Markov decision process (MDP). Given a state-space S and an action-space A, the agent starts in an initial state s_0 ∈ S_0 from a set of possible start states S_0 ⊆ S. At each time-step t, starting from t = 0, the agent takes an action a_t to transition from s_t to s_{t+1}. The probability of transitioning from state s to state s' by taking action a is given by the transition function P(s, a, s'). The reward function R(s, a) defines the expected reward received by the agent after taking action a in state s.

A policy π is defined as the conditional distribution of actions given the state, π(s, a) = Pr(A_t = a | S_t = s). The Q-value or action-value of a state-action pair for a given policy, q^π(s, a), is the expected return following policy π after taking the action a from state s:

    q^{\pi}(s, a) = \mathbb{E}\Big[ \textstyle\sum_{k=0}^{\infty} \gamma^{k} R_{t+k} \,\Big|\, S_t = s,\, A_t = a,\, \pi \Big]    (1)

where γ is the discount factor. The action-value function follows a Bellman equation that can be written as:

    q^{\pi}(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} q^{\pi}(s_{t+1}, a_{t+1})    (2)
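Eq. (2) is the relation that DQN-style methods turn into a regression target. As a minimal illustration only (the paper does not give code), the following PyTorch sketch computes the one-step target r_t + γ max_a q(s_{t+1}, a) for a batch of transitions; the target_net module, the tensor shapes, and the discount value are assumptions made for the example.

```python
import torch

def td_targets(target_net, rewards, next_states, dones, gamma=0.99):
    """One-step TD targets r_t + gamma * max_a Q(s_{t+1}, a), as in Eq. (2).

    rewards:     (B,) float tensor of immediate rewards
    next_states: (B, ...) float tensor of next observations
    dones:       (B,) bool tensor, True where the episode terminated
    """
    with torch.no_grad():
        # Greedy value of the next state under the (frozen) target network.
        next_q = target_net(next_states).max(dim=1).values
        # Do not bootstrap from terminal states.
        next_q = next_q * (~dones).float()
    return rewards + gamma * next_q
```

In standard DQN training the online Q-network is then regressed toward these targets with a squared or Huber loss (Mnih et al., 2015).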
Many widely used reinforcement learning algorithms first approximate the Q-value and then select the policy that maximizes the Q-value at each step to maximize returns (Sutton & Barto, 2018). Deep Q-networks (DQN) (Mnih et al., 2015) are one such algorithm that uses deep artificial neural networks to approximate the Q-value. The neural network can learn policies from only the pixels of the screen and the game score, and has been shown to surpass human performance on many of the Atari 2600 games.

Figure 2. Architecture of Deep Q-networks, following Mnih et al. (2015); ReLU nonlinear units are emphasized by red circles.

2.3. Spiking neurons

SNNs may use any of the various neuron models (W. Gerstner, 2002; Tuckwell, 1988). For our experiments, we use four different variations of spiking neuron. We use the notation below to describe these neurons: τ is the time constant, V is the membrane potential voltage, v_rest is the resting membrane potential, and v_thresh is the neuron threshold. During the refractory period that follows a spike, neurons are unable to spike or integrate input. For this paper, we ignore the refractory period for simplicity in the conversion from artificial neurons. For a complete list of the parameters used for the LIF and stochastic LIF neurons, see the supplementary materials.
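To make the notation above concrete, here is a minimal sketch of a discrete-time leaky integrate-and-fire (LIF) update expressed with the same symbols (τ, V, v_rest, v_thresh). The numeric values and the input current are illustrative assumptions, not the parameters used in the paper (those are listed in its supplementary materials), and the refractory period is ignored as stated above.

```python
def lif_step(v, input_current, dt=1.0, tau=100.0, v_rest=-65.0, v_thresh=-52.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    Membrane dynamics: tau * dV/dt = -(V - v_rest) + I.
    Returns the updated membrane potential and a boolean spike flag.
    Parameter values are illustrative; no refractory period is modelled.
    """
    v = v + (dt / tau) * (-(v - v_rest) + input_current)
    spiked = v >= v_thresh
    if spiked:
        v = v_rest  # reset to rest after emitting a spike
    return v, spiked

# Minimal usage: drive a single neuron with a constant current for 300 steps.
v, n_spikes = -65.0, 0
for _ in range(300):
    v, s = lif_step(v, input_current=20.0)
    n_spikes += int(s)
print("spikes emitted:", n_spikes)
```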
Recommended publications
  • Spiking Neural Networks: A Survey, Oscar T. Skaug
    Oscar T. Skaug. Spiking Neural Networks: A Survey. Bachelor's project in Computer Engineering. Supervisor: Ole Christian Eidheim. NTNU, Norwegian University of Science and Technology, Faculty of Information Technology and Electrical Engineering. Trondheim, June 2020.
    Abstract: The purpose of the survey is to support teaching and research in the department of Computer Science at the university, through fundamental research in recent advances in the machine learning field. The survey assumes the reader has some basic knowledge of machine learning; however, no prior knowledge of spiking neural networks is assumed. The aim of the survey is to introduce the reader to artificial spiking neural networks by going through the following: what constitutes and distinguishes an artificial spiking neural network from other artificial neural networks; how to interpret the output signals of spiking neural networks; how supervised and unsupervised learning can be achieved in spiking neural networks; and some benefits of spiking neural networks compared to other artificial neural networks, followed by challenges specifically associated with spiking neural networks. By doing so the survey attempts to answer the question of why spiking neural networks have not been widely implemented, despite being both more computationally powerful and biologically plausible than other artificial neural networks currently available.
    The title of the bachelor thesis is "Study of recent advances in artificial and biological neuron models", with the purpose of "supporting teaching and research in the department of Computer Science at the university through fundamental research in recent advances in the machine learning field".
  • Spiking Neural Networks: Neuron Models, Plasticity, and Graph Applications
    Virginia Commonwealth University, VCU Scholars Compass, Theses and Dissertations, Graduate School, 2015. Spiking Neural Networks: Neuron Models, Plasticity, and Graph Applications. Shaun Donachy, Virginia Commonwealth University. Part of the Theory and Algorithms Commons. © The Author. Downloaded from https://scholarscompass.vcu.edu/etd/3984. © Shaun Donachy, July 2015. All Rights Reserved.
    A Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University, by Shaun Donachy, B.S. Computer Science, Virginia Commonwealth University, May 2013. Director: Krzysztof J. Cios, Professor and Chair, Department of Computer Science, Virginia Commonwealth University. Richmond, Virginia, July 2015.
    Table of contents: 1 Introduction; 2 Models of a Single Neuron (2.1 McCulloch-Pitts Model, 2.2 Hodgkin-Huxley Model, 2.3 Integrate and Fire Model, 2.4 Izhikevich Model); 3 Neural Coding Techniques (3.1 Input Encoding: 3.1.1 Grandmother Cell and Distributed Representations, 3.1.2 Rate Coding, 3.1.3 Sine Wave Encoding, 3.1.4 Spike Density Encoding, 3.1.5 Temporal Encoding, 3.1.6 Synaptic Propagation Delay Encoding).
  • Simplified Spiking Neural Network Architecture and STDP Learning Algorithm Applied to Image Classification
    Iakymchuk et al., EURASIP Journal on Image and Video Processing (2015) 2015:4. DOI 10.1186/s13640-015-0059-4. Research, Open Access. Simplified spiking neural network architecture and STDP learning algorithm applied to image classification. Taras Iakymchuk*, Alfredo Rosado-Muñoz, Juan F Guerrero-Martínez, Manuel Bataller-Mompeán and Jose V Francés-Víllora.
    Abstract: Spiking neural networks (SNN) have gained popularity in embedded applications such as robotics and computer vision. The main advantages of SNN are the temporal plasticity, ease of use in neural interface circuits and reduced computation complexity. SNN have been successfully used for image classification. They provide a model for the mammalian visual cortex, image segmentation and pattern recognition. Different spiking neuron mathematical models exist, but their computational complexity makes them ill-suited for hardware implementation. In this paper, a novel, simplified and computationally efficient model of spike response model (SRM) neuron with spike-time dependent plasticity (STDP) learning is presented. Frequency spike coding based on receptive fields is used for data representation; images are encoded by the network and processed in a similar manner as the primary layers in the visual cortex. The network output can be used as a primary feature extractor for further refined recognition or as a simple object classifier. Results show that the model can successfully learn and classify black and white images with added noise or partially obscured samples with up to ×20 computing speed-up at an equivalent classification ratio when compared to classic SRM neuron membrane models. The proposed solution combines spike encoding, network topology, neuron membrane model and STDP learning.
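The STDP learning mentioned in this abstract can be illustrated, in its generic pair-based textbook form (not necessarily the exact rule used by Iakymchuk et al.), by a weight update that potentiates a synapse when the presynaptic spike precedes the postsynaptic one and depresses it otherwise; all constants below are illustrative assumptions.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Generic pair-based STDP weight update; spike times are in milliseconds."""
    dt = t_post - t_pre
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau_plus)    # pre before post: potentiation
    else:
        w -= a_minus * np.exp(dt / tau_minus)   # post before pre: depression
    return float(np.clip(w, w_min, w_max))

# Example: a causal pairing (pre fires 5 ms before post) strengthens the synapse.
print(stdp_update(w=0.5, t_pre=10.0, t_post=15.0))
```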
  • Spiking Neural Network Vs Multilayer Perceptron: Who Is the Winner in the Racing Car Computer Game
    Soft Comput (2015) 19:3465–3478. DOI 10.1007/s00500-014-1515-2. FOCUS. Spiking neural network vs multilayer perceptron: who is the winner in the racing car computer game. Urszula Markowska-Kaczmar · Mateusz Koldowski. Published online: 3 December 2014. © The Author(s) 2014. This article is published with open access at Springerlink.com.
    Abstract: The paper presents two neural based controllers for the computer car racing game. The controllers represent two generations of neural networks—a multilayer perceptron and a spiking neural network. They are trained by an evolutionary algorithm. Efficiency of both approaches is experimentally tested and statistically analyzed.
    Keywords: Computer game controller · Spiking neural network · Multilayer perceptron · Evolutionary algorithm
    1 Introduction: Neural networks (NN) are widely applied in many areas because of their ability to learn. [...] controller on-line. The RBF network is used to establish a nonlinear prediction model. In the recent paper (Zheng et al. 2013) the impact of heavy vehicles on lane-changing decisions of following car drivers has been investigated, and the lane-changing model based on two neural network models is proposed. Computer games can be perceived as a virtual representation of real world problems, and they provide rich test beds for many artificial intelligence methods. An example which can be mentioned here is successful application of multilayer perceptron in racing car computer games, described for instance in Ebner and Tiede (2009) and Togelius and Lucas (2005). Inspiration for our research were the results shown in Yee and Teo (2013) presenting a spiking neural network based controller for a racing car in a computer game.
  • Deepfakes & Disinformation
    DEEPFAKES & DISINFORMATION. Agnieszka M. Walorska. Analysis.
    Imprint: Publisher: Friedrich Naumann Foundation for Freedom, Karl-Marx-Straße 2, 14482 Potsdam, Germany; /freiheit.org /FriedrichNaumannStiftungFreiheit /FNFreiheit. Author: Agnieszka M. Walorska. Editors: International Department, Global Themes Unit, Friedrich Naumann Foundation for Freedom. Concept and layout: TroNa GmbH. Contact: Phone: +49 (0)30 2201 2634, Fax: +49 (0)30 6908 8102, Email: [email protected]. As of May 2020. Photo credits: photomontages © Unsplash.de, © freepik.de, p. 30 © AdobeStock; screenshots: p. 16 © https://youtu.be/mSaIrz8lM1U, p. 18 © deepnude.to / Agnieszka M. Walorska, p. 19 © thispersondoesnotexist.com, p. 19 © linkedin.com, p. 19 © talktotransformer.com, p. 25 © gltr.io, p. 26 © twitter.com; all other photos © Friedrich Naumann Foundation for Freedom (Germany); p. 31 © Agnieszka M. Walorska.
    Notes on using this publication: This publication is an information service of the Friedrich Naumann Foundation for Freedom. The publication is available free of charge and not for sale. It may not be used by parties or election workers during the purpose of election campaigning (Bundestags-, regional and local elections and elections to the European Parliament). Licence: Creative Commons (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0.
    Contents: EXECUTIVE SUMMARY; GLOSSARY; 1.0 STATE OF DEVELOPMENT ARTIFICIAL
  • Game Playing with Deep Q-Learning Using OpenAI Gym
    Game Playing with Deep Q-Learning using OpenAI Gym. Robert Chuchro ([email protected]), Deepak Gupta ([email protected]).
    Abstract: Historically, designing game players requires domain-specific knowledge of the particular game to be integrated into the model for the game playing program. This leads to a program that can only learn to play a single particular game successfully. In this project, we explore the use of general AI techniques, namely reinforcement learning and neural networks, in order to architect a model which can be trained on more than one game. Our goal is to progress from a simple convolutional neural network with a couple of fully connected layers, to experimenting with more complex additions, such as deeper layers or recurrent neural networks.
    1. Introduction: Game playing has recently emerged as a popular play- [...] associated with performing a particular action in a given state. This information is fundamental to any reinforcement learning problem. The input to our model will be a sequence of pixel images as arrays (Width x Height x 3) generated by a particular OpenAI Gym environment. We then use a Deep Q-Network to output an action from the action space of the game. The output that the model will learn is an action from the environment's action space in order to maximize future reward from a given state. In this paper, we explore using a neural network with multiple convolutional layers as our model.
    2. Related Work: The best known success story of classical reinforcement learning is TD-gammon, a backgammon playing program which learned entirely by reinforcement learning [6]. TD-gammon used a model-free reinforcement learning algorithm similar to Q-learning.
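As a rough, hedged sketch of the pipeline this abstract describes (pixel observations from an OpenAI Gym environment fed to a Q-network that selects actions), the loop below uses the classic Gym API; the environment id, the preprocessing, and the q_net module are assumptions, and newer Gym/Gymnasium releases return additional values from reset() and step().

```python
import gym
import numpy as np
import torch

def select_action(q_net, obs, n_actions, epsilon=0.05):
    """Epsilon-greedy action choice from an (H, W, 3) pixel observation."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    x = torch.as_tensor(obs, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        return int(q_net(x).argmax(dim=1).item())

env = gym.make("Breakout-v4")           # assumed id; requires the Atari extras
obs, done, episode_return = env.reset(), False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for select_action(q_net, obs, env.action_space.n)
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
```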
  • Reinforcement Learning with TensorFlow & OpenAI Gym
    Lecture 1: Introduction. Reinforcement Learning with TensorFlow & OpenAI Gym. Sung Kim <[email protected]>. http://angelpawstherapy.org/positive-reinforcement-dog-training.html
    Nature of Learning: We learn from past experiences. When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have direct interaction with its environment. Years of positive compliments as well as negative criticism have all helped shape who we are today. Reinforcement learning: a computational approach to learning from interaction. (Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction; Nishant Shukla, Machine Learning with TensorFlow; https://www.cs.utexas.edu/~eladlieb/RLRG.html; Machine Learning, Tom Mitchell, 1997.)
    Atari Breakout Game (2013, 2015). Atari Games. Nature: Human-level control through deep reinforcement learning. http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html Figure courtesy of Mnih et al., "Human-level control through deep reinforcement learning", Nature, 26 Feb. 2015. https://deepmind.com/blog/deep-reinforcement-learning/ https://deepmind.com/applied/deepmind-for-google/
    Reinforcement Learning Applications: Robotics (torque at joints); business operations (inventory management: how much inventory and spare parts to purchase; resource allocation: e.g. in a call center, who to service first); finance (investment decisions, portfolio design); e-commerce/media (what content to present to users, using click-through / visit time as reward; what ads to present to users, avoiding ad fatigue).
    Audience: Want to understand basic reinforcement learning (RL); no/weak math or computer science background (Q = r + Q); want to use RL as a black box with basic understanding; want to use TensorFlow and Python (optional labs). Schedule: 1.
  • Spinning Up Documentation
    Spinning Up Documentation. Release. Joshua Achiam. Feb 07, 2020.
    User Documentation: 1 Introduction (What This Is; Why We Built This; How This Serves Our Mission; Code Design Philosophy; Long-Term Support and Support History). 2 Installation (Installing Python; Installing OpenMPI; Installing Spinning Up; Check Your Install; Installing MuJoCo (Optional)). 3 Algorithms (What's Included; Why These Algorithms?; Code Format). 4 Running Experiments (Launching from the Command Line; Launching from Scripts). 5 Experiment Outputs (Algorithm Outputs; Save Directory Location; Loading and Running Trained Policies). 6 Plotting Results. 7 Part 1: Key Concepts in RL (What Can RL Do?; …)
  • ELF Opengo: an Analysis and Open Reimplementation of Alphazero
    ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero. Yuandong Tian, Jerry Ma*, Qucheng Gong*, Shubho Sengupta*, Zhuoyuan Chen, James Pinkerton, C. Lawrence Zitnick.
    Abstract: The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the understanding of and usability of these promising approaches by the research community. Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. ELF OpenGo is the first open-source Go AI to convincingly demonstrate superhuman performance with a perfect (20:0) record against global top professionals. We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training…
    However, these advances in playing ability come at significant computational expense. A single training run requires millions of selfplay games and days of training on thousands of TPUs, which is an unattainable level of compute for the majority of the research community. When combined with the unavailability of code and models, the result is that the approach is very difficult, if not impossible, to reproduce, study, improve upon, and extend. In this paper, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero (Silver et al., 2018) algorithm for the game of Go. We then apply ELF OpenGo toward the following three additional contributions. First, we train a superhuman model for ELF OpenGo. After running our AlphaZero-style training software on 2,000 GPUs for 9 days, our 20-block model has achieved superhuman performance that is arguably comparable to the 20-block models described in Silver et al. (2017) and Silver et al. (2018).
  • Applying Deep Double Q-Learning and Monte Carlo Tree Search to Playing Go
    CS221 Final Paper. Applying Deep Double Q-Learning and Monte Carlo Tree Search to Playing Go. Booher, Jonathan ([email protected]); De Alba, Enrique ([email protected]); Kannan, Nithin ([email protected]).
    I. INTRODUCTION: For our project we replicate many of the methods used in AlphaGo Zero to make an optimal Go player; however, we modified the learning paradigm to a version of Deep Q-Learning which we believe would result in better generalization of the network to novel positions. The modification of Deep Q-Learning that we use is called Deep Double Q-Learning and will be described later. The evaluation metric for the success of our agent is the percentage of games that are won against our Oracle, a Go-playing bot available in the OpenAI Gym. Since we are implementing a version of reinforcement learning, there is no data that we will need other than the simulator. By training on the games that are generated from self-play, our player will output a policy that is learned at the end of training by [...] the current position. By sampling states from the self play games along with their respective rewards, the researchers were able to train a binary classifier to predict the outcome of a game with a certain confidence. Then based on the confidence measures, the optimal move was taken. We depart from this method of training and use Deep Double Q-Learning instead. We use the same concept of sampling states and their rewards from the games of self play, but instead of feeding this information to a binary classifier, we feed the information to a modified Q-Learning formula, which we present shortly.
    III. CHALLENGES: The main challenge we faced was the computational complexity of the game of Go.
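For context, the "Deep Double Q-Learning" modification referred to above is usually written as decoupling action selection from action evaluation in the TD target. The standard double-DQN form (van Hasselt et al., 2016), which may differ in detail from the formula presented in that paper, is:

```latex
% Double Q-learning target: the online network Q_{\theta} picks the action,
% while the target network Q_{\theta^-} evaluates it.
y_t = r_t + \gamma \, Q_{\theta^{-}}\!\Big(s_{t+1},\ \operatorname*{arg\,max}_{a} Q_{\theta}(s_{t+1}, a)\Big)
```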
  • AI in Focus - Fundamental Artificial Intelligence and Video Games
    AI in Focus - Fundamental Artificial Intelligence and Video Games April 5, 2019 By Isi Caulder and Lawrence Yu Patent filings for fundamental artificial intelligence (AI) technologies continue to rise. Led by a number of high profile technology companies, including IBM, Google, Amazon, Microsoft, Samsung, and AT&T, patent applications directed to fundamental AI technologies, such as machine learning, neural networks, natural language processing, speech processing, expert systems, robotic and machine vision, are being filed and issued in ever-increasing numbers.[1] In turn, these fundamental AI technologies are being applied to address problems in industries such as healthcare, manufacturing, and transportation. A somewhat unexpected source of fundamental AI technology development has been occurring in the field of video games. Traditional board games have long been a subject of study for AI research. In the 1990’s, IBM created an AI for playing chess, Deep Blue, which was able to defeat top-caliber human players using brute force algorithms.[2] More recently, machine learning algorithms have been developed for more complex board games, which include a larger breadth of possible moves. For example, DeepMind (since acquired by Google), recently developed the first AI capable of defeating professional Go players, AlphaGo.[3] Video games have recently garnered the interest of researchers, due to their closer similarity to the “messiness” and “continuousness” of the real world. In contrast to board games, video games typically include a greater
  • Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
    Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims∗ Miles Brundage1†, Shahar Avin3,2†, Jasmine Wang4,29†‡, Haydn Belfield3,2†, Gretchen Krueger1†, Gillian Hadfield1,5,30, Heidy Khlaaf6, Jingying Yang7, Helen Toner8, Ruth Fong9, Tegan Maharaj4,28, Pang Wei Koh10, Sara Hooker11, Jade Leung12, Andrew Trask9, Emma Bluemke9, Jonathan Lebensold4,29, Cullen O’Keefe1, Mark Koren13, Théo Ryffel14, JB Rubinovitz15, Tamay Besiroglu16, Federica Carugati17, Jack Clark1, Peter Eckersley7, Sarah de Haas18, Maritza Johnson18, Ben Laurie18, Alex Ingerman18, Igor Krawczuk19, Amanda Askell1, Rosario Cammarota20, Andrew Lohn21, David Krueger4,27, Charlotte Stix22, Peter Henderson10, Logan Graham9, Carina Prunkl12, Bianca Martin1, Elizabeth Seger16, Noa Zilberman9, Seán Ó hÉigeartaigh2,3, Frens Kroeger23, Girish Sastry1, Rebecca Kagan8, Adrian Weller16,24, Brian Tse12,7, Elizabeth Barnes1, Allan Dafoe12,9, Paul Scharre25, Ariel Herbert-Voss1, Martijn Rasser25, Shagun Sodhani4,27, Carrick Flynn8, Thomas Krendl Gilbert26, Lisa Dyer7, Saif Khan8, Yoshua Bengio4,27, Markus Anderljung12 1OpenAI, 2Leverhulme Centre for the Future of Intelligence, 3Centre for the Study of Existential Risk, 4Mila, 5University of Toronto, 6Adelard, 7Partnership on AI, 8Center for Security and Emerging Technology, 9University of Oxford, 10Stanford University, 11Google Brain, 12Future of Humanity Institute, 13Stanford Centre for AI Safety, 14École Normale Supérieure (Paris), 15Remedy.AI, 16University of Cambridge, 17Center for Advanced Study in the Behavioral Sciences,18Google Research, 19École Polytechnique Fédérale de Lausanne, 20Intel, 21RAND Corporation, 22Eindhoven University of Technology, 23Coventry University, 24Alan Turing Institute, 25Center for a New American Security, 26University of California, Berkeley, 27University of Montreal, 28Montreal Polytechnic, 29McGill University, 30Schwartz Reisman Institute for Technology and Society arXiv:2004.07213v2 [cs.CY] 20 Apr 2020 April 2020 ∗Listed authors are those who contributed substantive ideas and/or work to this report.