“Latest snapshot.” To modify an algo so it does produce multiple snapshots, find the following line (which is present in all of the algorithms)


Spinning Up Documentation (Release of Feb 07, 2020). Joshua Achiam.

User Documentation (contents):

1 Introduction: 1.1 What This Is; 1.2 Why We Built This; 1.3 How This Serves Our Mission; 1.4 Code Design Philosophy; 1.5 Long-Term Support and Support History
2 Installation: 2.1 Installing Python; 2.2 Installing OpenMPI; 2.3 Installing Spinning Up; 2.4 Check Your Install; 2.5 Installing MuJoCo (Optional)
3 Algorithms: 3.1 What's Included; 3.2 Why These Algorithms?; 3.3 Code Format
4 Running Experiments: 4.1 Launching from the Command Line; 4.2 Launching from Scripts
5 Experiment Outputs: 5.1 Algorithm Outputs; 5.2 Save Directory Location; 5.3 Loading and Running Trained Policies
6 Plotting Results
7 Part 1: Key Concepts in RL: 7.1 What Can RL Do?; 7.2 Key Concepts and Terminology; 7.3 (Optional) Formalism
8 Part 2: Kinds of RL Algorithms: 8.1 A Taxonomy of RL Algorithms; 8.2 Links to Algorithms in Taxonomy
9 Part 3: Intro to Policy Optimization: 9.1 Deriving the Simplest Policy Gradient; 9.2 Implementing the Simplest Policy Gradient; 9.3 Expected Grad-Log-Prob Lemma; 9.4 Don't Let the Past Distract You; 9.5 Implementing Reward-to-Go Policy Gradient; 9.6 Baselines in Policy Gradients; 9.7 Other Forms of the Policy Gradient; 9.8 Recap
10 Spinning Up as a Deep RL Researcher: 10.1 The Right Background; 10.2 Learn by Doing; 10.3 Developing a Research Project; 10.4 Doing Rigorous Research in RL; 10.5 Closing Thoughts; 10.6 PS: Other Resources; 10.7 References
11 Key Papers in Deep RL: 11.1 Model-Free RL; 11.2 Exploration; 11.3 Transfer and Multitask RL; 11.4 Hierarchy; 11.5 Memory; 11.6 Model-Based RL; 11.7 Meta-RL; 11.8 Scaling RL; 11.9 RL in the Real World; 11.10 Safety; 11.11 Imitation Learning and Inverse Reinforcement Learning; 11.12 Reproducibility, Analysis, and Critique; 11.13 Bonus: Classic Papers in RL Theory or Review
12 Exercises: 12.1 Problem Set 1: Basics of Implementation; 12.2 Problem Set 2: Algorithm Failure Modes; 12.3 Challenges
13 Benchmarks for Spinning Up Implementations: 13.1 Performance in Each Environment; 13.2 Experiment Details; 13.3 PyTorch vs Tensorflow
14 Vanilla Policy Gradient: 14.1 Background; 14.2 Documentation; 14.3 References
15 Trust Region Policy Optimization: 15.1 Background; 15.2 Documentation; 15.3 References
16 Proximal Policy Optimization: 16.1 Background; 16.2 Documentation; 16.3 References
17 Deep Deterministic Policy Gradient: 17.1 Background; 17.2 Documentation; 17.3 References
18 Twin Delayed DDPG: 18.1 Background; 18.2 Documentation; 18.3 References
19 Soft Actor-Critic: 19.1 Background; 19.2 Documentation; 19.3 References
20 Logger: 20.1 Using a Logger; 20.2 Logger Classes; 20.3 Loading Saved Models (PyTorch Only); 20.4 Loading Saved Graphs (Tensorflow Only)
21 Plotter
22 MPI Tools: 22.1 Core MPI Utilities; 22.2 MPI + PyTorch Utilities; 22.3 MPI + Tensorflow Utilities
23 Run Utils: 23.1 ExperimentGrid; 23.2 Calling Experiments
24 Acknowledgements
25 About the Author
26 Indices and tables
Python Module Index

CHAPTER 1: Introduction

Table of Contents
• Introduction
  – What This Is
  – Why We Built This
  – How This Serves Our Mission
  – Code Design Philosophy
  – Long-Term Support and Support History

1.1 What This Is

Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).

For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning.

This module contains a variety of helpful resources, including:
• a short introduction to RL terminology, kinds of algorithms, and basic theory,
• an essay about how to grow into an RL research role,
• a curated list of important papers organized by topic,
• a well-documented code repo of short, standalone implementations of key algorithms,
• and a few exercises to serve as warm-ups.

1.2 Why We Built This

One of the single most common questions that we hear is: "If I want to contribute to AI safety, how do I get started?" At OpenAI, we believe that deep learning generally, and deep reinforcement learning specifically, will play central roles in the development of powerful AI technology. To ensure that AI is safe, we have to come up with safety strategies and algorithms that are compatible with this paradigm. As a result, we encourage everyone who asks this question to study these fields.

However, while there are many resources to help people quickly ramp up on deep learning, deep reinforcement learning is more challenging to break into. To begin with, a student of deep RL needs to have some background in math, coding, and regular deep learning. Beyond that, they need both a high-level view of the field (an awareness of what topics are studied in it, why they matter, and what's been done already) and careful instruction on how to connect algorithm theory to algorithm code.

The high-level view is hard to come by because of how new the field is. There is not yet a standard deep RL textbook, so most of the knowledge is locked up in either papers or lecture series, which can take a long time to parse and digest. And learning to implement deep RL algorithms is typically painful, because either:
• the paper that publishes an algorithm omits or inadvertently obscures key design details,
• or widely-public implementations of an algorithm are hard to read, hiding how the code lines up with the algorithm.

While fantastic repos like garage, Baselines, and rllib make it easier for researchers who are already in the field to make progress, they build algorithms into frameworks […]
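The page title quotes the Experiment Outputs chapter: by default, the logger in every Spinning Up algorithm keeps only the latest snapshot of the trained agent, overwriting it each epoch. Below is a minimal sketch of the one-line tweak that chapter describes, assuming the `EpochLogger` and `epoch` loop variable as they appear in the Spinning Up implementations; the surrounding training loop is elided, so this is illustrative rather than runnable on its own:

```python
from spinup.utils.logx import EpochLogger

logger = EpochLogger(output_dir='/tmp/experiments/demo')
env = ...  # the algorithm's Gym environment

for epoch in range(50):
    # ... one epoch of training ...

    # Shipped behavior: itr=None, so each save overwrites the last
    # and only the latest snapshot is kept.
    logger.save_state({'env': env}, None)

    # Tweak described in the docs: pass the epoch index instead, so
    # the logger writes a separate, numbered snapshot per save.
    # logger.save_state({'env': env}, epoch)
```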
Recommended publications
  • Intro to Tensorflow 2.0 MBL, August 2019
Intro to TensorFlow 2.0 (MBL, August 2019). Josh Gordon (@random_forests).

Agenda (1 of 2):
• Exercises: Fashion MNIST with dense layers; CIFAR-10 with convolutional layers.
• Concepts (as many as we can intro in this short time): gradient descent, dense layers, loss, softmax, convolution.
• Games: QuickDraw.

Agenda (2 of 2):
• Walkthroughs and new tutorials: Deep Dream and Style Transfer; time series forecasting.
• Games: Sketch RNN.
• Learning more: book recommendations.

Deep learning is representation learning. Latest tutorials and guides: tensorflow.org/beta. News and updates: medium.com/tensorflow, twitter.com/tensorflow. Demos: PoseNet and BodyPix (bit.ly/pose-net, bit.ly/body-pix). TensorFlow for JavaScript, Swift, Android, and iOS: tensorflow.org/js, tensorflow.org/swift, tensorflow.org/lite.

Minimal MNIST in TF 2.0: a linear model, a neural network, and a deep neural network, then a short exercise (bit.ly/mnist-seq). (Diagram: linear model, neural network, and deep neural network, each ending in a softmax output layer.)

```python
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
```

After training, select all the weights connected to a single softmax output and visualize them as an image:

```python
model.layers[0].get_weights()
# Your code here
# Select the weights for a single output
# ...
img = weights.reshape(28, 28)
plt.imshow(img, cmap=plt.get_cmap('seismic'))
```

Exercise 1 (option #1): bit.ly/mnist-seq; reference: tensorflow.org/beta/tutorials/keras/basic_classification. TODO: add a validation set, and add code to plot loss vs. epochs. Exercise 1 (option #2): bit.ly/ijcav_adv.
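The slides stop before the compile-and-train step and leave the validation set and loss plot as a TODO. A minimal end-to-end sketch of the completed exercise, assuming standard tf.keras and the built-in MNIST loader (the optimizer and epoch count are illustrative choices, not from the slides):

```python
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load MNIST and flatten the 28x28 images into 784-dim vectors in [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

model = Sequential([
    Dense(256, activation='relu', input_shape=(784,)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])

# Sparse categorical crossentropy matches the integer labels (0-9).
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# validation_split carves out the validation set the exercise asks for.
history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))

# Loss vs. epochs, the other TODO from the slides.
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('epoch'); plt.ylabel('loss'); plt.legend(); plt.show()
```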
  • Keras2c: a Library for Converting Keras Neural Networks to Real-Time Compatible C
Keras2c: A library for converting Keras neural networks to real-time compatible C. Rory Conlin (a,*), Keith Erickson (b), Joeseph Abbate (c), Egemen Kolemen (a,b,*). (a) Department of Mechanical and Aerospace Engineering, Princeton University, Princeton NJ 08544, USA; (b) Princeton Plasma Physics Laboratory, Princeton NJ 08544, USA; (c) Department of Astrophysical Sciences, Princeton University, Princeton NJ 08544, USA.

Abstract: With the growth of machine learning models and neural networks in measurement and control systems comes the need to deploy these models in a way that is compatible with existing systems. Existing options for deploying neural networks either introduce very high latency, require expensive and time-consuming work to integrate into existing code bases, or only support a very limited subset of model types. We have therefore developed a new method called Keras2c, which is a simple library for converting Keras/TensorFlow neural network models into real-time compatible C code. It supports a wide range of Keras layers and model types, including multidimensional convolutions, recurrent layers, multi-input/output models, and shared layers. Keras2c re-implements the core components of Keras/TensorFlow required for predictive forward passes through neural networks in pure C, relying only on standard library functions considered safe for real-time use. The core functionality consists of ~1500 lines of code, making it lightweight and easy to integrate into existing codebases. Keras2c has been successfully tested in experiments and is currently in use on the plasma control system at the DIII-D National Fusion Facility at General Atomics in San Diego.

1. Motivation. TensorFlow [1] is one of the most popular libraries for developing and training neural networks.
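As a hedged sketch of the workflow the abstract describes: build and train an ordinary Keras model, then hand it to the converter. The Keras portion below uses only standard tf.keras API; the conversion call is an assumption based on the paper's description of the library, not a verified signature, and is left commented out:

```python
import tensorflow as tf

# An ordinary Keras model of the kind Keras2c converts: a small MLP
# for a control-style regression task (shapes are illustrative).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='tanh', input_shape=(8,)),
    tf.keras.layers.Dense(2),
])
model.compile(optimizer='adam', loss='mse')
# ... train on your measurement/control data here ...
model.save('controller.h5')

# Hypothetical conversion step (name and signature assumed from the
# paper): would emit C sources implementing the forward pass using
# only real-time-safe standard library functions.
# from keras2c import k2c
# k2c(model, 'controller')
```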
  • Tensorflow, Theano, Keras, Torch, Caffe (Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov)
TensorFlow, Theano, Keras, Torch, Caffe. Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov.

Introduction:
• TensorFlow: Google Brain, 2015 (rewritten DistBelief).
• Theano: University of Montréal, 2009.
• Keras: François Chollet, 2015 (now at Google).
• Torch: Facebook AI Research, Twitter, Google DeepMind.
• Caffe: Berkeley Vision and Learning Center (BVLC), 2013.

Outline: 1. Introduction of each framework (TensorFlow, Theano, Keras, Torch, Caffe). 2. Further comparison: code + models; community and documentation; performance; model deployment; extra features. 3. Which framework to choose when?

TensorFlow architecture: (1) a low-level core (C++/CUDA); (2) a simple Python API to define the computational graph; (3) high-level APIs (TF-Learn, TF-Slim, soon Keras). The computational graph provides auto-differentiation, easy multi-GPU/multi-node execution, native C++ multithreading, and device-efficient implementations for most ops, and the whole pipeline can live in the graph: data loading, preprocessing, prefetching. TensorBoard handles visualization. Development is bleeding edge (GitHub): the core/contrib split means very quick merging of new hotness, plus a lot of new related API (CRF, BayesFlow, SparseTensor, audio IO, CTC, seq2seq), so it can easily handle images, videos, audio, and text; if you really need a new native op, you can load a dynamic lib. Downsides: sometimes contrib stuff disappears or moves, and recently introduced bells and whistles are barely documented.

Presentation of Theano: maintained by the Montréal University group; pioneered the use of a computational graph; a general machine learning tool, typically used through Lasagne and Keras; very popular in the research community, but not elsewhere, and falling behind. What is it like to start using Theano? Read tutorials until you no longer can, then keep going.
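The graph-plus-autodiff point is easiest to see in code. A minimal sketch using the TF 1.x-era API the slides describe (runs as-is under TF 1.x, or via tf.compat.v1 in TF 2; the toy model is an assumption for illustration):

```python
import tensorflow as tf  # TF 1.x-style API, matching the slides' era

# Define the computational graph symbolically: y = w*x + b, loss = (y - t)^2.
x = tf.placeholder(tf.float32, shape=())
t = tf.placeholder(tf.float32, shape=())
w = tf.Variable(2.0)
b = tf.Variable(0.0)
y = w * x + b
loss = tf.square(y - t)

# Auto-differentiation: TensorFlow derives dloss/dw and dloss/db from the graph.
grads = tf.gradients(loss, [w, b])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g_w, g_b = sess.run(grads, feed_dict={x: 3.0, t: 5.0})
    print(g_w, g_b)  # gradients evaluated at x=3, t=5
```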
  • Setting up Your Environment for the TF Developer Certificate Exam
Set up your environment to take the TensorFlow Developer Certificate Exam. Questions? Email [email protected]. Last updated: June 23, 2021.

This document describes how to get your environment ready to take the TensorFlow Developer Certificate exam. It does not discuss what the exam is or how to take it. For more information about the TensorFlow Developer Certificate program, please go to tensorflow.org/certificate.

Contents: Before you begin; Refund policy; Install Python 3.8; Install PyCharm; Get your environment ready for the exam (What libraries will the exam infrastructure install? Make sure that PyCharm isn't subject to file-loading controls; Windows and Linux users: check your GPU drivers; Mac users: ensure you have Python 3.8; All users: create a test virtual environment that uses TensorFlow in PyCharm; Create a new PyCharm project; Install TensorFlow and related packages; Check the supported version of Python in PyCharm; Practice training TensorFlow models); FAQs (How do I start the exam? What version of TensorFlow is used in the exam? Is there a minimum hardware requirement? Where is the candidate handbook?).

Before you begin. The TensorFlow certificate exam runs inside PyCharm. The exam uses TensorFlow 2.5.0. You must use Python 3.8 to ensure accurate grading of your models. The exam has been tested with Python 3.8.0 and TensorFlow 2.5.0. Before you start the exam, make sure your environment is ready:
❏ Make sure you have Python 3.8 installed on your computer.
❏ Check that your system meets the installation requirements for PyCharm.
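A quick way to confirm the two version requirements from inside the test virtual environment; a minimal check script, assuming TensorFlow has already been installed into the environment (the exact versions pinned here are the ones the document states):

```python
import sys
import tensorflow as tf

# The exam environment expects Python 3.8 and TensorFlow 2.5.0.
assert sys.version_info[:2] == (3, 8), f"Need Python 3.8, got {sys.version}"
assert tf.__version__ == "2.5.0", f"Need TensorFlow 2.5.0, got {tf.__version__}"
print("Environment looks exam-ready.")
```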
  • Quantum Reinforcement Learning During Human Decision-Making
ARTICLES https://doi.org/10.1038/s41562-019-0804-2. Quantum reinforcement learning during human decision-making. Ji-An Li (1,2), Daoyi Dong (3), Zhengde Wei (1,4), Ying Liu (5), Yu Pan (6), Franco Nori (7,8) and Xiaochu Zhang (1,9,10,11,*).

Classical reinforcement learning (CRL) has been widely applied in neuroscience and psychology; however, quantum reinforcement learning (QRL), which shows superior performance in computer simulations, has never been empirically tested on human decision-making. Moreover, all current successful quantum models for human cognition lack connections to neuroscience. Here we studied whether QRL can properly explain value-based decision-making. We compared 2 QRL and 12 CRL models by using behavioural and functional magnetic resonance imaging data from healthy and cigarette-smoking subjects performing the Iowa Gambling Task. In all groups, the QRL models performed well when compared with the best CRL models and further revealed the representation of quantum-like internal-state-related variables in the medial frontal gyrus in both healthy subjects and smokers, suggesting that value-based decision-making can be illustrated by QRL at both the behavioural and neural levels.

Originating from early behavioural psychology, reinforcement learning is now a widely used approach in the fields of machine learning [1] and decision psychology [2]. It typically formalizes how one agent (a computer or animal) should take actions in unknown probabilistic environments to maximize its […] explained well by quantum probability theory [9–16]. For example, one work showed the superiority of a quantum random walk model over classical Markov random walk models for a modified random-dot motion direction discrimination task and revealed quantum-like aspects of perceptual decisions [12].
  • Tensorflow and Keras Installation Steps for Deep Learning Applications in Rstudio (Ricardo Dalagnol and Fabien Wagner)
Version 1.0, 03 July 2020. TensorFlow and Keras installation steps for deep learning applications in RStudio. Ricardo Dalagnol and Fabien Wagner.

1. Install OSGEO (https://trac.osgeo.org/osgeo4w/) to have GDAL utilities and QGIS. Do not modify the installation path.
2. Install ImageMagick (https://imagemagick.org/index.php) for image visualization.
3. Install the latest Miniconda (https://docs.conda.io/en/latest/miniconda.html).
4. Install Visual Studio:
   a. Install the 2019 or 2017 free version; the free version is called 'Community'.
   b. During installation, select the box to include C++ development.
5. Install the latest NVIDIA driver for your GPU card from the official NVIDIA site.
6. Install NVIDIA CUDA:
   a. I installed CUDA Toolkit 10.1 update 2. Check whether the installed or available NVIDIA driver is compatible with the CUDA version in the cuDNN support matrix: https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html
   b. Download from the NVIDIA website (search for "CUDA NVIDIA download"): https://developer.nvidia.com/cuda-10.1-download-archive-update2 (list of all archives: https://developer.nvidia.com/cuda-toolkit-archive).
7. Install cuDNN (the deep neural network library). I installed cudnn-10.1-windows10-x64-v7.6.5.32 for CUDA 10.1, the latest version: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/#install-windows
   a. To download, you have to create or log in to an NVIDIA Developer account.
   b. Before installing, check that the NVIDIA driver version is compatible.
   c. To install on Windows, follow the instructions in section 3.3: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/#install-windows
   d. […]
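Once the driver/CUDA/cuDNN stack is in place, it is worth verifying that TensorFlow actually sees the GPU before wiring things up in RStudio. A minimal check from the Python side (assumes the TensorFlow 2.x pip package is installed in the conda environment; in R, the equivalent goes through the reticulate/keras packages):

```python
import tensorflow as tf

# If CUDA and cuDNN are installed correctly, at least one GPU is listed.
gpus = tf.config.list_physical_devices('GPU')
print('GPUs visible to TensorFlow:', gpus)

# A tiny op placed on the GPU confirms the CUDA kernels actually load.
if gpus:
    with tf.device('/GPU:0'):
        x = tf.random.normal((1000, 1000))
        print(tf.reduce_sum(tf.matmul(x, x)).numpy())
```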
  • Julia: a Modern Language for Modern ML
Julia: A modern language for modern ML. Dr. Viral Shah and Dr. Simon Byrne. www.juliacomputing.com

What we do: modernize technical computing. Today's technical computing landscape: develop new learning algorithms; run them in parallel on large datasets; leverage accelerators like GPUs and Xeon Phis; embed into intelligent products. "Business as usual" will simply not do!

General micro-benchmarks: Julia performs almost as fast as C, 10X faster than Python and 100X faster than R and MATLAB. (Figure: performance benchmark relative to C; a value of 1 means as fast as C; lower values are better.)

A real application: Gillespie simulations in systems biology, 745x faster than R. Gillespie simulations are used in the field of drug discovery, and also for simulations of epidemiological models to study disease propagation. The Julia package Gillespie.jl is the state of the art in Gillespie simulations (https://github.com/openjournals/joss-papers/blob/master/joss.00042/10.21105.joss.00042.pdf).

    Implementation                          Time per simulation (ms)
    R (GillespieSSA)                         894.25
    R (handcoded)                           1087.94
    Rcpp (handcoded)                           1.31
    Julia (Gillespie.jl)                       3.99
    Julia (Gillespie.jl, passing object)       1.78
    Julia (handcoded)                          1.20

Those who convert ideas to products fastest will win. For the last 25 years, the cycle has been split across platforms: quants develop algorithms (Python, R, SAS, MATLAB), then computer scientists prepare them for production and deploy (C++, C#, Java). With Julia, quants and computer scientists collaborate on one platform and compress the innovation cycle. Julia offers competitive advantages to its users and is poised to become one of the leading tools deployed by developers. "Thank you for Julia. You've kindled serious excitement."
  • Training Neural Networks with Tensor Cores (Dusan Stosic, NVIDIA)
Training Neural Networks with Tensor Cores. Dusan Stosic, NVIDIA.

Agenda: A100 Tensor Cores and TensorFloat-32 (TF32); mixed-precision Tensor Cores, recap and new advances; accuracy and performance considerations.

Motivation: the cost of DL training. (Chart: growth in training compute across BERT, RoBERTa, T5, GPT-3.)
• Vision tasks (ImageNet classification): 2012, AlexNet trained on 2 GPUs for 5-6 days; 2017, ResNeXt-101 trained on 8 GPUs for over 10 days; 2019, NoisyStudent trained with ~1k TPUs for 7 days.
• Language tasks (LM modeling): 2018, BERT trained on 64 GPUs for 4 days; early 2020, T5 trained on 256 GPUs; mid 2020, GPT-3.
• What's being done to reduce costs: hardware accelerators like GPU Tensor Cores; lower computational complexity with reduced precision or network compression (aka sparsity).

Basics of floating-point precision. Floating point is the standard way to represent real numbers on a computer: double precision (FP64), single precision (FP32), half precision (FP16/BF16). A computer cannot store numbers with infinite precision, so there is a trade-off between range and precision:
• Range: representing values at widely different magnitudes, e.g. the different tensors (weights, activations, and gradients) when training a network.
• Precision: providing the same relative accuracy at all magnitudes. Network weight magnitudes are typically O(1), while activations can have orders-of-magnitude larger values.
How floating-point numbers work: the exponent determines the range of values (scientific notation in binary, base 2), while the fraction (or mantissa) determines the relative precision between values, with 2^mantissa samples between consecutive powers of 2.
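The range/precision trade-off is easy to demonstrate numerically. A small sketch using NumPy's half- and single-precision types (the specific values are chosen for illustration):

```python
import numpy as np

# Precision: FP16 has a 10-bit mantissa, so adding 2^-11 (half an ulp at 1.0)
# rounds back to 1.0, while FP32 (23-bit mantissa) keeps the difference.
print(np.float16(1.0) + np.float16(2**-11))   # 1.0       (update lost)
print(np.float32(1.0) + np.float32(2**-11))   # 1.0004883

# Range: FP16 overflows to inf beyond ~65504, a real hazard for large
# activations or gradients; FP32 handles the same value fine.
print(np.float16(70000.0))                    # inf
print(np.float32(70000.0))                    # 70000.0
```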
  • Deepfakes & Disinformation
DEEPFAKES & DISINFORMATION. Agnieszka M. Walorska. ANALYSIS.

Imprint. Publisher: Friedrich Naumann Foundation for Freedom, Karl-Marx-Straße 2, 14482 Potsdam, Germany; freiheit.org, /FriedrichNaumannStiftungFreiheit, /FNFreiheit. Author: Agnieszka M. Walorska. Editors: International Department, Global Themes Unit, Friedrich Naumann Foundation for Freedom. Concept and layout: TroNa GmbH. Contact: phone +49 (0)30 2201 2634, fax +49 (0)30 6908 8102, email [email protected]. As of May 2020.

Photo credits: photomontages © Unsplash.de, © freepik.de; p. 30 © AdobeStock. Screenshots: p. 16 © https://youtu.be/mSaIrz8lM1U; p. 18 © deepnude.to / Agnieszka M. Walorska; p. 19 © thispersondoesnotexist.com, © linkedin.com, © talktotransformer.com; p. 25 © gltr.io; p. 26 © twitter.com. All other photos © Friedrich Naumann Foundation for Freedom (Germany); p. 31 © Agnieszka M. Walorska.

Notes on using this publication: this publication is an information service of the Friedrich Naumann Foundation for Freedom. It is available free of charge and not for sale. It may not be used by parties or election workers for the purpose of election campaigning (Bundestag, regional and local elections, and elections to the European Parliament). Licence: Creative Commons (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0

Contents: Executive summary; Glossary; 1.0 State of development: artificial […]
  • Game Playing with Deep Q-Learning Using Openai Gym
Game Playing with Deep Q-Learning using OpenAI Gym. Robert Chuchro ([email protected]), Deepak Gupta ([email protected]).

Abstract: Historically, designing game players requires domain-specific knowledge of the particular game to be integrated into the model for the game playing program. This leads to a program that can only learn to play a single particular game successfully. In this project, we explore the use of general AI techniques, namely reinforcement learning and neural networks, in order to architect a model which can be trained on more than one game. Our goal is to progress from a simple convolutional neural network with a couple fully connected layers, to experimenting with more complex additions, such as deeper layers or recurrent neural networks.

1. Introduction. Game playing has recently emerged as a popular play[…] […]associated with performing a particular action in a given state. This information is fundamental to any reinforcement learning problem. The input to our model will be a sequence of pixel images as arrays (Width x Height x 3) generated by a particular OpenAI Gym environment. We then use a Deep Q-Network to output an action from the action space of the game. The output that the model will learn is an action from the environment's action space that maximizes future reward from a given state. In this paper, we explore using a neural network with multiple convolutional layers as our model.

2. Related Work. The best known success story of classical reinforcement learning is TD-gammon, a backgammon playing program which learned entirely by reinforcement learning [6]. TD-gammon used a model-free reinforcement learning algorithm similar to Q-learning.
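A minimal sketch of the kind of model the paper describes: a convolutional Q-network mapping an (H, W, 3) pixel observation to one Q-value per action. Written with tf.keras; the layer sizes and 84x84 input are illustrative assumptions, not the authors' architecture:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_q_network(height, width, n_actions):
    """Conv net: pixel frame in, one Q-value per discrete action out."""
    return models.Sequential([
        layers.Input(shape=(height, width, 3)),
        layers.Conv2D(32, 8, strides=4, activation='relu'),
        layers.Conv2D(64, 4, strides=2, activation='relu'),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(n_actions, activation='linear'),  # Q(s, a) estimates
    ])

q_net = build_q_network(84, 84, n_actions=4)
frame = np.random.rand(1, 84, 84, 3).astype('float32')  # stand-in for a Gym frame
q_values = q_net(frame).numpy()
action = int(q_values.argmax())  # greedy action w.r.t. predicted Q-values
print(q_values, action)
```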
  • Reinforcement Learning with Tensorflow & OpenAI Gym
Lecture 1: Introduction. Reinforcement Learning with TensorFlow & OpenAI Gym. Sung Kim <[email protected]>. (Slide image: http://angelpawstherapy.org/positive-reinforcement-dog-training.html)

Nature of learning: we learn from past experiences. When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have direct interaction with its environment. Years of positive compliments as well as negative criticism have all helped shape who we are today. Reinforcement learning is a computational approach to learning from interaction. (References: Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction; Nishant Shukla, Machine Learning with TensorFlow; Tom Mitchell, Machine Learning, 1997; https://www.cs.utexas.edu/~eladlieb/RLRG.html)

Atari Breakout and other Atari games (2013, 2015): human-level control through deep reinforcement learning (Mnih et al., Nature, 26 Feb 2015; figure courtesy of Mnih et al.): http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html, https://deepmind.com/blog/deep-reinforcement-learning/, https://deepmind.com/applied/deepmind-for-google/

Reinforcement learning applications:
• Robotics: torque at joints.
• Business operations: inventory management (how much inventory or spare parts to purchase); resource allocation (e.g., in a call center, whom to service first).
• Finance: investment decisions, portfolio design.
• E-commerce/media: what content to present to users (using click-through or visit time as reward); what ads to present to users (avoiding ad fatigue).

Audience: people who want to understand basic reinforcement learning (RL) with no or weak math/computer science background (the core update is just Q(s, a) = r + γ max over a' of Q(s', a')), who want to use RL as a black box with basic understanding, and who want to use TensorFlow and Python (optional labs). A Q-learning sketch in that spirit follows below.

Schedule: 1. […]
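The update rule on the audience slide is the heart of tabular Q-learning. A minimal sketch against a discrete Gym environment; FrozenLake and the hyperparameters are assumptions for illustration, and the code uses the classic gym API of this lecture's era (reset returning an observation, step returning a 4-tuple):

```python
import numpy as np
import gym

env = gym.make('FrozenLake-v0')          # small, discrete states and actions
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, episodes = 0.8, 0.95, 2000

for _ in range(episodes):
    s = env.reset()
    done = False
    while not done:
        # Greedy action with small random noise for exploration/tie-breaking.
        a = np.argmax(Q[s] + np.random.randn(env.action_space.n) * 0.01)
        s2, r, done, _ = env.step(a)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

print('Learned Q-table:')
print(Q)
```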
  • The Architecture of a Multilayer Perceptron for Actor-Critic Algorithm with Energy-Based Policy Naoto Yoshida
The Architecture of a Multilayer Perceptron for Actor-Critic Algorithm with Energy-based Policy. Naoto Yoshida. 2015. hal-01138709v2. HAL Id: hal-01138709, https://hal.archives-ouvertes.fr/hal-01138709v2. Preprint submitted on 19 Oct 2015. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

The Architecture of a Multilayer Perceptron for Actor-Critic Algorithm with Energy-based Policy. Naoto Yoshida, School of Biomedical Engineering, Tohoku University, Sendai, Aramaki Aza Aoba 6-6-01. Email: [email protected].

Abstract: Learning and acting in a high-dimensional state-action space is one of the central problems in the reinforcement learning (RL) communities. The recent development of model-free reinforcement learning algorithms based on an energy-based model has been shown to be effective in such domains. However, since the previous algorithms used neural networks with stochastic hidden units, these algorithms required Monte […] […] when the actions are represented by binary vector actions [4][5][6][7][8]. Energy-based RL algorithms represent a policy based on energy-based models [9]. In this approach, the probability of the state-action pair is represented by the Boltzmann distribution and the energy function, and the policy is a conditional probability given the current state of the agent.
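The Boltzmann-policy idea in that last fragment is compact enough to show directly. A toy sketch where the energies for each action in the current state are simply given as numbers; everything here is illustrative, not the paper's architecture:

```python
import numpy as np

def boltzmann_policy(energies, temperature=1.0):
    """pi(a|s) proportional to exp(-E(s,a)/T): lower energy, higher probability."""
    logits = -np.asarray(energies, dtype=float) / temperature
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy energies E(s, a) for 4 discrete actions in the current state s.
E = [1.2, 0.3, 2.0, 0.7]
pi = boltzmann_policy(E, temperature=0.5)
action = np.random.choice(len(E), p=pi)   # sample an action from pi(.|s)
print(pi, action)
```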